
Stuff The Internet Says On Scalability For May 22nd, 2015

High Scalability

22 May 2015 — 10 min read

Hey, it's HighScalability time:

Where is the World Brain? San Fernando marshes in Spain (by Cristobal Serrano)

569TB: 500px total data transfer per month; 82% faster: elite athletes' brains; billions and millions: Facebook's graph store read and write load; 1.3 billion: daily Pinterest spam fighting events; 1 trillion: increase in processing power performance over six decades; 5 trillion: Facebook pub-sub messages per day
Quotable Quotes:
- Silicon Valley: “Tell me the truth,” Gavin demands of a staff member. “Is it Windows Vista bad? Zune bad?” “I’m sorry,” the staffer tells Gavin, “but it’s Apple Maps bad!”
- @garybernhardt: Reminder to people whose "big data" is under a terabyte: servers with 1 TB RAM can be had about $20k. Your data set fits in RAM.
- @epc: μServices and AWS Lambda are this year’s containers and Docker at #Gluecon
- orasis: So by this theory the value of a tech startup is the developer's laptops and the value of a yoga studio is the loaner mats.
- @ajclayton: An average attacker sits on your network for 229 days, collecting information. @StephenCoty #gluecon
- @mipsytipsy: people don't *cause* problems, they trigger latent conditions that make failures more likely. @allspaw on post mortems #srecon15europe
- @pas256: The future of cloud infrastructure is a secure, elastically scalable, highly reliable, and continuously deployed microservices architecture
- Kevin Marks: The Web is the network
- @cdixon: We asked for flying cars and all we got was the entire planet communicating instantly via $34 pocket supercomputers
- @ajclayton: Uh oh, @pas256 just suggested that something could be called a "nanoservice"...microservices are already old. #gluecon
- @jamesurquhart: A sign that containers are interim step? Pkging procs better than pkging servers, but not as good as pkging functs?
- @markburgess_osl: Let's rename "immutable infrastructure" to "prefab/disposable" infrastructure, to decouple it from the false association with functionalprog
- @Beaker: Key to startup success: solve a problem that has been solved before but was constrained due to platform tech cost or non-automated ops scale
- @mooreds: 10M req/month == $45 for lambda. Cheap. -- @pas256 #gluecon
- @ajclayton: Microservices "exist on all points of the hype cycle simultaneously" @johnsheehan #gluecon
- @oztalip: "Treat web server as a library not as a container, start it inside your application, not the other way around!" -@starbuxman #GOTOChgo
- @sharonclin: If a site doesn't load in 3 sec, 57% abandon, 80% never return. @krbenedict #m6xchange #Telerik
- QuirksMode: Tools don’t solve problems any more, they have become the problem.
- @rzazueta: Was considering taking a shot every time I saw "Microservices" on the #gluecon hashtag. But I've already gone through two livers.
- @MariaSallis: "If you don't invest in infrastructure, don't invest in microservices" @johnsheehan #gluecon
- Brian Gallagher: If the world devolved into a single cloud provider, there would be no need for Cloud Foundry.
- @b6n: startup idea: use technology from the 70s.
- Stephen Hawking: The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge
- @aneel: "Monolithic apps have unlimited invisible internal dependencies" -@adrianco #gluecon
- @windley: microservices don’t reduce complexity, they move it around, from dev to ops. #gluecon
- @paulsbruce: "When everyone has to be an expert in everything, that doesn't scale." @dberkholz @451research #gluecon
- @oamike: I didn’t do SOA right, I didn’t do REST right, I’m sure as hell not going to do micro services right. #gluecon @kinlane
- Urs Hölzle: My biggest worry is that regulation will threaten the pace of innovation.
- @mccrory: There has been an explosion in managed OpenStack solutions - Platform9, MetaCloud, BlueBox
- @viktorklang: Remember that you heard it here first, CPU L1 cache is the new disk.

This is more a measure of the fecundity of the ecosystem than an indication of disease. By its very nature the magic creation machine that is Silicon Valley must create both wonder and bewilderment. Silicon Valley Is a Big Fat Lie: That gap between the Silicon Valley that enriches the world and the Silicon Valley that wastes itself on the trivial is widening daily.

In a liquidity crisis all those promises mean nothing. RadioShack Sold Your Data to Pay Off Its Debts.

YouTube has to work at it too. To Take On HBO And Netflix, YouTube Had To Rewire Itself: All of the things that InnerTube has enabled—faster iteration, improved user testing, mobile user analytics, smarter recommendations, and more robust search—have paid off in a big way. As of early 2015, YouTube was finally becoming a destination: On mobile, 80% of YouTube sessions currently originate from within YouTube itself.

If you aren't doing web stuff, do you really need to use HTTP? Do you really know why you prefer REST over RPC? There's no reason for API requests to pass through an HTTP stack.

If scaling is specialization and the cloud is the computer then why are we still using TCP/IP between services within a datacenter? Remote Direct Memory Access is fast. FaRM: Fast Remote Memory: FaRM’s per-machine throughput of 6.3 million operations per second is 10x that reported for Tao. FaRM’s average latency at peak throughput was 41µs which is 40–50x lower than reported Tao latencies.

MigratoryData with 10 Million Concurrent Connections on a single commodity server. Lots of details on how the benchmark was run and the various configuration options. CPU usage under 50% (with spikes), memory usage was predictable, network traffic was 0.8 Gbps for 168,000 messages per second, 95th Percentile Latency: 374.90 ms. Next up? C100M.
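The headline figures can be turned into per-message and per-connection numbers with a little arithmetic. The sketch below uses only the values quoted above; it assumes "Gbps" means 10^9 bits per second and that messages are spread evenly across connections, neither of which the summary states.

```python
# Figures quoted in the benchmark summary above.
CONNECTIONS = 10_000_000
MESSAGES_PER_SECOND = 168_000
NETWORK_GBPS = 0.8  # assumption: decimal gigabits (10**9 bits) per second


def bytes_per_message(gbps: float, msgs_per_sec: int) -> float:
    """Average bytes on the wire per delivered message, protocol overhead included."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second / msgs_per_sec


def seconds_between_messages(connections: int, msgs_per_sec: int) -> float:
    """Average gap between messages on one connection, assuming an even spread."""
    return connections / msgs_per_sec


if __name__ == "__main__":
    size = bytes_per_message(NETWORK_GBPS, MESSAGES_PER_SECOND)
    gap = seconds_between_messages(CONNECTIONS, MESSAGES_PER_SECOND)
    print(f"{size:.0f} bytes per message on the wire")
    print(f"{gap:.1f} s between messages per connection")
```

Under those assumptions each connection sees roughly one message a minute, at a little under 600 bytes per message on the wire. This is a workload of many mostly idle connections, not a few busy ones.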

Does anyone have a ProductHunt invite that they would be willing to share with me?

Doesn't seem to matter where you start, this is the strange attractor. What’s Going On At Medium?: the platform’s main goal would be to prioritize and increase engagement across the publishing network. Specifically, to encourage more readers to create accounts and sign in to Medium.

Wow, just imagine being able to run an entire OS in as little as 32MB of RAM! Google Is Readying Its Own OS For Running The Internet Of Things: codenamed "Brillo" for now, though it may emerge under the Android brand...It will be able to run on as little as 64MB or 32MB of RAM.

Spot Fleet API is a good name for Amazon's new API for managing spot instances, but wouldn't Zombie Master be cooler?

James Urquhart on Solution Gravity: I believe now, more than ever — and perhaps this is the first time I’ve articulated this — that disruptive technology solutions do one thing: they evolve to directly meet the needs of the ultimate driver of technology demand. Slowly but surely, the value chain required to meet the needs of the top of that chain will be simplified, removed “from view” and turned into services that are dead simple to consume. There is a sort of “gravity” that pulls the most valuable activities from the basic building blocks that make solutions possible to the solutions themselves.

Here's the story of Scaling Meteor to 20,000+ Users in 7 Days. Tips: Ensure all publish function queries are supported by an Index; Ensure you are using Oplog for all publish function queries; Use fields, limits, and sorts in publish function queries; Increase the memory for MongoDB; Use Kadira and Compose to uncover bottlenecks.

A good idea. Google Moves Its Corporate Applications to the Internet: Google, taking a new approach to enterprise security, is moving its corporate applications to the Internet. In doing so, the Internet giant is flipping common corporate security practice on its head, shifting away from the idea of a trusted internal corporate network secured by perimeter devices such as firewalls, in favor of a model where corporate data can be accessed from anywhere with the right device and user credentials.

Nanowire networks can hardwire adaptability: "We have a saying: if it does what you expect it to do it’s impressive but not interesting - algorithmic systems generally do what they’re designed to do," says Stieg. "There’s no software in your brain but you can learn and adapt readily - it makes it a bit messier but it gives the capacity to perform a range of tasks."

Why should I have written ZeroMQ in C, not C++ (part I): I believe that requirement for fully-defined behaviour breaks the object-oriented programming model. The reasoning is not specific to C++. It applies to any object-oriented language with constructors and destructors. Consequently it seems that object-oriented languages are better suited for the environments where the need for rapid development beats the requirement for no undefined behaviour.

Data data everywhere. Google Backs Farm-Focused Startup as 'AgTech' Blooms.

It certainly is difficult to find this many developers under 30 years old. @devevangelist: The "Developer Drought" - 570k CS jobs in the US, 136k new CS jobs/year in US, and only 40k CS graduates per year in US. @adamse #gluecon

Videos from the Cloud Foundry Summit 2015 are available.

Here Is How Amazon Preempted Google's Assault On EC2 Spot Instances: Google’s Preemptible VMs are exactly like EC2 Spot Instances but with a fixed price. Unlike AWS, where customers need to continuously monitor the Spot Market for price fluctuations, GCE Preemptible VMs come with a fixed price tag that’s approximately 1/4 of the typical VM price.

To handle big data, shrink it: MIT researchers will present a new algorithm that finds the smallest possible approximation of the original matrix that guarantees reliable computations. For a class of problems important in engineering and machine learning, this is a significant improvement over previous techniques.

More videos from the @Scale conference are available.

Some good definitions and advice. Message Driven Systems: Some Rules of Engagement: Don’t implement complex business logic in event handlers; Don’t use heavy messages with lots of properties; Don’t write generic command handlers with complex business logic.

In reality we want both schema and non-schema views of data. How I Learned to Stop Worrying and Love the Schema, part 1.

A promise delivered. Top 10 data mining algorithms in plain English.

Yes indeed. bane: I think we're starting to see the pendulum swing back towards caring about performance. For years we could just throw more hardware at a problem, and now performance curves are starting to flatten out. The bandaid has been to use more systems, but old-timers are starting to remember back to their youth, when systems were slow, that there used to be a phase in software development called "performance optimization" that's almost entirely disappeared in the modern discipline.

A good look at alternatives: Web Crawling & Analytics Case Study - Database Vs Self Hosted Message Queuing Vs Cloud Message Queuing.

A not so Dilbertish approach. The “Systems” Business Model: If you ask me how we plan to make money, I’ll tell you I have no idea. But if any one of our products is a runaway hit, we will organize around it. If people like them as a package, we can work with that too. Maybe someday we will add some premium features. Maybe someday the app upgrade will cost one dollar. Maybe someday there will be an advertising component. We can go in any of those directions. We designed those doors to stay open.

Videos are available for Microsoft's Build 2015 and Ignite 2015 conferences.

Getting the First Row per Group 5X Faster. Fetching the next item from a database table with a query is typically considered too slow. Using Postgres with DISTINCT ON and ordered indexes, they were able to get the next item in 43 milliseconds. Not bad. Also, Postgres Job Queues & Failure By MVCC.
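For readers unfamiliar with the "first row per group" pattern, here is a minimal in-memory sketch of what a DISTINCT ON query returns. It is not the article's code: the `JOBS` rows, the column names, and the SQL in the docstring are hypothetical, and the Python sort stands in for the ordering a suitable index would provide.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical job rows: (queue, priority, job_id). Lower priority runs first.
JOBS = [
    ("email", 2, "e-17"),
    ("resize", 1, "r-03"),
    ("email", 1, "e-09"),
    ("resize", 5, "r-11"),
    ("billing", 3, "b-42"),
]


def first_row_per_group(rows):
    """Return one row per queue: the row with the lowest priority value.

    Roughly what this Postgres query returns (table and column names are
    hypothetical):
        SELECT DISTINCT ON (queue) queue, priority, job_id
        FROM jobs
        ORDER BY queue, priority;
    """
    # Equivalent of ORDER BY queue, priority.
    ordered = sorted(rows, key=itemgetter(0, 1))
    # Keep only the first row of each run of equal queue values.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


if __name__ == "__main__":
    for row in first_row_per_group(JOBS):
        print(row)
```

The sort is the expensive step here. An index ordered the same way as the ORDER BY clause lets the database skip it, which is presumably where much of the speedup comes from.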

A Google systems guru explains why containers are the future of computing: I think the way you build applications is going to really change. The reason I say that is once you can get to a view of an application that’s composed of micro-services, then people that are building software will start building micro-services rather than building libraries.

You don't always need a database server. Your program can be the server. Facebook with great detail on how they use RocksDB, an embedded database, in osquery, an operating system instrumentation framework for OS X and Linux. How RocksDB is used in osquery. Also, Spam Fighting @Scale Recap.
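To make "your program can be the server" concrete, here is a minimal sketch of an embedded store living inside the application process. It uses Python's built-in `sqlite3` module as a stand-in, not RocksDB or anything from osquery; the `kv` table and the helper names are hypothetical.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open an embedded key-value table inside this process; no server involved."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(conn: sqlite3.Connection, key: str, value: str) -> None:
    # INSERT OR REPLACE gives simple upsert semantics for a key-value store.
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
    )
    conn.commit()


def get(conn: sqlite3.Connection, key: str):
    """Return the stored value, or None when the key is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = open_store()
    put(store, "hostname", "db-01")
    print(get(store, "hostname"))
```

There is no socket, daemon, or connection pool. Reads and writes are function calls in the same address space, which is the appeal of the embedded approach for an agent that has to run on every host.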

Pachyderm v0.7: Replication, Automatic Failover, and Tests: we chose to use CoreOS’s distributed systems primitives to build out basic replication and automatic failover in less than a month of development. Deployment on Kubernetes or Mesos will eventually be available too.

Google Translate, where are you when I need you? Latency Tricks: One final cool fact I want to leave you with is that if you have a server with, it has infinite average latency. You can give the service finite average latency by replicating it enough times to make . This means it's possible to build a service with finite average response time out of servers with infinite average response time!
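One latency distribution with this property, assumed here purely for illustration, is the Pareto distribution with shape parameter at most 1. Its mean is infinite, but the minimum of n independent draws is again Pareto with the shape multiplied by n. The sketch below computes the resulting mean when a request is sent to several replicas and the first reply wins; the function names and the example shape of 0.5 are hypothetical.

```python
import math


def mean_latency(shape: float, scale: float = 1.0) -> float:
    """Mean of a Pareto(scale, shape) latency; infinite when shape <= 1."""
    if shape <= 1:
        return math.inf
    return shape * scale / (shape - 1)


def mean_latency_of_fastest(shape: float, replicas: int, scale: float = 1.0) -> float:
    """Mean latency when a request goes to `replicas` servers and the first reply wins.

    The minimum of n independent Pareto(scale, shape) variables is itself
    Pareto(scale, n * shape), so replication multiplies the shape parameter.
    """
    return mean_latency(shape * replicas, scale)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, mean_latency_of_fastest(0.5, n))
```

With a shape of 0.5, one or two replicas still have an infinite mean, and the mean becomes finite from the third replica onward.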

Curt Monash with an excellent overview of MemSQL 4.0.

On Scalability, Capacity, and Sensitivity: The key insight is that in the question of scaling vertically or horizontally, you need to understand what capacity limits your system can cope with, realistically. And in practice, it may be necessary to avoid loading systems too much vertically in order to achieve correct operation when you strike gold when slaving away in the venture capitalist mine.

Roll Your Own API Management Platform with nginx and Lua. Here's how Comcast manages their API by putting a layer in front of the API using nginx and Lua. Kind of an Aspect Oriented Programming approach. The layer handles concurrent rate limiting, capacity management, and authentication. Vagrant, Ansible, and Python are used for automated testing and deployment.
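Concurrent rate limiting caps how many requests a client may have in flight at once, as opposed to how many it may send per second. The sketch below shows the idea in Python, not in nginx and Lua, and is not Comcast's implementation; the `ConcurrencyLimiter` class and its methods are hypothetical.

```python
import threading


class ConcurrencyLimiter:
    """Caps in-flight requests per API key; a hypothetical stand-in for the gateway layer."""

    def __init__(self, max_in_flight: int):
        self._max = max_in_flight
        self._in_flight = {}  # api_key -> number of requests currently being served
        self._lock = threading.Lock()

    def try_acquire(self, api_key: str) -> bool:
        """Admit the request if the key is under its cap; otherwise reject it."""
        with self._lock:
            current = self._in_flight.get(api_key, 0)
            if current >= self._max:
                return False  # a gateway would typically answer 429 here
            self._in_flight[api_key] = current + 1
            return True

    def release(self, api_key: str) -> None:
        """Call when the upstream response has been sent back to the client."""
        with self._lock:
            current = self._in_flight.get(api_key, 0)
            if current > 0:
                self._in_flight[api_key] = current - 1


if __name__ == "__main__":
    limiter = ConcurrencyLimiter(max_in_flight=2)
    print([limiter.try_acquire("client-a") for _ in range(3)])
```

Putting this check in a layer in front of the API keeps the services themselves unaware of it, which is the aspect-oriented flavor the summary mentions.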

Lambda shell: Run sh commands inside AWS Lambda environment.

EGADS (Extensible Generic Anomaly Detection System): an open-source Java package to automatically detect anomalies in large scale time-series data.

VIPS: a free image processing system. It includes a range of filters, arithmetic operations, colour processing, histograms, and geometric transforms.

Graph Engine: a distributed, in-memory, large graph processing engine, underpinned by a strongly-typed RAM store and a general computation engine.

Atomic switch networks—nanoarchitectonic design of a complex system for natural computing: This work explores methods of fabricating a self-organized complex device known as an atomic switch network and discusses its potential utility in computing. Through a merger of top-down and bottom-up techniques guided by mathematical and nanoarchitectonic design principles, we have produced functional devices comprising nanoscale elements whose intrinsic nonlinear dynamics and memorization capabilities produce robust patterns of distributed activity and a capacity for nonlinear transformation of input signals when configured in the appropriate network architecture.
