Stuff The Internet Says On Scalability For November 6th, 2015

Hey, it's HighScalability time:


Cool genealogy of Relational Database Management Systems.

  • 9,000: Artifacts Uncovered in California Desert; 400 Million: LinkedIn members; 100: CEOs have more retirement assets than 41% of American families; $160B: worth of AWS; 12,000: potential age of oldest oral history; fungi: world's largest miners 

  • Quotable Quotes:
    • @jaykreps: Someone tell @TheEconomist that people claiming you can build Facebook on top of a p2p blockchain are totally high.
    • Larry Page: I think my job is to create a scale that we haven't quite seen from other companies. How we invest all that capital, and so on.
    • Tiquor: I like how one of the oldest concepts in programming, the ifdef, has now become (if you read the press) a "revolutionary idea" created by Facebook and apparently the core of a company's business. I'm only being a little sarcastic.
    • @DrQz: +1 Data comes from the Devil, only models come from God. 
    • @DakarMoto: Great talk by @adrianco today quote of the day "i'm getting bored with #microservices, and I’m getting very interested in #teraservices.”
    • @adrianco: Early #teraservices enablers - Diablo Memory1 DIMMs, 2TB AWS X1 instances, in-memory databases and analytics...
    • @PatrickMcFadin: Average DRAM Contract Price Sank Nearly 10% in Oct Due to Ongoing Supply Glut. How long before 1T memory is min?
    • @leftside: "Netflix is a monitoring system that sometimes shows people movies." --@adrianco #RICON15
    • Linus: So I really see no reason for this kind of complete idiotic crap.
    • Jeremy Hsu: In theory, the new architecture could pack about 25 million physical qubits within an array that’s 150 micrometers by 150 µm. 
    • @alexkishinevsky: Just done AWS API Gateway HTTPS API, AWS Lambda function to process data straight into AWS Kinesis. So cool, so different than ever before.
    • @highway_62: @GreatDismal Food physics and candy scaling is a real thing. Expectations and ratios get blown. Mouth feel changes.
    • @randybias:  #5 you can’t get automation scaling without relative homogeneity (homologous) and that’s why the webscale people succeeded
    • Brian Biles: Behind it all: VMs won.  The only thing that kept this [Server Centric Storage is Killing Arrays] from happening a long time ago was OS proliferation on physical servers in the “Open Systems” years.  Simplifying storage for old OS’s required consolidative arrays with arbitrated-standard protocols.
    • @paulcbetts: This disk is writing at almost 1GB/sec and reading at ~2.2GB/sec. I remember in 2005 when I thought my HD reading at 60MB/sec was hot shit.
    • @merv: One of computing’s biggest challenges for architects and designers: scaling is not distributed uniformly in time or space.

  • To Zookeeper or to not Zookeeper? This is one of the questions debated on an energetic mechanical-sympathy thread. Some say Zookeeper is unreliable and difficult to manage. Others say Zookeeper works great if carefully tended. If you need a gossip/discovery service there are alternatives: JGroups, Raft, Consul, Copycat.

  • Algorithms are as capable of tyranny as any other entity wielding power. Twins denied driver’s permit because DMV can’t tell them apart

  • Odd thought. What if Twitter took stock or options as payment for apps that want to use Twitter as a platform (not Fabric)? The current user caps would effectively be the free tier. If you want to go above that you can pay. Or you can exchange stock or options for service. This avoids the Yahoo problem of playing kingmaker, that is, watching a company you helped make (Google) become more valuable than you. It gives Twitter potential for growth. It aligns incentives because Twitter will be invested in the success of apps that use it. And it gives apps skin in the game. Although Twitter has to recognize the value of the stock they receive as revenue, they can offset that against previous losses.

  • One of the best stories ever told. Her Code Got Humans on the Moon—And Invented Software Itself: MARGARET HAMILTON WASN’T supposed to invent the modern concept of software and land men on the moon...But the Apollo space program came along. And Hamilton stayed in the lab to lead an epic feat of engineering that would help change the future of what was humanly—and digitally—possible. 

  • Without infrastructure there are no stepping stones to greatness. James Fallows: All societies under-invest in their infrastructure—in the systems that allow them to thrive. There is hardware infrastructure: clean water, paved roads, sewer systems, airports, broadband; and, Fallows suggested, software infrastructure: organizational and cultural practices such as education, safe driving, good accounting, a widening circle of trust. China, for example, is having an orgy of hard infrastructure construction. It recently built a hundred airports while America built zero. But it is lagging in soft infrastructure such as safe driving and political transition. 

  • Imgur gets a new stack: React was decided on for the core of the application logic (both view logic and client-side manipulation of data) while the backing server would be Node...this approach lets us have a 99% shared codebase...Another benefit of a full Javascript stack is that the server can declare dependencies the same way the browser can...We’ve also fully embraced ES2015.

  • Stephen Wolfram with a great story on George Boole: A 200-Year View: George Boole’s great achievement was to show how to bring them [logic and math] together...At age 19, George Boole did a startup: he started his own elementary school...150 years before Boole, Gottfried Leibniz had also thought about using algebra to represent logic...what has made Boole’s name so widely known is not Boolean algebra, it’s the much simpler notion of Boolean variables.

  • Sharks are so desperate for Internet service they eat undersea fiber optic cables. Just an amazing video. Totally alien feeling. The cable could be some sort of exotic deep sea animal, instead it carries the heart of humanity.

  • If you need a free official SSL cert then https://letsencrypt.org is your free, automated, and open Certificate Authority of choice. This is all part of the campaign to make the Internet secure. To do that you need everyone to have a cert, but they are expensive, so Let's Encrypt automates the process so certs can be accessible to all. And yes, Let's Encrypt is trusted by all major browsers.

  • Scaling Basecamp and making it insanely fast with CTO and Rails Founder, David H. Hansson: Here are some of the stats: 5 million users (keep in mind they are private and they choose which numbers to release and how); 2,000 requests per second at peak; 30 app servers (for the current Basecamp version, not Basecamp 3); 700TB of user uploads stored on a distributed cluster of Cleversafe installations; ~1.5TB in a single MySQL instance; 375GB+ in a single MySQL table; 48GB of RAM on each of 6 instances (288GB total) for caching.

  • We have a new meme...Container Native: the real value of containers – fast immutable deployments, maximizing resource utilization, and bare-metal performance – comes from an architecture optimized for containers. This is container-native architecture.

  • In the who would have thunk it department Apple has become the world's premier chip manufacturer. While part of the market is shifting to white box everything instead of custom solutions built around ASICs, there are other areas where custom functionality and vertical integration are a competitive advantage. And Google wants in. With Apple in Mind, Google Seeks Android Chip Partners.

  • The captivating inside story of how Yahoo delivered the First Free Global Live Stream of an NFL Game on Yahoo: On the technical side, the HD video signal was shipped from London to our encoders in Dallas and Sunnyvale, where it was converted into Internet video. The streams were transcoded (compression that enables efficient network transmission) into 9 bitrates ranging from 6Mbps to 300kbps. We also provided a framerate of 60 frames per second (fps), in addition to 30fps, thus allowing for smooth video playback suited for a sport like NFL football. Having a max bitrate of 6Mbps with 60fps gave a “wow” factor to the viewing experience, and was a first for NFL and sports audiences.
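
As a back-of-the-envelope check on the bitrates quoted above, here is a minimal sketch of how much data one viewer pulls per hour at the top (6Mbps) and bottom (300kbps) renditions. It assumes a constant bitrate and decimal gigabytes; the function name is made up for illustration and nothing here comes from Yahoo's write-up beyond the two bitrates.

```python
def gigabytes_per_hour(bitrate_bps: float) -> float:
    """Data delivered to one viewer in one hour at a constant bitrate, in decimal GB."""
    seconds_per_hour = 3600
    bits_per_byte = 8
    return bitrate_bps * seconds_per_hour / bits_per_byte / 1e9


# The two ends of the ladder quoted in the item (assumed constant-rate streams).
top = gigabytes_per_hour(6_000_000)    # 6 Mbps rendition
bottom = gigabytes_per_hour(300_000)   # 300 kbps rendition

print(f"{top:.2f} GB/hour at 6 Mbps, {bottom:.3f} GB/hour at 300 kbps")
```

Real adaptive streams switch renditions as bandwidth changes, so actual per-viewer volume would land somewhere between those two figures.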

  • When your boss is an algorithm. Uber's Drivers: Information Asymmetries and Control in Dynamic Work: In Uber’s system, algorithms, CSRs, passengers, semiautomated performance evaluations, and the rating system all act as a combined substitute for direct managerial control over drivers, but distributed responsibility for remote worker management also exacerbates power asymmetries between Uber and its drivers.

  • Isn't this what everyone wants? Facebook, Amazon, Google. A market where they control both sides and they can do whatever they want in-between. Peeking Beneath the Hood of Uber: We argue that Uber’s reliance on discrete surge areas introduces unfairness into their system: two users standing a few meters apart may unknowingly receive dramatically different surge multipliers.

  • A comprehensive look at Scaling Docker with Kubernetes V1. Nice definition of terms and a tour through the command line magic needed to make it so. Doesn't look too hard.

  • Good advice on Handling trillions of events daily and conquering scaling issues with Keen CTO. They run on SoftLayer, Nginx, Flask, Tornado, Kafka, Storm, Cassandra, Python, D3, DataDog, PagerDuty.

  • We are always at war with dynamic websites. Why Static Website Generators Are the Next Big Thing: "the static version is more than six times as fast on average...cache invalidation is extremely hard to get right...more static website generators are released every week...the modern browser is an operating system in its own right...many features that used to require dynamic code running on a server can be moved entirely to the client...the CDN is going mainstream." Good discussion on Hacker News

  • Ecosystems that don't enable hosted organisms to survive will die. Mobile App Developers are Suffering. It's more winner takes all: the top 20 app publishers, representing less than 0.005% of all apps, earn 60% of all app store revenue.

  • Episode 057 — The User Experience. Interesting parallel between A/B testing and website metrics versus corporate metrics. You can't A/B test yourself to a great product and you can't look at company metrics as a way of making great products. Both are a way of making something more efficient, but there's no way to jump to something new or different.

  • Day 2 PuppetConf 2015: Now Walmart’s configuration management team is looking to expand its automation program to the company's full fleet of Windows servers — taking the number to 80,000 nodes under Puppet management — and ultimately to all 100,000 servers in its inventory (half are in the company’s 11,500-plus stores). And all that automation work is being done by seven people.

  • Gaia is not impressed. IBM's brain-like chip and the quest for a 'cognitive planet': a key advantage of the chip is its energy efficiency: it can perform 46 billion synaptic operations per second while consuming only 70 milliwatts.
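
To put the two numbers quoted above on a per-operation footing, here is a small worked calculation. It assumes the 70 milliwatts is drawn while sustaining the full 46 billion synaptic operations per second; the function name is hypothetical.

```python
def joules_per_operation(power_watts: float, ops_per_second: float) -> float:
    """Energy spent per operation when running at a steady rate and power draw."""
    return power_watts / ops_per_second


# Figures quoted in the item: 70 mW and 46 billion synaptic ops/sec.
energy = joules_per_operation(70e-3, 46e9)

print(f"{energy * 1e12:.2f} pJ per synaptic operation")
```

That works out to roughly 1.5 picojoules per synaptic operation under that assumption.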

  • It's faster to Drill than Spark. Comparing SQL Functions and Performance with Apache Spark and Apache Drill.

  • Cringely on Amazon’s cloud monopoly: But there is an important question here and that’s at what point Amazon will be in a position to use lethal cloud force? It’s a market doubling or more in size every year. How many more doubles will it take for Amazon to gain such lethal business power? I’d say five more years will do it.
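
Cringely's "doubling or more" compounds quickly. A minimal sketch of the arithmetic, treating exactly one doubling per year as the lower bound (the function name is illustrative only):

```python
def growth_factor(doublings: int) -> int:
    """Size multiple after a number of doublings (one doubling per year assumed)."""
    return 2 ** doublings


# Five more years at one doubling per year, the floor implied by "doubling or more".
for year in range(1, 6):
    print(f"after year {year}: {growth_factor(year)}x today's market")
```

Five doublings means a market at least 32 times today's size.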

  • Here's A first look at RDS Aurora. Pros: Higher write capacity, writer is unaffected by the other nodes; Simpler logic, no need for certification; Scale iops tremendously; Fast failover; No need for quorum (handled by the object store); Simple to deploy. Cons: Likely asynchronous at the storage level; Only one node is writable; Not open source.

  • It's like finding a coupon book. On average, customers save 80% to 90% compared to On Demand prices by using Spot instances

  • Videos from Ignite Velocity Amsterdam 2015 are now available. Baron Schwartz with a particularly moving talk.

  • A strangely attractive future is based on the nanomagnet: Researchers from the University of South Florida College of Engineering have proposed a new form of computing that uses circular nanomagnets to solve quadratic optimization problems orders of magnitude faster than that of a conventional computer.

  • If you want to search computer science there's a new search engine out there: Semantic Scholar. I can't vouch for the quality, but it's dang fast.

  • A rubric of When To Say No When Growing Your Tech Stack. Very Low Risk: Local Developer Utilities. Moderate Risk 1: Deployment Infrastructure. Moderate Risk 2: Programming Language. Serious Risk: Database.

  • Why We Left PaaS and What We Did Instead. Sometimes when adopting PaaS there are just too many compromises. It takes control over your project and you lose flexibility. This is the situation Cycligent found themselves in. So they popped an abstraction level and fell back to IaaS. In the process they also moved clouds, from Azure PaaS to AWS IaaS.

  • Reducing system jitter - part 2. As good as Part 1. 

  • newstore direction: I agree that moving newStore to raw block is going to be a significant development effort. But the current scheme of using a KV store combined with a normal file system is always going to be problematic (FileStore or NewStore)...Internally at Sandisk, we have a KV store that is optimized for flash (it's called ZetaScale). We have extended it with a raw block allocator just as Sage is now proposing to do. Our internal performance measurements show a significant advantage over the current NewStore.

  • Succinct Spark: Queries on Compressed RDDs: Succinct Spark enables search, count, range and random access queries on compressed RDDs...2.75x faster than ElasticSearch for search queries while requiring 2.5x lower storage, and over 75x faster than native Spark.

  • When something is so complicated that people don't really understand it, it becomes easy to invest all your hopes and dreams in something that can't possibly deliver. It's All About the Blockchain

  • Hello World: Meet CockroachDB, the Resilient SQL Database. blhack: This is un. f*cking. real. I work in the IoT space (specifically on health devices), and something like this would be absolutely a GOD SEND for the things that we're doing. The current "hottness" in our space is bluetooth low energy (BLE). This is super low power (or, rather: chips that are really good at going to deep sleep, and then coming online very quickly to burst some data out), but the range on BLE isn't great.

  • Redis HyperLogLog and KMinHash performance: HLL is superior to KMinHash based implementations by a reasonable margin from a performance perspective, which is expected given that it is a highly optimized implementation in the Redis server. However, for the accuracy gains of KMinHash, the penalty doesn’t seem too high.
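
For readers who haven't met the k-minimum-values idea, here is a rough sketch of a bottom-k MinHash that estimates the Jaccard similarity of two sets. This is not the linked post's implementation and it does not touch Redis; the SHA-1 based hash, the function names, and the choice of k are all assumptions made for illustration.

```python
import hashlib
import heapq


def unit_hash(item) -> float:
    """Map an item to a pseudo-uniform float between 0 and 1 using SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def bottom_k(items, k: int) -> set:
    """Sketch of a set: its k smallest distinct hash values.

    Hashes everything up front for brevity; a streaming version would
    keep only a bounded heap of size k.
    """
    return set(heapq.nsmallest(k, {unit_hash(x) for x in items}))


def estimate_jaccard(sketch_a: set, sketch_b: set, k: int) -> float:
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-k sketches."""
    # The k smallest hashes of the merged sketches are the k smallest of A ∪ B.
    union_bottom = heapq.nsmallest(k, sketch_a | sketch_b)
    if not union_bottom:
        return 0.0
    # A hash present in both sketches belongs to an item in both sets.
    shared = sum(1 for h in union_bottom if h in sketch_a and h in sketch_b)
    return shared / len(union_bottom)


k = 1024
a = bottom_k(range(0, 20_000), k)
b = bottom_k(range(10_000, 30_000), k)
# True Jaccard similarity is 10,000 / 30,000, about 0.33.
print(f"estimated Jaccard: {estimate_jaccard(a, b, k):.3f}")
```

The estimate tightens as k grows, which costs more memory per sketch. That is the same kind of accuracy-versus-cost tradeoff the post weighs against HyperLogLog.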

  • Videos are available from Cassandra Summit 2015.

  • Scaling Your GIS with Pivotal Greenplum: So if you find yourself in a situation where your data needs are exceeding a single instance Postgres database, but you still wish to use a SQL-based relational database, I encourage you to consider Greenplum as an alternative.

  • How New Long-Range Radios Will Change the Internet of Things: A new category of radios has emerged which breaks this paradigm, delivering both miles of range and years of battery life. Several different standards are hitting the market now, with LoRa, Sigfox, and Ingenu in the lead. Collectively, these radios are known as Low-Power Wide Area Network radios, or LPWAN.

  • When our components evolve to be more autonomous and self-organizing, it could get really weird. Study Spells Out Why Some Insects Kill Their Mothers: Workers are assessing the situation in their colony and deciding to revolt against the queen only when the genetic makeup of the colony makes it favorable to do so...Workers are not mindless automatons working for the queen no matter what.  They only altruistically give up reproduction when the context is right, but revolt when it benefits them to do so.

  • Building systems for massive scale data applications: Once you have a streaming system or streaming execution engine that gives you this automatic-scaling, like Dataflow does, and it gives you consistency and strong tools for working with your data, then people start to build these really complicated services on them. It may not just be data processing. It actually becomes a nice platform for orchestrating events or orchestrating distributed state machines and things like that. We have a lot of users internally doing this stuff. Also, The world beyond batch: Streaming 101

  • Making the world faster with multicores. Linux TCP will have lockless listener processing 3.5M SYNs per sec. zokier: "Two orders of magnitude faster and half the code size." But it's never enough. api: I'd like to see them fix the other scalability problems: hard coded maximum numbers of interfaces, socket API scaling issues, etc. You should be able to open as many sockets as you have RAM and push as much data as you have CPU to do so. A huge machine should be able to do fifty million TCP connections.

  • Intel Shows Off 3D XPoint Memory Performance. Slower than RAM. Faster than SSD. Denser than RAM. Lower latencies and jitter than SSDs. 4 and 7 times more IOPS compared to SSD. What about price? So it still fits between hard drives and RAM, which is where SSDs sit, so there's some confusion about what the use case is. It doesn't collapse the memory hierarchy. Great discussion on reddit.

  • Papers are available for the 2015 Internet Measurement Conference (IMC). Lots of cool topics on Measurements of Security, Attacks, and Fraud; Traffic and Routing; Search and ads; Bots and Clouds; Trust & Security; Mobile; Measurement and/or platforms; WiFi and Mobile; What’s in a name?; Measurement and/or platforms deux; China and Cuba; Analyses.

  • Interesting intersection of infrastructure investment at the national and corporate levels. Damn Right Amazon Runs a F*cking Deficit and So Should America: Amazon invests in its own enterprise to stay competitive. If they were to try and balance their budget, it would mean cutting the lifeblood that is ensuring the only thing about themselves that people are actually invested in seeing succeed: its own future.

  • Here's a list of Software Testing Conferences.

  • Which Push Notification Tool Should You Use? Good analysis of many different options.

  • Don’t Assume PostgreSQL is Slow. You may not need as much Redis as you think. Good discussion on HN.


  • You might like this free book on Dataflow and Reactive Programming Systems.

  • Great article on understanding the core components of Cloud Foundry. There's a lot of parts, but they are well described as are their relationships.

  • Kind of PRish, but useful. Halo 5 uses Azure DocumentDB. Alfresco uses Aurora: The system completed more than 15 million transactions, with a load-rate of 1200/s, 80% DB CPU load in bulk load, and Aurora’s indexes worked efficiently at 3.2TB. There were no size-related bottlenecks and John assured his audience that the very same infrastructure could sustain up to 20 billion documents.

  • Many interesting takes by Henning Brauer in a forecast on OpenBSD's future: I am very sure that OpenBSD will stay OpenBSD and not allow, i.e., a large corporate user to steer it...With IT as a whole becoming more mature some of the "written in stone" rules count less and less, some will become irrelevant...The widely deployed line of thought that you set up a bunch of machines for a certain task and redo everything a couple of years later vanishes; the need to replace hardware frequently is about to vanish (where it has not already) since you do not need to buy new hardware every couple of years due to performance requirements - in many areas, at least. The trend towards virtualization is part of that: a current entry-level server is way too beefy for many many many tasks, so one way to make efficient use of its resources is virtualization. I predict we'll see setups being around MUCH longer than what we're used to today. Virtualization has another side effect - shifting VMs from one host to another is trivial, while today it is pretty common to do fresh installs on new hardware when the hardware is changed. We'll see much longer-living installs not just due to a longer hardware life; we'll see that on a whole new level with VMs that just get shifted to new / other hosts / platforms. That makes maintenance a much bigger deal, and the system that is totally cluttered after a few years, with nobody being able to really grok what's going on, is an increasing problem with the age of the installation...The increasing use of commodity IT equipment in these more or less embedded roles where we typically do not see an administrator of the team dealing with software upgrades etc has a lot of consequences.

  • Aleatory Architectures: Aleatory architectures explore new approaches and concepts at the intersection of granular materials research and architecture/structural engineering. It explicitly includes stochastic (re-) configuration of individual structural elements and suggests that building materials and components can have their own agency - that they can be designed to adapt and to find their own responses to structural or spatial contexts. 

  • Pelican: A Building Block for Exascale Cold Data Storage: Resource right-provisioning in Pelican means only 8% of the drives can be concurrently spinning. This introduces complex resource management to be handled by the Pelican storage stack. Resource restrictions are expressed as constraints over the hard drives. The data layout and IO scheduling ensures that these constraints are not violated.

  • worldengine: World generator using simulation of plates, rain shadow, erosion, etc.

  • OOSMOS (Object Oriented State Machine Operating System): an open source implementation of threadless concurrency for C/C++. The portable, single-source file implementation makes it easy to integrate into any environment – from bare boards to mainframes.

  • memdb: Fast in-memory data access, up to 25,000 ops (single doc read/write) per shard (each shard takes one CPU core).

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.