Stuff The Internet Says On Scalability For September 26th, 2014

Hey, it's HighScalability time:


With tensegrity landing balls we'll be the coolest aliens to ever land on Mars.

  • 6-8Tbps: Apple’s live video stream; $65B: crowdfunding's contribution to the global economy
  • Quotable Quotes:
    • @bodil: I asked @richhickey and he said "a transducer is just a pre-fused Kleisli arrows in the list monad." #strangeloop
    • @lusis: If you couldn’t handle runit maybe you shouldn’t be f*cking with systemd. You’ll shoot your g*ddamn foot off.
    • Rob Neely: Programming model stability + Regular advances in realized performance = scientific discovery through computation
    • @BenedictEvans: Maybe 5bn PCs have been sold so far. And 17bn mobile phones.
    • @xaprb: "There's no word for the opposite of synergy" @jasonh at #surgecon

  • The SSD Endurance Experiment. The good news: You don't have to worry about writing a lot of data to SSDs anymore. The bad news: When your SSD does die your data may not be safe. Good discussion on Hacker News.

  • Don't have a lot of money? Don't worry. Being cheap can actually create cool: Teleportation was used in Star Trek because the budget couldn't afford expensive shots of spaceships landing on different planets.

  • Not so crazy after all? Google’s Internet “Loon” Balloons Will Ring the Globe within a Year

  • Before cloud and after cloud as told through a car crash

  • Cluster around dear readers, videos from MesosCon 2014 are now available.

  • From Backbone To React: Our Experience Scaling a Web Application. This seems a lot like the approach Facebook uses in their Android apps. As things get complex, move the logic to a top-level centralized manager and then distribute changes down to components, which are replaced entirely rather than changed incrementally.

  • Deciding between GAE and EC2? This might help: Running a website: Google App Engine vs. Amazon EC2. AWS is hard to set up. Both give you a lot for free. GAE is not customizable. On AWS you can use whatever languages and software you want. On GAE, once your software is written it will scale. If you have a sysadmin or your project requires specific software go with AWS. If you are small or have a static site go with GAE.

  • Mean vs Lamp – How Do They Stack Up? MEAN = MongoDB, Express.js, Angular.js, Node.js. Why be MEAN? The three most significant reasons are a single language from top to bottom, flexibility in deployment platform, and enhanced speed in data retrieval. However, the switch is not without trade-offs; any existing code will either need to be rewritten in JavaScript or integrated into the new stack in a non-obvious manner.

  • Free the Web: Sometimes, I feel like blaming money. When money comes into play, people start to fear. They fear losing their money, and they fear losing their visitors. And so they focus on making buttons easily clickable (which inevitably narrows down places where they can go), and they focus on making sites that are safe but predictably usable.

  • So, that flash stuff is fast. Hyperscale Databases: Testing Oracle NoSQL Using SanDisk CloudSpeed SSDs vs. HDDs: For Workload A (Update Heavy) SanDisk CloudSpeed SSDs provide a 23X performance advantage when compared to HDDs for a 32GB data set size (dataset size fits in DRAM). As the data set size increased to 128GB (dataset size exceeds DRAM) they provide an even higher performance benefit, 31X that of HDDs.

  • Great, detailed paper summary: ZooKeeper: Wait-free coordination for Internet-scale systems by Murat: In most places ZooKeeper is punting the ball to the clients. Yes, this is due to minimalistic design and such, but this burdens the clients to solve the transactional update themselves, and we know that this is error-prone. Maybe this is really the way to go. Or maybe this is the soft-belly of ZooKeeper and a big opportunity to provide a new coordination tool.
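
To make "the clients solve the transactional update themselves" concrete, here is a minimal sketch of the read-modify-write retry loop a client typically ends up writing on top of a version-checked write. ZooKeeper's setData accepts an expected version and rejects the write on a mismatch; everything else here is an assumption for illustration. VersionedStore is a hypothetical in-memory stand-in, not a ZooKeeper client, and atomic_update is a made-up helper name.

```python
class BadVersionError(Exception):
    """Raised when a conditional write sees a version it did not expect."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a store with version-checked writes."""

    def __init__(self):
        self._nodes = {}  # path -> (value, version)

    def create(self, path, value):
        self._nodes[path] = (value, 0)

    def get(self, path):
        return self._nodes[path]

    def set(self, path, value, expected_version):
        _, version = self._nodes[path]
        if version != expected_version:
            raise BadVersionError(path)
        self._nodes[path] = (value, version + 1)


def atomic_update(store, path, fn, max_retries=10):
    """Apply fn to the current value, retrying if another writer got in first."""
    for _ in range(max_retries):
        value, version = store.get(path)
        new_value = fn(value)
        try:
            store.set(path, new_value, version)
            return new_value
        except BadVersionError:
            continue  # someone else wrote; re-read and try again
    raise RuntimeError("gave up after %d conflicting writes" % max_retries)
```

The loop is short, but every client has to get the re-read, the retry bound, and the failure path right, which is the error-prone part the summary points at.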

  • We may be entering the picene era of electronics: the emerging field of molecular electronics could take our definition of portable to the next level, enabling the construction of tiny circuits from molecular components. In these highly efficient devices, individual molecules would take on the roles currently played by comparatively bulky wires, resistors and transistors.

  • Good explanation of something everyone needs to do. How to Build Rate Limiting into Your Web App Login. Lots of PHP code, but it should be easy enough to translate into other languages. Agree with the comments that say Redis might be a better fit than MySQL.
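
As a rough idea of what such a translation could look like, here is a small in-memory sliding-window limiter in Python. It is a sketch under stated assumptions, not the article's PHP/MySQL code: the class name LoginRateLimiter, the default of 5 failures per 60 seconds, and the injectable clock are all hypothetical choices. A production version would keep the counters in a shared store such as Redis or MySQL rather than in process memory.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Allow at most max_attempts failed logins per key within a sliding window."""

    def __init__(self, max_attempts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._failures = defaultdict(deque)  # key -> timestamps of recent failures

    def _prune(self, key):
        """Drop failures that have aged out of the window; return what is left."""
        cutoff = self.clock() - self.window_seconds
        attempts = self._failures[key]
        while attempts and attempts[0] <= cutoff:
            attempts.popleft()
        return attempts

    def is_blocked(self, key):
        return len(self._prune(key)) >= self.max_attempts

    def record_failure(self, key):
        self._prune(key).append(self.clock())

    def reset(self, key):
        """Call after a successful login to clear the key's failure history."""
        self._failures.pop(key, None)
```

A login handler would check is_blocked(key) before verifying the password, call record_failure(key) on a bad password, and reset(key) on success. The key could be a username, an IP address, or a combination of both.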

  • What does it take to make indoor location tracking work? stevedc3 from Estimote lays it down: We have a team of data engineers in data science and PhD's measuring all the signals we receive from the beacons and performing algorithms (e.g. trilateration, least squared etc) and combining this with positioning signals we can process based on positioning of the device. The trick is to account for different devices, different antenna placement on models etc. So over time we build a database that improves accuracy based on usage. Regarding people being present, we can account for that if we predict that signals are reflecting differently based on estimated density. These are super challenging problems so we have a team dedicated to it, and iterate quickly.
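
For readers who have not seen trilateration with least squares before, the textbook 2D version can be sketched as follows. This is not Estimote's algorithm. It assumes the beacon positions are known and that distances have already been estimated from signal strength, and it ignores the device, antenna, and crowd-density effects described in the quote. The function name trilaterate is hypothetical.

```python
def trilaterate(beacons, distances):
    """Least-squares 2D position from three or more beacons and distance estimates.

    Subtracting the first beacon's circle equation from each of the others
    turns the problem into a linear system, which is solved here through
    the 2x2 normal equations.
    """
    if len(beacons) < 3 or len(beacons) != len(distances):
        raise ValueError("need at least 3 beacons, one distance per beacon")

    (x0, y0), d0 = beacons[0], distances[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(beacons[1:], distances[1:]):
        rows.append((2 * (xi - x0), 2 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)

    # Normal equations: (A^T A) p = A^T b
    s_aa = sum(a * a for a, _ in rows)
    s_ab = sum(a * b for a, b in rows)
    s_bb = sum(b * b for _, b in rows)
    t_a = sum(a * r for (a, _), r in zip(rows, rhs))
    t_b = sum(b * r for (_, b), r in zip(rows, rhs))

    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is not determined")

    x = (t_a * s_bb - t_b * s_ab) / det
    y = (s_aa * t_b - s_ab * t_a) / det
    return x, y
```

With noisy distances, extra beacons overdetermine the system and the least-squares fit averages out some of the error, which is one reason more beacons tend to help.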

  • Great look into how data is lost in filesystems. Or why file systems are really hard. File systems, Data Loss and ZFS. Reasons include: disk failures, the Input/Output Memory Management Unit (IOMMU), bit flips, reorders across flushes, CPU errata, software bugs, user error.

  • Peter Bailis helps explain the very difficult to understand concepts of Linearizability versus Serializability: Linearizability is a guarantee about single operations on single objects. Serializability is a guarantee about transactions, or groups of one or more operations over one or more objects. One of the reasons these definitions are so confusing is that linearizability hails from the distributed systems and concurrent programming communities, and serializability comes from the database community. Today, almost everyone uses both distributed systems and databases, which often leads to overloaded terminology (e.g., “consistency,” “atomicity”). < Also from Peter: MSR Silicon Valley Systems Projects I Have Loved.

  • Function follows form. Here's how the brain implements prioritization...axons arise directly from dendrites, as scientists have discovered in a new cell shape: Rather than conduct signals via the neuron’s center, these unusual nerve cells take a shortcut to transmit information: The signals go around the cell like a bypass road. "Input signals at this dendrite do not need to be propagated across the cell body," explains study author Christian Thome of Heidelberg University in a news release. And they’re more efficiently forwarded than signals coming from elsewhere.

  • Programming a 144-computer Chip to Minimize Power was brought forth by Chuck Moore. John Passaniti in a great comment tells why this tech is important: In the presentation, Mr. Moore made constant reference to the chip's extremely low power consumption. That matters when you're talking about systems that must run off batteries, photovoltaic cells, Peltier, or piezo-based power-- and must run for months or years. These are embedded systems that aren't plugged in the wall; systems that are out in the world or inside your body. So imagine a bridge monitoring system that is powered by the vibrations of cars running on it, or think of a programmable and adaptive vision system for blind people. One real-world application where this chip was reportedly used in was as the signal processing in a programmable hearing aid. Dumb hearing aids are just amplifiers that make things louder. Smart hearing aids do signal analysis and are aware of psycho-acoustics and the individual's specific hearing loss. Mr. Moore also referenced "smart dust" which you are invited to look up on Wikipedia. This chip is the kind of computing architecture that enables that.

  • Dell: How A Server Will Look Like In 2020: a two-socket 2U server will have 80 CPU cores, 12 TB of memory with a total bandwidth of 400 GB/s, 96 TB of disk capacity and total network bandwidth of 600 Gb/s. Servers of the future will need faster non-volatile memory too, to occupy the position between flash and DRAM. Such memory will have to be 50x-1000x faster than current generation flash, but offer around a hundred times more read-write cycles. < More in The FireBox Warehouse Scale Computer In 2020 Will Have 1K Sockets, 100K Cores, 100PB NV RAM, And A 4Pb/S Network.