Stuff The Internet Says On Scalability For February 5th, 2016


We have an early entry for the best vacation photo of the century. If you like this sort of Stuff then please consider offering your support on Patreon.

  • 1 billion: WhatsApp users; 3.5 billion: Facebook users in 2030; $3.5 billion: art sold online; $150 billion: China's budget for making chips; 37.5MB: DNA information in a single sperm; 

  • Quotable Quotes:
    • @jeffiel: "But seriously developers, trust us next time your needs temporarily overlap our strategic interests. And here's a t-shirt."
    • @feross: Modern websites are the epitome of inefficiency. Using giant multi-MB javascript files to do what static HTML could do in 1999.
    • Rob Joyce (NSA): 'We put the time in …to know [that network] better than the people who designed it and the people who are securing it,' he said. 'You know the technologies you intended to use in that network. We know the technologies that are actually in use in that network. Subtle difference. You'd be surprised about the things that are running on a network vs. the things that you think are supposed to be there.'
    • @MikeIsaac: i just realized how awkward Facebook's f8 conference is gonna be this year
    • @Nick_Craver: Stats correction: Stack Overflow did 157,370,800,409 redis ops in the past 30 days, almost always under 2% CPU:
    • @BenedictEvans: The global SMS system does around 20bn messages a day. WhatsApp is now doing 42bn. With 57 engineers.
    • @jaygoldberg: WhatsApp has the benefit of running on top of the world's data networks which employ a few more engineers... 
    • @anildash: It’s odd that developers think Twitter is so hostile while Facebook shuts down stuff like Parse & FBML + cuts back the Instagram & FB APIs.
    • @asynchio:  I use to think CEP = stateful business rules engine + inference + stream processing. Has it changed?
    • @Marco_Rasp: "SOA is about reuse, MicroServices about time to market." @samnewman #microxchg
    • @pfhllnts: "I predict quantum containers where Docker exists both inside and outside a container." @marcoceppi #fosdem
    • @viktorklang: Awesome story: 295x speedup with Akka Streams on same HW compared to Rails :) 
    • krinchan: Yes. Because a currency almost completely controlled by Chinese miners who are strangling the network at 1MB blocks, causing transaction times in excess of three hours at peak and just introduced the ability to arbitrarily reverse those transactions during the lag is totally going to handle DraftKings and FanDuel.
    • @mpesce: 1/The Apple AX series SOCs are more than powerful enough to run a Hololens-type device very effectively.
    • Matthew Yglesias: Amazon's leadership, from CEO Jeff Bezos on down, are deliberately redeploying every dollar of revenue Amazon earns into making the company bigger and bigger.
    • German forest ranger finds that trees have social networks: trees operate less like individuals and more as communal beings. Working together in networks and sharing resources, they increase their resistance to threats
    • @ValaAfshar: 11 years ago some guy named Mark Zuckerberg talks about his new company. He is now 4th richest person in the world. 
    • Bernard Marr: In China, the government is rolling out a social credit score that aggregates not only a citizen’s financial worthiness, but also how patriotic he or she is, what they post on social media, and who they socialize with
    • @Carnage4Life: Facebook is valued at $326 billion and worth more than Exxon Mobil. Remember when people freaked out at $15B value? 
    • @Nick_Craver: High levels of efficiency at scale aren't one thing; it's a thousand things. Many we haven't really shared in detail...and we should.
    • 2BuellerBells: Things to reinvent: Event loops (done!) Unix (In progress!) Erlang (est. 5 years)
    • @LusciousPear: I'm consistently seeing GETs from @googlecloud storage 2-5x faster than S3. niiiice
    • Kevin Old: The future looks mighty scalable.
    • @BenedictEvans: All curation grows until it requires search. All search grows until it requires curation.
    • @Carnage4Life: Google has 7 services with 1B monthly active users; Gmail, Search, Chrome, Android, Maps, YouTube and Google Play 
    • @jmhodges: That's 1.3 million unique domains in a single day. Yesterday. Let's Encrypt is doing a thing.
    • @danielbryantuk: "60% percent of app users rate performance/response time ahead of features" @grabnerandi  #OOP2016 
    • @tdeekens: Sometimes Monoliths don’t get enough respect. They’re part of our revenue system allowing us to build Microservices. They gave us a business
    • Searching for the Algorithms Underlying Life: Valiant’s self-stated goal is to find “mathematical definitions of learning and evolution which can address all ways in which information can get into systems.” If successful, the resulting “theory of everything”...would literally fuse life science and computer science together.
    • @mountain_ghosts: 1995: the information superhighway will mean anyone can do anything from anywhere 2015: must be willing to relocate to San Francisco

  • Fingerprinting made burglars put on gloves. CCTV made kids pull their hoods up. Spying made honest people use encryption. Forensics: What Bugs, Burns, Prints, DNA and More Tell Us About Crime.

  • So that's what bandwidth means. ucaetano: The bandwidth doesn't depend on the frequency you're occupying, but on the amount of spectrum available: you "usually" get in the order of 1 bps for every Hz of spectrum available for mobile: a 20MHz chunk of spectrum will give you ~20Mbps, no matter if it is 700MHz or 5 GHz. Higher frequencies have awful penetration and range, that's why today you define who wins in the mobile game by the amount of 700MHz and 800MHz spectrum they own. In other words, lower frequency spectrum is (within certain limits) always better.
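As a rough illustration of the rule of thumb quoted above, the sketch below multiplies spectrum width by a spectral efficiency of about 1 bps per Hz. The function name and the default value are assumptions made for the example; real efficiency varies with the radio technology and signal conditions.

```python
def estimated_throughput_bps(spectrum_hz: float, bps_per_hz: float = 1.0) -> float:
    """Rough throughput estimate: spectrum width times spectral efficiency."""
    return spectrum_hz * bps_per_hz


# A 20 MHz block gives the same estimate at 700 MHz or 5 GHz, because the
# carrier frequency does not appear anywhere in the formula.
print(estimated_throughput_bps(20e6) / 1e6, "Mbps")
```

Carrier frequency still matters for range and penetration, which is the point of the rest of the quote; it just does not change this estimate.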

  • Even spies have limits. Optic Nerve: millions of Yahoo webcam images intercepted by GCHQ. A British surveillance agency suffered the indignity of only saving images every five minutes from user feeds to reduce server load. My kingdom for a cloud! Why? They needed data to train their face recognition algorithms. That's what happens if you aren't Google.

  • This is brutal, but Linode handled the attack as well as one can. The Twelve Days of Crisis – A Retrospective on Linode’s Holiday DDoS Attacks: "Linode saw more than a hundred denial-of-service attacks against every major part of our infrastructure, some severely disrupting service for hundreds of thousands of Linode customers." Lessons: don’t depend on middlemen; absorb larger attacks; let customers know what’s happening. What they did: nameservers are now protected by Cloudflare...our websites are now protected by powerful commercial traffic scrubbing appliances...Linode will be overhauling our entire datacenter connectivity strategy, backhauling 200 gigabits of transit and peering capacity...emergency mitigation techniques put in place during these holiday attacks have been made permanent.

  • Will someone rid me of these damn security nightmare IoT devices? Steve Gibson in Three Dumb Routers describes how and why to make a secure network of all these new oh so hackable IoT (Internet of Threat) devices. It requires segmenting your network, keeping the IoT devices on their own network. Otherwise they can probe your IP devices or perform ARP spoofing attacks. Which means that devices have to communicate by routing through the public internet to talk to each other. Which means the dream of local cooperating unikernel powered cloudlets is suspect. 

  • Shmoocon 2016 videos are available.

  • Google plans to beam 5G internet from solar drones. You don't say? How Would You Build The Next Internet? Loons, Drones, Copters, Satellites, Or Something Else?

  • What exactly are the limits of your BaaS? BaaS at Scale: it's critically important to be mindful of how data flows between your app and your BaaS...On a recent project, we reduced data usage by 90% simply by subscribing clients to data updates more granularly...run a series of load tests with an increasingly large load. For a recent app, we started off running our load test on a single C3.8xlarge instance (~1,250 concurrent connections). Then we moved to 5 instances (~6,250 concurrent connections). Then finally 10 instances (~12,500 concurrent connections). Plotting the results confirmed our understanding that our app scaled linearly.
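A minimal sketch of the kind of linearity check described above. The only numbers used are the load steps quoted in the excerpt; the measured results the authors plotted are not given, so in practice you would feed in your own (load, measurement) pairs. The function name and the 5% tolerance are assumptions.

```python
def is_roughly_linear(points, tolerance=0.05):
    """True if y/x stays within `tolerance` of the first ratio for every point."""
    ratios = [y / x for x, y in points]
    base = ratios[0]
    return all(abs(r - base) / base <= tolerance for r in ratios)


# Load steps quoted above: (load-generator instances, concurrent connections).
load_steps = [(1, 1250), (5, 6250), (10, 12500)]
print(is_roughly_linear(load_steps))
```

A constant ratio is a crude test; it assumes the line passes through the origin, which may not hold for metrics with a fixed baseline cost.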

  • The IRS experienced a hardware failure and was down for a while. Update: IRS Systems are Operating. It happens. An interesting architecture aspect is that tax returns could still be queued through 3rd parties like Intuit. The main system being down didn't stop work from being submitted into the system, which means work was not accidentally lost. Lessons: define work as a document that can flow between processing nodes; keep the queueing systems completely separate from the processing systems so they can't have dependent failures.
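The two lessons above can be sketched with a spool directory: work is a self-contained JSON document, and the queue is plain files that accept submissions whether or not a processor is running. This is an illustrative toy, not a description of how the IRS or Intuit systems work; the function names and document fields are hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path


def submit(spool_dir: Path, document: dict) -> Path:
    """Queue side: persist the work item as a self-contained JSON document."""
    path = spool_dir / f"{uuid.uuid4().hex}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(document))
    tmp.rename(path)  # rename so the processor never sees a half-written file
    return path


def drain(spool_dir: Path, handle) -> int:
    """Processor side: handle every queued document, then remove it."""
    processed = 0
    for path in sorted(spool_dir.glob("*.json")):
        handle(json.loads(path.read_text()))
        path.unlink()
        processed += 1
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        spool = Path(d)
        # The processor is "down": submissions still succeed.
        submit(spool, {"return_id": 1})
        submit(spool, {"return_id": 2})
        # The processor comes back and works through the backlog.
        seen = []
        print(drain(spool, seen.append), "documents processed")
```

Because `submit` touches only the spool directory, a processor outage cannot cause a submission to fail, which is the dependent-failure separation the lesson describes.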

  • There's more than one way to do it and that way changes as your situation changes. Moving Past the Scaling Myth: With realistic expectations we can plan and choose new ways of organizing our code and ourselves to fit the scales we encounter, and find ways of transitioning between them as we grow. There's a good chance that would be a better approach than trying to extend a structure or even a set of principles to fit all scales.

  • Can CockroachDB, a database that values consistency and scaling first, be faster at being Redis than Redis, a single threaded database with no synchronous replication to slave nodes? Nope. But it's closer than you might think. A Redis API over CockroachDB was between 10 and 20 times slower. Could CockroachDB Ever Replace Redis? A Free Fridays Experiment.

  • Iron.io merges unikernels with containers in the idea of Microcontainers – Tiny, Portable Docker Containers: A Microcontainer contains only the OS libraries and language dependencies required to run an application and the application itself. Nothing more...Node.js and its dependencies, it comes out to 29MB. A full 22 times smaller!

  • That whole spending a lot of money to grow wildly and gather all market share to yourself can actually work. Airbnb CTO and 3 Tech CEOs Discuss the Digital Platform Economy at Davos: This is what people mean by investing. Airbnb’s Blecharczyk then put those levels of platforms into context from a real world perspective. He began by saying that Airbnb has booked 70 million guests into strangers’ homes since 2008, with 50 million of those happening in 2015. Meaning, Airbnb booked more business last year than the previous seven years combined.

  • There's a new film, The Human Face of Big Data, that might be of interest. It "explores how the visualization of data streaming in from satellites, billions of sensors and smart phones is beginning to enable us, as individuals and collectively as a society, to sense, measure and understand aspects of our existence in ways never possible before." No idea if it's any good or not. We need more data to tell.

  • Years of effort by a team of geniuses can make garbage collection work. Fast C10M: MigratoryData running on Zing JVM: this new C10M benchmark demonstrates that MigratoryData Server running on a single 1U machine can handle 10 million concurrent clients each receiving a 512-byte message per minute (at a total bandwidth of 0.8 Gbps) with a consistent end-to-end latency of under 15 milliseconds.
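The bandwidth figure can be sanity-checked from the numbers in the excerpt. The calculation below counts message payload only; the gap between its result and the reported 0.8 Gbps is presumably protocol and framing overhead, which is an assumption rather than something the excerpt states.

```python
clients = 10_000_000
message_bytes = 512
interval_seconds = 60

# Each client receives one message per minute, so the server sends:
messages_per_second = clients / interval_seconds
payload_gbps = messages_per_second * message_bytes * 8 / 1e9

print(f"{messages_per_second:,.0f} messages/s")
print(f"{payload_gbps:.2f} Gbps of payload")
```

That works out to roughly 167 thousand messages per second and about 0.68 Gbps of payload.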

  • This was always part of Captain Nemo's business plan. Microsoft Plumbs Ocean’s Depths to Test Underwater Data Center: The underwater server containers could also help make web services work faster. Much of the world’s population now lives in urban centers close to oceans but far away from data centers usually built in out-of-the-way places with lots of room. The ability to place computing power near users lowers the delay, or latency, people experience, which is a big issue for web users.

  • Out with the old and in with the new. In this case the old is only 46 years old. AI Is Transforming Google Search. The Rest of the Web Is Next: Increasingly, we’re discovering that if we can learn things rather than writing code, we can scale these things much better.

  • Internal tools should be sold or killed: "I propose a somewhat radical line of thinking for evaluating whether to build an internal project: if we are to build it, we will sell it as an external product -- or we will not build it at all and kill it." Interesting idea, but it assumes organizations are alike. A tool is custom built to make a particular organization more efficient; the more specialized the better. You don't build a club and try to sell it to people who need hammers.

  • Ticketmaster on Implementing a DevOps Strategy across multiple locations & product teams. The core tooling: Git, GitLab, Jenkins, SonarQube, Sonatype Nexus, Rundeck, Octopus Deploy, Chef.

  • FloCon 2015 Presentations are available.

  • There's a difference between Mr. Right and Mr. Right Now. The Wrong Abstraction: The moral of this story? Don't get trapped by the sunk cost fallacy. If you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect. It may have been right to begin with, but that day has passed. Once an abstraction is proved wrong the best strategy is to re-introduce duplication and let it show you what's right.

  • Speaking of spies, this is why we can't have back doors. Government software may have let in foreign spies: The government may have used compromised software for up to three years, exposing national security secrets to foreign spies...Observers increasingly believe the software defect derived from an encryption “back door” created by the National Security Agency (NSA). Foreign hackers likely repurposed it for their own snooping needs.

  • An Ansible it is then. Fastest Light Pulses Show Electrons Are Sluggish: Electrons move fast, especially within an atom. But they have their limits, and those limits might put a top speed on future optoelectronic circuits. 

  • It's amazing how much even the "simple" things you take for granted can be improved when you focus on them. Large scale image processing on the fly: Once an Imagizer instance receives an image, it can transcode all images into different resolutions pretty much in parallel. We can do it in 25 milliseconds or less. It doesn't matter how large the initial image is because the way we've written our algorithms...M3 medium will do 50 to 75 conversions per second...25 milliseconds or less per image...the largest EC2 would do about 2.5 thousand conversions per second. That's a lot of data. It wouldn't be limited by the CPU but by the I/O.

  • The cellular network is not as secure as you might think. LTE security and protocol exploits. LTE still exchanges a significant amount of cleartext information to arbitrary access points. An open-source LTE stack can be built on less than $2000 worth of hardware. SIM IMSIs are transmitted in plaintext to unrecognized (potentially rogue) LTE APs. A rogue LTE AP can deny service to a SIM for 24-48 hours at a time. A rogue LTE AP can force a device to use GSM, which is known to be weak. Signals to and from an LTE AP can be monitored to determine which devices are attached, i.e. location data is leaked.

  • A good list. Creating a Microservice? Answer these 10 Questions First: How will it be tested?  How will it be configured? How will it be consumed by other parts of the system?  How will it be secured?  How will it be discovered?  How will it scale with increasing load?  How will it handle failures of its dependencies? How will the rest of the system handle the failure of the new microservice? How will it be upgraded? How will it be monitored and measured?

  • In which we see how the datacenter becomes scriptable. CloudWatch + Lambda Case 4: Control launch of Specific “C” type EC2 instances post office hours to save costs: Whenever an instance is launched it triggers a Lambda function; the function checks whether the instance is a specific “C” type and checks the current time, and if the time falls after office hours, it terminates the newly launched EC2 instance immediately.
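A minimal sketch of the rule described above, not the linked post's code. The office hours, the regular expression for "C" family types, and the event shape (an EC2 state-change notification carrying `detail["instance-id"]`) are assumptions. The decision logic is kept in a pure function so it can be checked without AWS access; `boto3` is only imported inside the handler.

```python
import re
from datetime import datetime, time

OFFICE_START = time(9, 0)   # assumed office hours, not taken from the post
OFFICE_END = time(18, 0)


def should_terminate(instance_type: str, now: datetime) -> bool:
    """True for a "C" family instance launched outside office hours."""
    is_c_family = re.match(r"c\d", instance_type) is not None
    in_office_hours = OFFICE_START <= now.time() < OFFICE_END
    return is_c_family and not in_office_hours


def handler(event, context):
    """Lambda entry point for an EC2 state-change event (assumed event shape)."""
    import boto3  # imported here so should_terminate can be tested without AWS

    instance_id = event["detail"]["instance-id"]
    ec2 = boto3.client("ec2")
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance_type = reservations[0]["Instances"][0]["InstanceType"]
    # datetime.now() is UTC inside Lambda; convert if office hours are local time.
    if should_terminate(instance_type, datetime.now()):
        ec2.terminate_instances(InstanceIds=[instance_id])
```

The sketch ignores weekends and time zones, both of which a real cost-control rule would need to handle.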

  • Net ring-buffers are essential to an OS: Modern network machines, whether web servers or firewalls, have two parts: the control-plane where you SSH into the box and manage it, and the data-plane, which delivers high-throughput data through the box. These things have different needs. Unix was originally designed to be a control-plane system for network switches. Trying to make it into a data-plane system is 30 years out of date. The idea persists because of the clueless thinking as expressed by the OpenBSD engineers above.

  • This is something you don't often read about. Facebook is talking in-depth down and dirty about Hardware and firmware attacks: Defending, detecting, and responding. Very interesting stuff.

  • Flexible figures: The practice [dynamic pricing] is spreading to physical retailers, which are installing electronic price displays and borrowing pricing models from e-retailers. Kohl’s, with nearly 1,200 stores in America, now holds sales that last for hours rather than days, pinpointing the brief periods when discounts are most needed. Cintra, a Spanish infrastructure firm, has opened several toll roads in Texas that change prices every five minutes.

  • Exploring gambles reveals foundational difficulty behind economic theory: "The first perspective—considering all parallel worlds—is the one adopted by mainstream economics," explained Gell-Mann. "The second perspective—what happens in our world across time—is the one we explore and that hasn't been fully appreciated in economics so far."

  • Excellent Report from SODA, ALENEX, and ANALCO, The 27th ACM-SIAM Symposium on Discrete Algorithms. 

  • gophergala2016/meshbird: create distributed private networking between servers, containers, virtual machines and any computers in different datacenters, different countries, different cloud providers. All traffic is transmitted directly to the recipient peer without passing through any gateways. Meshbird does not require any centralized servers. Meshbird is absolutely decentralized distributed private networking.

  • Optical network democratization: This paper addresses this problem by proposing a completely democratized optical network infrastructure. It introduces the novel concepts of the optical white box and bare metal optical switch as key technology enablers for democratizing optical networks.

  • varnish/hitch: Hitch, a scalable, open source network proxy designed to efficiently handle tens of thousands of connections on multicore machines. Hitch is easy to configure, has a low memory footprint, and is the ideal way of terminating client-side SSL/TLS for Varnish.

  • preshing/junction:  a library of concurrent data structures in C++. It contains three hash map implementations.