
Stuff The Internet Says On Scalability For January 20th, 2017

Hey, it's HighScalability time:


Absolutely. Do we agree that the cerebellum is amazingly beautiful? (@PeppeGanga)

If you like this sort of Stuff then please support me on Patreon.

  • 900 GB: data stolen in Cellebrite hack; 99.24%: users identified by cross-browser fingerprinting; 72%: intend to migrate to a hybrid cloud; 90%: Google & Facebook ad traffic is useless; 5.2 terabytes per second: data from Australian Square Kilometre Array Pathfinder; 10 billion: searches on DuckDuckGo in 2016; $330m: Amazon's loss on Alexa; 

  • Quotable Quotes:
    • @brucel: Breaking: Programmer accused of writing unreadable code refuses to comment.
    • @asymco: Remember Android first? App Annie believes the Apple’s App Store produced about twice as much revenue as Google Play
    • @bridgetkromhout: Describing your old-timer ranting as "greybeard" just makes me want to fight you with sed & awk at twenty paces. Be there tomorrow at dawn.
    • @StevenShorrock: Root Cause Analysis is: * Acceptable for simple systems * Inappropriate for complicated systems * Ludicrous for complex systems
    • @swardley: Five years ago Amazon was worth about half of Walmart, today Walmart is worth about half of Amazon.
    • @CaseyNewton: “Uber claimed median annual driver income was more than $90,000 in New York. Less than 10% of drivers in those areas made that much.”
    • Eric Raymond: In practice, I found Rust painful to the point of unusability. The learning curve was far worse than I expected; it took me those four days of struggling with inadequate documentation to write 67 lines of wrapper code for the server.
    • @swardley: past history shows many major players won't announce they're getting into the battle until some time after war has ended
    • @benthompson: Apple wasn't billed as phone maker / Amazon wasn't billed as infrastructure provider / FB wasn't billed as portal / Snapchat wasn't billed as TV
    • Jessitron: the biggest consideration in choosing whether to use libraries or services for distribution of effort / modularization is that choice of who decides when it deploys. Who controls which code is in production at a given time.
    • Hi Ben: The disruption of TV will follow a similar path: a different category will provide better live sports, better story-telling, or better escapism. Said category will steal attention, and when TV no longer commands enough attention of enough people, the entire edifice will collapse. Suddenly.
    • @leonidasfromxiv: I also don't understand why people compare Go with Rust. If you need a GC-less programming language: Rust; if you need a board game: Go.
    • Carlo Rovelli: The world isn’t just a mass of colliding atoms; it is also a web of correlations between sets of atoms, a network of reciprocal physical information between physical systems.
    • Chris Dixon: In the beginning, hardware-focused companies make gadgets with ever increasing laundry lists of features. Then a company with strong software expertise (often a new market entrant) comes along that replaces these feature-packed gadgets with full-fledged computers. 
    • Animats: The real question is "what do we do with a lot of CPUs without shared memory?" Such hardware has been built many times - Thinking Machines, Ncube, the PS2's Cell - and has not been too useful for general purpose computing.
    • @taavet: Very unfortunate that incumbents see tech only as a way to cut costs. Versus seeing tech to offer much better products.
    • NelsonMinar: This is what security looks like when your threat model is well funded government agencies.
    • Don Norman: The solution requires a different approach to the design of automation: collaboration. Instead of automating what can be automated, leaving the rest to the driver, we must develop collaborative systems so that the driver is continually involved in giving high-level guidance, thereby always staying active, always being in the loop. 
    • Thomas Frey: It took 50 years for the world to install the first million industrial robots. The next million will take only eight. Will this cause more jobs or few jobs in the future? I'm not convinced we know the answer.
    • @jtauber: "Every shot in Piper is composed of millions of grains of sand, each one of them around 5000 polygons."
    • rackforms: my point is the current situation, basically 2 companies controlling so much traffic, seems, well, bad for small business in this country. I value what they bring to the table and fully understand why they're so popular. But is things keep on this way where does that lead the guys like me? Is this just the way it has to be? Is the dream of the open Internet already dead?
    • @sheeshee: I think I know why it's called "DevOps" - "DevOops" was too obvious... ;)
    • greenspot: The open solution to a faster mobile web would have been so easy: Just penalize large and slow web pages without defining a dedicated mobile specification. That's it. This wasn't done in the past, slow pages outperformed fast ones on the SERPs because of some weird Google voodoo ranking, heck sometimes even desktop sites outperformed responsive ones on smartphones. If they had just tweaked these odd ranking rules in way that speed and size got more impact on the overall ranking there wouldn't have been any reason for AMP—the market would have regulated itself.
    • Juergen Schmidhuber: General purpose quantum computation won’t work (my prediction of 15 years ago is still standing). Related: The universe is deterministic, and the most efficient program that computes its entire history is short and fast, which means there is little room for true randomness, which is very expensive to compute. What looks random must be pseudorandom, like the decimal expansion of Pi, which is computable by a short program. Many physicists disagree, but Einstein was right: no dice. There is no physical evidence to the contrary

  • RethinkDB is shutting down and here's the post-mortem. Lessons: the database market is like Mad Max fighting in the Thunderdome; it's better to optimize for useless microbenchmarks than it is to be good; optimism isn't a strategy.

  • Apple isn't alone in using custom hardware to thwart nation state level attackers. Google Infrastructure Security Design Overview. Good overview at Google reveals its servers all contain custom security silicon. Google designs "custom chips, including a hardware security chip that is currently being deployed on both servers and peripherals. These chips allow us to securely identify and authenticate legitimate Google devices at the hardware level." Google encrypts data before it is written to disk, to make it harder for malicious disk firmware to access data. Google uses automated tools and manual code reviews to detect bugs in the software its developers write. Google scans user-installed apps, downloads, browser extensions, and content browsed from the web for suitability on corp clients. Google uses a custom version of the KVM hypervisor. Good discussion on HackerNews, where a lot of the comments are on how Google needs this level of sophistication to evade the prying eyes of governments.

  • What happens when you embed machine learning into a DBMS in order to continuously optimise its runtime performance? You get Self-driving database management systems. Humans suck at tuning databases so this is just one more job AIs will eventually toss into the dust bin of history. TensorFlow was integrated inside Peloton, training two RNNs on 52 million queries from one month of traffic for a popular site. Does it help? Early results are promising: (1) RNNs accurately predict the expected arrival rate of queries, (2) hardware-accelerated training has a minor impact on the DBMS’s CPU and memory resources, and (3) the system deploys actions without slowing down the application.

  • If you are an Apple developer, interested in LLVM, or interested in the evolution of Swift, you'll love this ATP interview (205: PEOPLE DON'T USE THE WEIRD PARTS) with Chris Lattner. Unlike a lot of projects Swift went full open source. The code is open, the community is open, and the design is open. The goal is to have Swift become world dominant, to be more popular than Java, maybe even more popular than C. Becoming open source was part of the process. The killer app driving Swift's initial growth is iOS development. But it needs to go beyond iOS. Server development is the next step. After that is for Swift to be used as a scripting language, borging the best parts of Perl and other languages. After that the big new frontier is for Swift to be used as a system programming language, capable of being used for kernel dev. The type annotation approach to enable low level systems programming doesn't seem like it will work. If you are creating a kernel to support 128 cores you don't want the language to have to support you before you can tune all the low level fiddly bits, you want the language to get out of the way and not solve all your problems. There's also an extended and sensible discussion as to why Swift chose ARC instead of garbage collection.

  • This is one hot article. An Inferno on the Head of a Pin. Jeff Atwood, purveyor of great detail and whimsy, ran into a problem on his way to building a 6 core 1U server: heat, lots and lots of heat. A copper heat sink helped, attaching the heat sink with a fancy "Ceramique" thermal compound helped even more, and adding a fan duct allowed the server to remain stable overnight with a full MPrime run of 12 threads.

  • It's OK to be paranoid, your devices are out to get you. Cartapping: How Feds Have Spied On Connected Cars For 15 Years. You can be tracked and listened to with OnStar of course. Third-party factory-installed GPS tracking devices can be turned on. Your car can be turned off remotely. 

  • If you are thinking of making a game then this might be of use. 7 Things We Learned About Primary Gaming Motivations From Over 250,000 Gamers: 1) The Most Common Primary Motivations for Men are Competition and Destruction. 2) The Most Common Primary Motivations for Women are Completion and Fantasy. 3) Women are More Polarized in Terms of What They Care About in Gaming. 4) For Non-Binary Gender Gamers, Fantasy and Design are Most Common Primary Motivations. And Their Preferences are Even More Polarized. 5) For Young Gamers, Competition is Most Popular, and It’s Almost 50% More Frequent Than the Next Most Popular Motivation (Destruction). 6) Among 36+ Gamers, Competition Drops from 1st to 9th Place. Fantasy and Completion are Most Common. 7) Completion is the Most Low-Risk, High-Reward Motivation. For more information here's a paper on their Gamer Motivation Model.

  • Insights for your product development. Amazon is essentially a habit machine is one lesson from Customer Loyalty Is Overrated. Loyalty is really a habit, doing what is familiar and comfortable, that's what humans do and have always done. Habit is not an expression of loyalty. So you don't have to perpetually change in response to a fast changing world. Customers stay with services that they are comfortable and familiar with. You want to win the early popularity contest; this allows you to accumulate comfort. Freemium is popular because it builds a habit. You want to win customers now and keep them by designing for habit. Make your product as easy as possible to use. Innovate carefully within the brand. Relaunch is scary to the consumer. Don't change too much too fast. Change but change so current habits are kept. Change in a way that changes habits least. Keep communications simple. Communication is with the subconscious mind. Show your product how you want it used. An example is the beer commercial showing people walking up to a bar and ordering a beer. It tells your subconscious mind to do the thing shown on the screen.

  • Martin Zinkevich has created a detailed set of 43 Rules of Machine Learning: Best Practices for ML Engineering. Rule #1: Don’t be afraid to launch a product without machine learning. Rule #4: Keep the first model simple and get the infrastructure right. Rule #43: Your friends tend to be the same across different products. Your interests tend not to be.

  • The Impact of Swapping on MySQL Performance. Assuming your swap is SSD, it may not be as bad as you think: When I started, I expected severe performance drop even with very minor swapping. I surprised myself by getting swap activity to more than 100MB/sec, with performance “only” halved.   While you should continue to plan your capacity so that there is no constant swapping on the database system, these results show that a few MB/sec of swapping activity it is not going to have a catastrophic impact.
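
To put a number on "a few MB/sec of swapping activity" for your own box, here is a minimal sketch that samples the Linux `pswpin` and `pswpout` page counters from `/proc/vmstat` twice and converts the difference to MB/sec. The function names and the 4 KiB default page size are assumptions made for illustration; none of this comes from the linked article.

```python
import time

# Assumption: 4 KiB pages. os.sysconf("SC_PAGE_SIZE") reports the real value.
PAGE_SIZE_BYTES = 4096


def parse_swap_counters(vmstat_text):
    """Return (pages_swapped_in, pages_swapped_out) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_mb_per_sec(before, after, interval_sec, page_size=PAGE_SIZE_BYTES):
    """Convert two (pswpin, pswpout) samples into MB/sec of swap traffic."""
    pages = (after[0] - before[0]) + (after[1] - before[1])
    return pages * page_size / interval_sec / (1024 * 1024)


def sample_swap_rate(interval_sec=1.0, path="/proc/vmstat"):
    """Linux only: read the counters twice and report the combined rate."""
    with open(path) as f:
        before = parse_swap_counters(f.read())
    time.sleep(interval_sec)
    with open(path) as f:
        after = parse_swap_counters(f.read())
    return swap_mb_per_sec(before, after, interval_sec)
```

Calling `sample_swap_rate()` on the database host during a load test gives a figure you can compare against the rates quoted above.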

  • Honeycomb shows how looking at worst-case performance can improve your system. Instrumentation: Worst case performance matters. They found a common problem that happens when using threads. Processing an unexpectedly large input blocks a thread while it works on the input. This spikes latency. You won't notice unless you look at the full latency distribution. So they capped the size of a certain datastructure "which cut the worst-case latency back to sane levels and largely eliminated the appearance of the thread stall problem."
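
A small illustration of why the full distribution matters: in the sketch below the mean and the median barely register a single stalled request, while the maximum shows it plainly. The latency numbers are invented for the example and the helper names are hypothetical; this is not Honeycomb's instrumentation.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Summary that keeps the tail visible next to the usual averages."""
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Hypothetical data: 999 fast requests plus one thread stalled on an oversized input.
latencies = [5.0] * 999 + [2000.0]
summary = summarize(latencies)
print(summary)
```

With one stall in a thousand requests even p99 looks healthy here, which is the argument for tracking the maximum (or a very high percentile) alongside the averages.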

  • Timely, nicely organized, with good content. Web security essentials - A crash course. Divided into topics: Sessions and cookies, Password storage, CORS, XSS, CSRF, SQL injection, Human Error and UI/UX design.

  • Applications used to contain an embedded database. Why not webapps? Execute millions of SQL statements in milliseconds in the browser with WebAssembly and Web Workers shows it's possible to run a million SQL select statements in two seconds.
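
For a feel of what an embedded database looks like outside the browser, here is a rough sketch using Python's built-in `sqlite3` module with an in-memory database. It is not the WebAssembly setup from the article, the table and function names are made up, and the timing it returns will vary by machine, so treat it as a way to run your own comparison and nothing more.

```python
import sqlite3
import time


def run_selects(n):
    """Run n point SELECTs against an in-memory SQLite database.

    Returns (rows_found, elapsed_seconds).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany(
        "INSERT INTO kv VALUES (?, ?)",
        ((i, f"value-{i}") for i in range(1000)),
    )
    start = time.perf_counter()
    hits = 0
    for i in range(n):
        row = conn.execute("SELECT v FROM kv WHERE k = ?", (i % 1000,)).fetchone()
        if row is not None:
            hits += 1
    elapsed = time.perf_counter() - start
    conn.close()
    return hits, elapsed
```

Calling `run_selects(1_000_000)` gives a local number for the same million-select workload.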

  • The dismal science strikes again. SSD/flash/memory shortage, day N+1: There has been a huge demand of SSD/Flash/memory components from a number of end users...Supply is highly constrained, while demand is rising... It does take some time for manufacturing to ramp up, and OEMs are in no hurry to flood a market and lower the effective purchase price (and their profits).

  • Kristen Stewart (yes, that Kristen Stewart) just released a research paper on artificial intelligence. Using neural networks to transfer style from one picture to another. What struck one reader about the paper is this: they used g2.2xlarge EC2 instances to run their program and could process 1024px wide images in 40 minutes per instance.

  • Hootsuite shares a rich description of their experience Accelerating cross platform development with serverless microservices. Lots of good details about their process. While performance was acceptable they hid latency from calling to the cloud by calling Lambda earlier in the workflow so the result is already cached when the user requires it. Both Android and iOS were able to leverage the same Lambda functions. And something unexpected: they were able to outsource the iOS version to a third party without having to share the intellectual property encoded within Lambda.
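
The "call earlier, use later" trick can be sketched in a few lines. Below, a hypothetical `Prefetcher` starts a slow call as soon as the input is known and hands back the cached result when it is finally needed. `slow_cloud_call` is a stand-in for a remote function invocation, not Hootsuite's code or the AWS API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_cloud_call(payload):
    """Hypothetical stand-in for a remote function invocation."""
    time.sleep(0.05)
    return payload.upper()


class Prefetcher:
    """Starts calls early and caches their futures by key.

    Assumes a single caller thread; the cache itself is not locked.
    """

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}

    def prefetch(self, key):
        """Kick off the call now; a repeated key reuses the existing future."""
        if key not in self._pending:
            self._pending[key] = self._pool.submit(self._fetch, key)

    def get(self, key, timeout=None):
        """Return the result, blocking only if the call has not finished yet."""
        self.prefetch(key)
        return self._pending[key].result(timeout=timeout)

    def close(self):
        self._pool.shutdown(wait=True)
```

In use, the app would call `prefetch(key)` when the user starts the workflow and `get(key)` at the step that displays the result, so the network round trip overlaps with whatever the user does in between.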

  • RAISR is Google's new machine learning driven compression method that uses up to 75 percent less bandwidth on the 1 billion images per week they apply it to. Here's the paper: RAISR: Rapid and Accurate Image Super Resolution

  • cmu-db/peloton: Peloton is a self-driving SQL database management system. What is needed for a truly “self-driving” database management system (DBMS) is a new architecture that is designed for autonomous operation. Peloton is a relational database management system designed for fully autonomous optimization of hybrid workloads. 

  • real-logic/Agrona: Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Many of these utilities are used in the Aeron efficient reliable UDP unicast, multicast, and IPC message transport and provides high-performance buffer implementations to support the Simple Binary Encoding Message Codec.

  • microscaling/microscaling:  provides automation, resilience and efficiency for microservice architectures. Microscaling Engine will integrate with all the popular container schedulers. Currently we support: Docker API, Marathon, Kubernetes

  • IDSIA/sacred: Sacred is a tool to help you configure, organize, log and reproduce experiments. It is designed to do all the tedious overhead work that you need to do around your actual experiment 

  • Very thorough. Nicely done. Is Parallel Programming Hard, And, If So, What Can You Do About It?: The purpose of this book is to help you program shared memory parallel machines without risking your sanity. We hope that this book’s design principles will help you avoid at least some parallel-programming pitfalls.
