
Stuff The Internet Says On Scalability For April 29th, 2016

High Scalability

29 Apr 2016 — 12 min read

Hey, it's HighScalability time:

The Universe in one image (Pablo Budassi). Imagine an ancient being leaning over, desperately scrying to figure out what they have wrought.

If you like this sort of Stuff then please consider offering your support on Patreon.

50 minutes: Facebook daily average use; 1.65 billion: Facebook Monthly active users; 25PB: size of Internet archive; 7 years: speedup of encryption adoption from the Snowden revelations; 10 million: strands of DNA Microsoft is buying to store data; 300TB: open data from CERN; 2PB: data from PanSTARRS' imaging survey; 100 billion: words translated by Google per day; 204 million: Weather Channel views in March on Facebook;

Quotable Quotes:
- @antevens: -> Describe your perfect date. ......<- YYYY-MM-DD HH:MM:SS.XXXXXX
- @ValaAfshar: 1995: top 15 Internet companies worth $17 billion. 2015: top 15 Internet companies worth $2.4 trillion.
- @BenedictEvans: The move to mobile took away Facebook's monopoly of social, but gave it much greater scale, engagement & revenue potential.
- Sundar Pichai: We will move from mobile first to an AI first world.
- Chris Sacca~ We [Google] literally could feel a scale that had never been felt before on the planet. We had a globe where you could visualize in searches in real-time. A dot would indicate every single search on the planet. In the middle of the night there would be a search in the Gobi desert.
- @stack72: Just had a recruiter contact me about a role with "microservices on a servers architecture” - twice I’ve seen that now in 2 days #TheFuture?
- Jason Waxman [Intel]: We see that the world is moving to scale computing in data centers. Our projection is between 70 and 80 percent of the compute, network, and storage will be going into what we call scale data centers by 2025.
- @BenedictEvans: In 2009 only half of Facebook's MAUs were on it every day. Mobile has taken that to 2/3, at much greater scale.
- Dan Rayburn: Amazon and Google Enticing Customers With Cheap Storage, But Beware Of Egress Charges
- @manumarchal: CERN LHC computing challenge is more than 400k CPUs + 300PB of data. It's is also global distribution. #dotScale
- @bridgetkromhout: Decouple and segregate systems requiring different trust levels for faster iteration. @adrianco #craftconf
- @dkalintsev: GE on stage at AWS Summit: “50% TCO saving compared to best what we could do in-house”
- @etherealmind: How messed up was GE management to let their costs get this out of control ?
- @Ellen_Friedman: #dotscale Oliver Keeble CERN - superb: computing is key. Collisions are transient; data is persisted at huge scale
- @stratechery: Aggregation Theory leads to monopoly; expect more antitrust cases, but only in Europe
- @kelseyhightower: Moving to microservices won't save you. Borrowing money in smaller chunks doesn't change the fact that you're broke.
- @jrauser: 1/ Inspired by this HN comment …, I offer a story about software rewrites and Bezos as a technical leader.
- aytekin: This is a story that has happened over and over again. When you rewrite software, you lose all those hundreds of tiny things which were added for really good reasons. Don't do it blindly.
- @BWJones: The F-35 program, which at $1.5 T would fund the entire NIH biomedical research portfolio for 41 years.
- @balinski: "Centralization is a disease" #dotScale #scalability #cloudcomputing
- Tony Bain: So despite the noise surrounding NoSQL, in a head to head comparison of volume of use, NoSQL use seems so very small. At a guess, I would predict that for every NoSQL database in existence there would be at least 1000 relational databases. Probably more. You would be forgiven for thinking NoSQL use was almost insignificant.
- @jaksprats: NVM is gonna put big data on a single machine, very interesting for non-BulkSynchronousParallel GraphDBs like Neo4j
- @frontofstore: US department stores' sales per sq ft down 26% in last ten years - many closures forecast, anchors killing malls.
- simon rothman: If the cost of customer acquisition (CAC) is greater than the lifetime value (LTV), a startup can grow itself to death.
- Steve Loughran: Metrics-first testing, then, is instrumenting the code and publishing it for assertions in unit tests, and for downstream test suites.
- @CompSciFact: Computational Utopianism: Everything would be great if the world would simply rewrite all its software my way.
- @BenedictEvans: Playing with a large data set: ~50% of VC investments generate <1x return, and 5% produce 60% of total market returns.
- Rikard Pavelic: TLDR; FlatBuffers is not the cure for performance issues in Java
- @ryan_sb: The central fallacy of public vs. private cloud is the idea huge companies can standardize totally on either.
- The Serengeti Rules: And the most critical thing we have learned about human life at the molecular level is that everything is regulated. Diseases, it turns out, are mostly abnormalities of regulation, where too little or too much of something is made.
- @caitie: "The more successful your company gets the more the operational cost trumps every other development cost" @mipsytipsy #CraftConf
- @itarchitectkev: Last time #OpenStack held a Summit was here there were 75 attendees. Today there are 7,500! #WeAreOpenStack
- Umer Mansoor: You should almost never, ever rewrite from scratch. We rewrote for all the wrong reasons. While parts of code were bad, we could have easily fixed them with refactoring if we had taken time to read and understand the source code that was written by other people. We had genuine concerns about the scalability and performance of the architecture to support more sophisticated business logic, but we could have introduced these changes incrementally.
- eva1984: we are using Redshift to encode 1 billions rows, and the simple change to let the table sorted by user_id reduce the whole table size by 50%, that is half a TB of disk storage, the improvement is nothing more but jaw-dropping.
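
The Redshift observation in that last quote is easy to reproduce in miniature. Here is a rough sketch, assuming a synthetic `user_id` column and plain run-length encoding as a stand-in for whatever column encodings Redshift actually applied (the quote doesn't say). The function names and sizes are made up for illustration.

```python
import random
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def make_user_id_column(num_rows=100_000, num_users=1_000, seed=42):
    """Synthetic user_id column in arrival order (hypothetical data)."""
    rng = random.Random(seed)
    return [rng.randrange(num_users) for _ in range(num_rows)]


def compare_run_counts(column):
    """Return (runs when unsorted, runs when sorted) for the same column."""
    unsorted_runs = len(run_length_encode(column))
    sorted_runs = len(run_length_encode(sorted(column)))
    return unsorted_runs, sorted_runs


if __name__ == "__main__":
    unsorted_runs, sorted_runs = compare_run_counts(make_user_id_column())
    print(f"runs unsorted: {unsorted_runs}, runs sorted: {sorted_runs}")
```

Sorted, the column collapses to at most one run per distinct user, which is why putting equal values next to each other helps run-based and dictionary-style encodings so much.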

If you thought HyperCard was a trip you were correct. Bill Atkinson in a fascinating two-part Triangulation interview (1, 2) shared that HyperCard was inspired by an LSD trip. It's a far-ranging interview that covers Steve Jobs, why the movies about Jobs sucked, Apple's early days, the web's HyperCard inspiration, photography, spirituality, color theory, philosophy, learning, and lots more.

In case you were wondering (I certainly was): Pied Piper compression (Silicon Valley HBO). This is the Pied Piper code shown on Silicon Valley HBO Season 3 Episode 1. Worth a deca-unicorn or two.

Design Details has a fun podcast with Facebookers talking about Facebook bots and the Facebook design process in general. 124: Dazzle (feat. Jeremy Goldberg). Are bots useful? (yes, but not a convincing argument). Do we have to be nice to bots? (to a point, because you never know if you are talking to a person). Bots aren't all automated, they can be a combination of automated and human interactions. Bots should use strategies to help convince people they are talking with another human, like playing with typing indicator delays to simulate typing. Same for simulating reading. Animation and delays should speed up over time. Regressive design is the idea that over time parts of the UI remove themselves as users use the application more. Fight for designs you believe in. Understand, identify, execute. Truly understand what you are doing at a deep level. Identify the things you can be the most impactful on. Facebook measures you on impact. Lots of talk about design crits and pillars and pillar centered design crits. We often think of ourselves as problem solvers, but our job isn't so much problem solving as communicating proposed solutions to problems.

You might enjoy Craft Conf 2016 notes from Bence Nagy.

An Uber trip starts every second in London. How Uber conquered London. They targeted the high end of the market. Guaranteed money to luxury drivers. Created a frictionless system that drivers liked. Then moved into the low end with UberX. Earnings dropped to about £7 an hour as more drivers competed for cheaper fares.

Excellent profile of Claude Shannon: Tinkerer, Prankster, and Father of Information Theory.

An incredible article. When people pooh pooh companies like Google wanting to build cities, keep this in mind. Since cities are the engines of innovation here's a way to go about creating that engine. Instrumental City: The View from Hudson Yards, circa 2019: Over the next decade, the $20-billion project — spanning seven blocks from 30th to 34th Street, between 10th and 12th Avenues — will add 17 million square feet of commercial, residential, and civic space...It’s also rising on a bed of data. It will be the nation’s first “quantified community,” a “fully instrumented” testing ground for applied urban data science...Yet another mechanical loop — a pneumatic-tube trash removal system by the Swedish company Envac — will have separate circuits for recyclables, food waste (converted to fertilizer), and trash (fed into a central dehydrator)...The master plan also calls for a contextual intelligence that acknowledges Hudson Yards’s relation to the city.

Videos from Twitter's #compute event are now available.

Tackling a 1 Billion Member Social Network – Fast Search on a Large Graph: Dealing with datasets of this size is achievable given tools available today. However doing so reliably and successfully requires some research and planning. As among a plethora of possibilities, only a few have what it takes to perform at this scale.

If you want to improve database performance then this is nicely done: What PostgreSQL Tells You About Its Performance. The following are the factors that we need to focus on to judge how well a database cluster is performing: Index usage; IO; Concurrent connections; Deadlocks. Next it talks about Collecting General Performance Data and Monitoring Query Performance.
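
Two of those factors, index usage and IO, boil down to simple ratios over counters PostgreSQL already keeps. A minimal sketch, assuming you read the counters from the standard `pg_stat_user_tables` and `pg_stat_database` views; the linked article may well use different queries, and the helper names here are made up.

```python
# Queries against PostgreSQL's built-in statistics views. Run them with
# whatever client you prefer; only the ratio math below is executed here.
TABLE_STATS_SQL = "SELECT relname, seq_scan, idx_scan FROM pg_stat_user_tables"
DATABASE_STATS_SQL = (
    "SELECT datname, numbackends, blks_read, blks_hit, deadlocks "
    "FROM pg_stat_database"
)


def index_usage_ratio(seq_scan, idx_scan):
    """Fraction of scans on a table that used an index, or None if unused.

    idx_scan is NULL (None) for tables that have no indexes at all.
    """
    idx_scan = idx_scan or 0
    total = seq_scan + idx_scan
    return idx_scan / total if total else None


def cache_hit_ratio(blks_read, blks_hit):
    """Fraction of block requests served from PostgreSQL's buffer cache."""
    total = blks_read + blks_hit
    return blks_hit / total if total else None


if __name__ == "__main__":
    # Hypothetical counter values, just to show the calls.
    print(index_usage_ratio(seq_scan=10, idx_scan=90))
    print(cache_hit_ratio(blks_read=1, blks_hit=99))
```

A low index usage ratio on a big table or a sagging cache hit ratio is a prompt to dig further, not a verdict. `numbackends` and `deadlocks` from the same view cover the other two factors on the list.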

The most innovative countries scored using 79 indicators: Switzerland, Britain, Sweden, the Netherlands and America lead the pack.

Just say yes. Opinion: Share Data for All Diseases: Traditionally, many in the biopharma industry have been fearful of open data. Patents rule in this highly competitive marketplace, and being first to market a drug or medical device can bring shareholders huge financial rewards. Fortunately, there is new recognition of the essential role of open data to advance biomedical research, particularly in context of public health emergencies.

Signal v. Noise details their use of Feature Flags with code examples. Simple enough.
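
For anyone who hasn't built one, a feature flag really can be this small. The following is a generic sketch of a percentage rollout, not the Signal v. Noise implementation; the `FeatureFlags` class and its methods are hypothetical names.

```python
import hashlib


class FeatureFlags:
    """In-memory percentage rollouts keyed by flag name (illustrative only)."""

    def __init__(self):
        self._rollout = {}  # flag name -> percentage of users, 0..100

    def set_rollout(self, flag, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag, user_id):
        """Unknown flags are off; known flags are on for a stable user slice."""
        percent = self._rollout.get(flag, 0)
        # Hash flag and user together so each flag gets its own slice of users
        # and a given user always lands in the same bucket.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


if __name__ == "__main__":
    flags = FeatureFlags()
    flags.set_rollout("new_search", 25)
    print(flags.is_enabled("new_search", user_id=42))
```

Hashing instead of rolling a random number means a user doesn't flip in and out of the feature between requests.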

Utility AI definitely looks simpler. Are Behavior Trees a Thing of the Past?: Behavior Trees are still popular in game development, but are increasingly showing their age...The utility system works by identifying options available to the AI and selecting the best option by scoring each option based on the circumstances. This has proven a remarkable well-working method for several reasons...Simple to Design...Easily Extendable...Better Quality...The figure below shows how the AI can be implemented using utility based methods. Each action is evaluated separately, and the highest scoring action is chosen. Move to enemy is evaluated based on one point per distance to the enemy in e.g. meters.
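
The core loop described there fits in a few lines. A minimal sketch: the move-to-enemy scorer follows the article's "one point per meter" rule, while the attack and retreat scorers, their constants, and all the names are assumptions added so there is something to compare against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    distance_to_enemy_m: float
    health: float  # 0 (dead) to 100 (full)


def score_move_to_enemy(ctx):
    # From the article: one point per meter of distance to the enemy.
    return ctx.distance_to_enemy_m


def score_attack(ctx):
    # Hypothetical: attacking only makes sense within 2 meters.
    return 50.0 if ctx.distance_to_enemy_m <= 2.0 else 0.0


def score_retreat(ctx):
    # Hypothetical: the more hurt we are, the more attractive retreat gets.
    return 100.0 - ctx.health


ACTIONS = {
    "move_to_enemy": score_move_to_enemy,
    "attack": score_attack,
    "retreat": score_retreat,
}


def choose_action(ctx, actions=ACTIONS):
    """Score every action independently and return the highest scorer."""
    return max(actions, key=lambda name: actions[name](ctx))


if __name__ == "__main__":
    print(choose_action(Context(distance_to_enemy_m=30.0, health=100.0)))
```

Adding a behavior means adding one scorer to the dictionary, which is the "easily extendable" argument in a nutshell. The hard part in practice is tuning the scores against each other.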

Google, Uber, Lyft join automakers in self-driving car lobby. Can't wait until lobbies are formed by self-lobbying AIs.

Almost seems like an AI in charge of dynamically creating optimal storage layouts. Inside Capacitor, BigQuery’s next-generation columnar storage format: Capacitor — the storage format in BigQuery...Capacitor builds an approximation model that takes into account all relevant factors and comes up with a reasonable solution...While every column is being encoded, Capacitor and BigQuery collect various statistics about the data — these statistics are persisted and later are used during query execution...Once all column data is encoded, it's written to Google’s distributed file system — Colossus...Queries that will read lots of columns may benefit from smaller shards, but queries that read only few might be better off with larger shards...When data is sent to Colossus for permanent storage, all of it is encrypted...Colossus is a reliable and fault tolerant file system, leveraging techniques such as Reed Solomon codes to ensure that data is never lost...BigQuery starts the geo-replication process, mirroring all the data into different data centers around the specified jurisdiction...BigQuery has background processes that constantly look at all the stored data and check if it can be optimized even further...once the new, optimized storage is complete, it atomically replaces old storage data.

The Galton Board is an awesome way to demonstrate how a normal distribution is created.
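
No board handy? You can fake one. Each ball bounces left or right at every row of pegs, so its final slot is just the number of right bounces: a Binomial(n, 1/2) draw, which for large n looks like a normal curve with mean n/2 and variance n/4. A small simulation sketch (ball count, row count, and function names are arbitrary choices):

```python
import random
from collections import Counter


def galton_board(num_balls=10_000, num_rows=12, seed=1):
    """Drop balls through num_rows of pegs; return a Counter of final slots."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(num_balls):
        # Each peg sends the ball right with probability 1/2.
        slot = sum(rng.random() < 0.5 for _ in range(num_rows))
        bins[slot] += 1
    return bins


def render(bins, num_rows, width=50):
    """Text histogram with one line per slot, scaled to the tallest bin."""
    peak = max(bins.values())
    lines = []
    for slot in range(num_rows + 1):
        bar = "#" * round(bins[slot] * width / peak)
        lines.append(f"{slot:2d} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(galton_board(), num_rows=12))
```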

Dang, that's cheap. A Billion Taxi Rides on Google's BigQuery: In this post I'll take a look at Google Cloud's BigQuery and see how fast it can query the metadata of 1.1 billion taxi trips...I was absolutely blown away by how fast these queries executed...I'm using 104 GB of Standard Cloud Storage to hold the gzip files which will cost $0.09 / day...On BigQuery the data is uncompressed and takes up about 500 GB of space so I'll have to pay $0.42 / day for that...BigQuery charges for each query as well which should come out to $0.07 for each one run on this dataset.
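
To put those daily numbers on a monthly bill, here is the arithmetic as a sketch. It uses only the figures quoted above and assumes a 30-day month; the per-GB rates it derives are implied by those figures, not taken from a Google price list.

```python
DAYS_PER_MONTH = 30  # assumption for illustration

# Figures quoted from the post.
GCS_GB, GCS_PER_DAY = 104, 0.09   # gzip files in Standard Cloud Storage
BQ_GB, BQ_PER_DAY = 500, 0.42     # uncompressed data in BigQuery
COST_PER_QUERY = 0.07             # one query over this dataset


def monthly_cost(per_day):
    """Daily storage charge scaled to a 30-day month."""
    return per_day * DAYS_PER_MONTH


def implied_rate_per_gb_month(per_day, gigabytes):
    """Per-GB monthly rate implied by a daily charge for a given size."""
    return monthly_cost(per_day) / gigabytes


def total_monthly(num_queries):
    """Both storage charges plus num_queries queries, in dollars."""
    storage = monthly_cost(GCS_PER_DAY) + monthly_cost(BQ_PER_DAY)
    return storage + num_queries * COST_PER_QUERY


if __name__ == "__main__":
    print(f"GCS:  ${monthly_cost(GCS_PER_DAY):.2f}/month")
    print(f"BQ:   ${monthly_cost(BQ_PER_DAY):.2f}/month")
    print(f"100 queries, all in: ${total_monthly(100):.2f}/month")
```

Under those assumptions, storage comes to about $15.30 a month and a hundred queries adds $7.00, so querying quickly becomes the bigger line item if you poke at the data a lot.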

Here are 400+ Free Resources for DevOps & SysAdmins.

Creating test workloads is a PITA. Todd Lipcon came up with a good strategy: Benchmarking and Improving Kudu Insert Performance with YCSB. Todd takes you through the process he used to poke and prod and tweak the system into performing much better under load. The key is understanding how the system really works underneath.

Great explanation of Flame Graphs: A flame graph visualizes a collection of stack traces (aka call stacks), shown as an adjacency diagram with an inverted icicle layout. Flame graphs are commonly used to visualize CPU profiler output, where stack traces are collected using sampling.
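
The data behind the picture is simpler than the picture. A rough sketch of the aggregation step: identical sampled stacks are counted, and each box's width is the number of samples whose stack passes through that frame at that position. The semicolon-joined key mirrors the "folded" text form common flame graph tooling consumes; the sample stacks and function names here are invented.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical stacks. Each stack is a list of frames, root first."""
    return Counter(";".join(stack) for stack in samples)


def frame_widths(folded):
    """Samples passing through each call path prefix, i.e. each box's width."""
    widths = Counter()
    for path, count in folded.items():
        frames = path.split(";")
        for depth in range(1, len(frames) + 1):
            widths[";".join(frames[:depth])] += count
    return widths


if __name__ == "__main__":
    # Hypothetical profiler output: four samples of a tiny program.
    samples = [
        ["main", "parse", "read"],
        ["main", "parse", "read"],
        ["main", "parse"],
        ["main", "render"],
    ]
    folded = fold_stacks(samples)
    for path, count in sorted(folded.items()):
        print(path, count)
    print(frame_widths(folded)["main;parse"])
```

Because widths are sample counts, the x-axis of a flame graph carries no time ordering. Wide boxes are simply where the profiler found the CPU most often.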

Backblaze does it again. Storage Pod 6.0: Building a 60 Drive 480TB Storage Server: deploys 60 off-the-shelf hard drives in a 4U chassis to lower the cost of our latest data storage server to just $0.036/GB. That’s 22 percent less than our Storage Pod 5.0 storage server that used 45 drives to store data for $0.044/GB.

Jake Archibald shares his Caching best practices & max-age gotchas: Getting caching right yields huge performance benefits, saves bandwidth, and reduces server costs, but many sites half-arse their caching, creating race conditions resulting in interdependent resources getting out of sync. Pattern 1: Immutable content + long max-age. Pattern 2: Mutable content, always server-revalidated.
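
In server terms the two patterns come down to which `Cache-Control` value you send for which URL. A minimal sketch, assuming fingerprinted assets can be spotted by a hex content hash in the filename; that naming convention, the regex, and the function name are assumptions for illustration.

```python
import re

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000

# Hypothetical convention: "app-3f9a1c2b7d.js" or "logo.0a1b2c3d4e.png".
FINGERPRINT = re.compile(r"[.-][0-9a-f]{8,}\.[a-z0-9]+$")


def cache_control_for(path):
    """Pick a Cache-Control value for a URL path.

    Pattern 1: content at a fingerprinted URL never changes, so cache it
    for a long time. Pattern 2: anything else may change in place, so the
    cache must revalidate with the server before reusing its copy.
    """
    if FINGERPRINT.search(path):
        return f"max-age={ONE_YEAR_SECONDS}"
    return "no-cache"


if __name__ == "__main__":
    for path in ("/static/app-3f9a1c2b7d.js", "/index.html"):
        print(path, "->", cache_control_for(path))
```

Note that `no-cache` does not mean "don't store". It means "check with the server before using the stored copy", which is exactly the always-revalidate behavior pattern 2 asks for.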

How to edge test a website? elohir: It depends - on your rate of change, your architecture, your escape risk (financial and reputational), the available system knowledge/information, time, resources, experience, existing UT/IT test coverage, user knowledge/monitoring...Realistically, if you have a very large complex webapp with no test coverage you may well have to eat the tech debt you've accrued.

Good list of Programming blogs every programmer must read. Ok, well, that you might want to read. Possibly.

Practical embeddable intelligence doesn't require a long trip through evolution. Intelligent? Brainless slime can 'learn': Tantalizing results suggest that the hallmarks for learning can occur at the level of single cells...Our results point to the diversity of organisms lacking neurons which likely display a hitherto unrecognized capacity for learning.

If your algorithms don't measure real metrics they become hackable. The con is always in the blind spots. SCAMAZON – Amazon “Kindle Unlimited” Scammers Netting Millions: Amazon changed that payment method from “per borrow” to “pages read.” Not pages written, mind you – but how many pages a reader actually reads. Except, the problem with this method that’s recently come, shockingly, to light, is that there’s a loophole in the system. Apparently, if you put a link at the beginning of your book to the very back and a reader clicks it – the author is paid for all those pages. A full read. Even though a reader just skipped over them.

Nikola Tesla was just playing the long game. Slow Electricity: The Return of DC Power?: Recently, two converging factors have renewed interest in DC power distribution. First, we now have better alternatives for decentralized power generation, the most significant of these being solar PV panels...Secondly, a growing share of our electrical appliances operate internally on DC power...a higher energy efficiency translates into lower capital costs.

Seems most developers like it. Is DigitalOcean good for hosting sites for my clients?

Say goodbye to thumbprints. System Can ID You by Your Brainwaves With 100 Percent Accuracy.

A New Number Format for Computers Could Nuke Approximation Errors for Good~ John Gustafson, a computer scientist specializing in high-performance computing, has proposed a new solution to this seemingly unavoidable source of error (read: imprecision). He calls the new format "unum," for universal number...The key difference is that a unum allows for the various "fields" within a binary floating-point number representation to expand and contract according to required precision.

Synchronization primitives in the Linux kernel. Part 3: This is the end of the third part of the synchronization primitives chapter in the Linux kernel. In the two previous parts we already met the first synchronization primitive spinlock provided by the Linux kernel which is implemented as ticket spinlock and used for a very short time locks. In this part we saw yet another synchronization primitive - semaphore which is used for long time locks as it leads to context switch.

Very in-depth. Mobile TCP optimization - lessons learned in production. Some lessons: don't rely on hardware features; two mobile networks are never equal; mobile networks should have no reordering; one network can regularly lose some or all packets at start of connection; bad or conflicting middleboxes; O&M is a lot of work.

Lots of interesting papers are available from EuroSys 2016.

Oodle - a library of data compression tools specifically designed for games. Kraken has a decode speed 3X faster than Zlib, and 10-20X faster than LZMA - way faster than anything else at its compression level.

flickr/yakbak: Record and playback HTTP responses.
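
The record and playback idea is worth having in your toolbox whatever your stack. Here is a sketch of the general technique in Python. It is not yakbak's API (yakbak is a Node.js library); the `Tape` class, its on-disk JSON format, and the injected `fetch` callable are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


class Tape:
    """Replay saved responses from disk; record them on first request."""

    def __init__(self, directory, fetch):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        # fetch(method, url) performs the real request and returns a
        # JSON-serializable dict. It is only called when no tape exists.
        self.fetch = fetch

    def _path_for(self, method, url):
        key = hashlib.sha256(f"{method} {url}".encode()).hexdigest()
        return self.directory / f"{key}.json"

    def request(self, method, url):
        path = self._path_for(method, url)
        if path.exists():
            return json.loads(path.read_text())  # playback
        response = self.fetch(method, url)       # record
        path.write_text(json.dumps(response))
        return response
```

Point `fetch` at your real HTTP client the first time the tests run, commit the tape directory, and later runs never touch the network. A real tool also has to decide how headers and request bodies figure into the key, which this sketch ignores.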

How the machine ‘thinks’: Understanding opacity in machine learning algorithms: I draw a distinction between three forms of opacity: (1) opacity as intentional corporate or state secrecy, (2) opacity as technical illiteracy, and (3) an opacity that arises from the characteristics of machine learning algorithms and the scale required to apply them usefully.

Quantum Bitcoin: An Anonymous and Distributed Currency Secured by the No-Cloning Theorem of Quantum Mechanics: We show that our construction of quantum shards and two blockchains allows untrusted peers to mint quantum money without risking the integrity of the currency.

Increasing Large-Scale Data Center Capacity by Statistical Power Control: We have implemented and deployed Ampere in our production data center. Controlled experiments on 400+ servers show that by adding 17% servers, we can increase the throughput of the data center by 15%, leading to significant cost savings while bringing no disturbances to the job performance.
