
Stuff The Internet Says On Scalability For March 25th, 2016

Did you know there's a field called computational aesthetics? Neither did I. It's cool though.


If you like this sort of Stuff then please consider offering your support on Patreon.

  • 51%: of billion-dollar startups founded by immigrants; 2.8 billion: Twitter metric ingestion service writes per minute; 1 billion: Urban Airship push notifications a day; 1.5 billion: Slack messages sent per month; 35 million: server nodes in the world; 10: more regions will be added to Google Cloud; 697 million: WeChat active monthly users

  • Quotable Quotes:
    • Dark Territory: When officials in the Air Force or the NSA neglected to let Microsoft (or Cisco, Google, Intel, or any number of other firms) know about vulnerabilities in its software, when they left a hole unplugged so they could exploit the vulnerability in a Russian, Chinese, Iranian, or some other adversary’s computer system, they also left American citizens open to the same exploitations—whether by wayward intelligence agencies or by cyber criminals, foreign spies, or terrorists who happened to learn about the unplugged hole, too. 
    • @xaprb: If you adopt a microservices architecture with 1000x more things to monitor, you should not expect your monitoring cost to stay the same.
    • The Swrve Monetization Report 2016: almost half of all the revenue generated in mobile gaming comes from just 0.19 percent of users.
    • Nassim Taleb: Now some empiricism. Consider that almost all tech companies "in the tails" were not started by "funding". Take companies you are familiar with: Microsoft, Apple, Google, Facebook. These companies started with risk-taking. Funding came in small amounts, way later.
    • @leegomes: In a big shift, Google says a go-anywhere self-driving car might not be ready for 30 years.
    • Google’s Eric Schmidt: Machine learning will be basis of ‘every huge IPO’ in five years.
    • @brendangregg: "Memory bandwidth is the number one issue we see today" Denis at Facebook
    • @ogrisel: PostgreSQL 9.6 will support parallel aggregation! TPC-H Q1 @ 100GB benchmark shows linear scaling up to 30 workers 
    • @sarah_edo: The hardest part of being a developer isn't the code, it's learning that the entire internet is put together with peanut butter and goblins.
    • @beaucronin: "Cryptocurrencies are an emergent property of the Internet – almost a fifth protocol"
    • Thomas Frey: We are moving toward an era of megaprojects. We’ll finish the Pan-American Highway with a 25-mile bridge over the Darien Gap in Panama. 
    • @samphippen: “Do you expect me to talk?” “No Mr. Bond, I expect you to be willing to relocate to san francisco"
    • @brendanbaker: Outside of the core people, who actually know what they're doing, AI is talked about like gamification was three years ago.
    • @RichRogersHDS: "Did you know? The collective noun for a group of programmers is a merge-conflict." - @omervk
    • @jbeda: This is how you know Google is serious about cloud. Real money on real facilities. 
    • Farhad Manjoo: The lesson so far in the on-demand world is that Uber is the exception, not the norm. Uber, but for Uber — and not much else.
    • @DKThomp: Airbnb woulda made a killing in 1900: One third of urban families used to make 10%+ of their income from "lodgers" 
    • @AstroKatie: "We can make 'smart drones'!" "Your chatbot became a Nazi in like a day." "OK good point."
    • @adrianco: I agree GCP are setup for next gen apps, think they are missing out on where most of the $ are being spent in the short term.
    • @EdwardTufte: Like book publishers and Silicon Valley, the further the distance from content production, the greater the money. 
    • Biz Carson: Slack grew from 80 to 385 employees in 14 months
    • Chip Overclock®: One of those things is being evidence-based. Don't guess. Test. Measure. Look and see. Ask. If you can avoid guessing, do so.

  • Impressive demo of the new smaller, less dorky-looking Meta augmented reality headset. Here's a hands-on report. The development kit is $949. This will most likely be the next app-store-level opportunity, so it might be smart to get on it now. The Gold Rush phase is still in the future. The uses are obvious to anyone who reads Science Fiction. This is a TED talk, so of course no details on performance, etc. What are the backend infrastructure opportunities? Hopefully they'll keep all that open instead of building another walled garden.

  • Is artificial intelligence ready to rule the world? IMHO: No. You would need a large training set. The problem is we have so few good examples of ruling the world successfully. You could build a simulated world in VR to generate training data, but that's just another spin in the long history of Utopian thinking. We should probably learn to govern ourselves first before we pitch it over to an AI.

  • "It's better to have a media strategy than a security strategy." That's Greg Ferro commenting in an episode of Network Break on Home Depot's paltry $19.5 million fine for their massive 2014 data breach. Why pay for security when there's no downside? It's not like people stopped shopping at Home Depot. 

  • Martin Thompson: I've worked on risk systems and with good design you can often consolidate everything onto a single server. If you shard your model by trader then it should scale well on high core count servers. Some things to consider:
    • Go for 2 or 4 socket servers and use the QPI socket interconnects as the best network available.
    • Build multiple independent models that have zero write contention otherwise this can limit throughput.
    • Consider building parts of the model in native languages such as C or C++ for memory efficiency when dealing with large datasets. C/C++ also provides better access to vectorisation for large scale calculations.
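
A minimal sketch of the shard-by-trader idea, in Python for brevity (the quote itself suggests C or C++ for the hot paths). Every name here (`RiskShard`, `shard_for`, `route_trades`, the `NUM_SHARDS` value) is hypothetical and not from Thompson's comment. The only point is that each trader maps to exactly one shard, so no two shards ever write to the same state.

```python
from collections import defaultdict
from zlib import crc32

NUM_SHARDS = 4  # hypothetical: for example, one shard per core


def shard_for(trader_id, num_shards=NUM_SHARDS):
    """Map a trader to one shard with a stable hash (crc32 is deterministic)."""
    return crc32(trader_id.encode("utf-8")) % num_shards


class RiskShard:
    """An independent model: it only ever holds the traders routed to it."""

    def __init__(self):
        self.exposure = defaultdict(float)  # trader_id -> net notional

    def apply_trade(self, trader_id, notional):
        self.exposure[trader_id] += notional


def route_trades(trades, num_shards=NUM_SHARDS):
    """Send each (trader_id, notional) pair to the shard that owns the trader."""
    shards = [RiskShard() for _ in range(num_shards)]
    for trader_id, notional in trades:
        shards[shard_for(trader_id, num_shards)].apply_trade(trader_id, notional)
    return shards
```

In a real system each shard would presumably be owned by a single thread or core, which is what removes the write contention; this sketch runs them sequentially and only shows the partitioning.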

  • Inside Project McQueen, Apple’s plan to build its own cloud: Apple has been working on “Project McQueen,” a plan to become more reliant on its own data center infrastructure and reduce its dependence on public clouds...Apple isn’t happy with the fact AWS is not able to very quickly load photos and videos onto users’ iOS devices...Azure won’t be able to handle the growth of Apple’s workloads in the future, meaning Apple would have to pay much more in order to help Microsoft cover the cost of expanding Azure’s data center infrastructure.

  • There's a huge difference in the scheduling policies needed for a simple request-response server versus one that executes complex work patterns. One size does not fit all. The Way of the Gopher: Our Node service may have handled incoming requests like champ if all it needed to do was return immediately available data. But instead it was waiting on a ton of nested callbacks all dependent on responses from S3 (which can be god awful slow at times)...Two weeks later, after my initial crash course introduction to Golang, we had a brand new Octo service up and running...With our Golang upgrade, we are easily able to handle 200 requests per minute and 1.5 million S3 item fetches per day. And those 4 load-balanced instances we were running Octo on initially? We’re now doing it with 2.

  • Almost everyone is doing the API economy wrong asks the right ecosystem level question: who’s doing a great job with API programs that respect developer economics?: Uber and Lyft, for example, will both pay the developer a bounty for every user a developer signs up, but neither service will share any revenue from the rides booked...Uber’s new Trip Experiences API is a much better option because it lets developers pursue in-ride and post-ride monetization opportunities... Walgreen’s successful Photo Prints API gives developers a way to monetize their apps by earning commissions on each photo printed to a local Walgreens store location...Slack, which is spawning an entire ecosystem of chat bots.

  • Excellent details on how Twitter handles their massive monitoring infrastructure. Observability at Twitter: technical overview, part I and part II: Today, our time series metric ingestion service handles more than 2.8 billion write requests per minute, stores 4.5 petabytes of time series data, and handles 25,000 query requests per minute.

  • Cool story on how CERN wanting to survey their tunnels and vaults with small, precise drones led to the creation of a new lightweight, inexpensive range sensor. CeBIT 2016: Terabee’s Range Sensor Helps Make Drones Fast, Cheap, and Under Control. Problems are what drive the Great Going Forward.

  • Interesting to see what the response has been to Google's violent nudge strategy. Google’s Mobilegeddon Aftermath: Eight Months Into A Better Mobile Web. 25% Of Websites Without Any Previous Mobile Strategy Have Made the Switch; 85% used responsive web design to go mobile; e-commerce site owners were the quickest to adapt; conversions were up 27% compared to 2013. 

  • Is this murder? Microsoft terminates its Tay AI chatbot after she turns into a Nazi. Not yet.

  • Nice tutorial combined with Lessons from Building a Node App in Docker

  • Joaquin Quiñonero Candela: In computer vision we [Facebook] have a system that processes every single image and video uploaded to Facebook, totaling well over 1B items per day. We predict the content of an image for example in order to generate captions for the blind, or to automatically detect and take down offensive content, improve media search results, automate visual captcha among many other use cases. We use deep convolutional networks with billions of parameters.

  • If you need to add search to your product here's a good way to think about the process. How We Built Search at Kit: Cloudsearch is very attractive, especially given our very small team right now, because it is easy to set up and comes with many tools out of the box...Elasticsearch requires a little more work to setup...Ultimately I felt that the advanced features and customizability offered by Elasticsearch made it worth the extra investment...One of the big advantages you have with Elasticsearch over Cloudsearch is you can actually run it locally instead of relying on connectivity to the AWS.

  • GitHub doesn't show the search field unless you sign in. While it's understandable to find this strategy annoying, it is a valid tactic for controlling general backend load. As gracenotes points out from Reddit: Lessons Learned From Mistakes Made Scaling To 1 Billion Pageviews A Month: "Treat nonlogged in users as second class citizens. By always giving logged out always cached content Akamai bears the brunt for reddit’s traffic. Huge performance improvement"
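
Here's a rough sketch of the "logged-out users always get cached content" tactic. Reddit did this at the CDN (Akamai); the in-process dictionary below is only a stand-in to show the branching, and `AnonymousPageCache`, `render`, and the 60 second TTL are all hypothetical names and values.

```python
import time


class AnonymousPageCache:
    """Serve logged-out requests from a cache; render fresh for logged-in users."""

    def __init__(self, render, ttl_seconds=60.0, clock=time.monotonic):
        self._render = render      # hypothetical callable: render(path, user) -> str
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # path -> (expires_at, body)

    def get(self, path, user=None):
        # Logged-in users are first-class: always hit the backend.
        if user is not None:
            return self._render(path, user)

        # Logged-out users get whatever is cached until it expires.
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]

        body = self._render(path, None)
        self._entries[path] = (now + self._ttl, body)
        return body
```

The backend cost of anonymous traffic becomes roughly one render per page per TTL window, no matter how many logged-out visitors arrive.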

  • It's a jungle out there. If you are alone on the Internet bad things can happen, like DDoS attacks which cause your VPS account to be terminated. Digital Ocean and High Traffic. The server was compromised and used to attack other servers. This parasite wasn't very smart though. If it had constrained its I/O to reasonable levels the host would never have noticed.

  • Netflix makes all those show posters automagically. They decide on the most important subject automatically, they crop it automatically, and decide where to put the text automatically. Here's how they do it: Extracting image metadata at scale. In contrast to the edge detection approach, here's Improving YouTube video thumbnails with deep neural nets.

  • Terrorists hide in the open, no encryption necessary. A View of ISIS’s Evolution in New Details of Paris Attacks: Investigators found crates’ worth of disposable cellphones...The attackers seized cellphones from the hostages and tried to use them to get onto the Internet...But the three teams in Paris were comparatively disciplined. They used only new phones that they would then discard, including several activated minutes before the attacks, or phones seized from their victims...According to the police report and interviews with officials, none of the attackers’ emails or other electronic communications have been found.

  • What are some of the best practices for managing environments, network segmentation, and security automation in AWS? AWS NETWORKING, ENVIRONMENTS AND YOU dives in and does a great job of answering the question. AWS is a complex beast that adds new features hourly, so we are all probably in this boat: "I learned that I and lots of other people still believe things about AWS that aren’t actually true anymore." The conclusion: VPC is the future and it is awesome, and unless you have some VERY SPECIFIC AND CONVINCING reasons to do otherwise, you should be spinning up a VPC per environment with orchestration and prob doing it from CI on every code commit, almost like it’s just like, you know, code.

  • Micro-services for performance. The keys to achieving low latencies are: low latency infrastructure for messaging and logging, ideally around 1 microsecond for short messages; a minimum of network hops; a high level of reproducibility of real production load so you can study the 99th percentile (worst 1%) or 99.9th percentile (worst 0.1%) latencies; viewing each CPU core as having a specific task/service.
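
For anyone who hasn't computed tail latencies by hand, here is a small sketch using the nearest-rank definition of a percentile. It is one common definition among several, and the function name and the sample data in the usage note are illustrative, not taken from the linked article.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that is >= pct% of all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction(str(pct)) keeps values like 99.9 exact, so the rank never
    # drifts by one because of floating-point rounding.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[rank - 1]
```

With latencies recorded in microseconds, `percentile(latencies, 99)` is the value that only the worst 1% of requests exceed, and `percentile(latencies, 99.9)` does the same for the worst 0.1%. Note that you need on the order of thousands of samples before the 99.9th percentile means much.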

  • A good discussion of Need advice on picking a VPS hosting company. Options: OVH, DigitalOcean, Linode, Vultr, AWS, InMotion, NFO Servers, RamNode, atlantic.net. Vultr: great performance and value. OVH: built-in DDoS mitigation, no transfer limit, reliability issues. Linode: a little more expensive, great provider. InMotion: very good pricing. RamNode: high quality, great support. DigitalOcean: best one for me, seems expensive.

  • Eventual consistency may be losing a little of its luster. First Google, now Twitter has added Strong consistency in Manhattan, their distributed storage system. With eventual consistency there are many nice programmer-friendly guarantees you can't make: Key uniqueness guarantees, Check-and-set, All-or-nothing updates, Read-your-write consistency (causality). Twitter went with an approach where the consistency model is configurable per dataset so tradeoffs can be selected for a particular use case. All strongly consistent operations go through a per-shard log. A typical system has tens of thousands of shards and a large number of logs. The service only takes a few minutes to provision and some properties already using it are: URL shortener, authentication service and profile service.
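
To make check-and-set and key uniqueness concrete, here is a toy, single-process sketch. It is not Manhattan's implementation: the classes `Shard` and `Store`, the hash routing, and the use of a lock to stand in for the ordering a replicated per-shard log would provide are all assumptions for illustration.

```python
import threading
from zlib import crc32


class Shard:
    """All writes for this shard are serialized and appended to one log."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for log ordering
        self.log = []                  # ordered record of accepted writes
        self._data = {}

    def check_and_set(self, key, expected, new):
        """Write `new` only if the current value equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self.log.append((key, new))
            self._data[key] = new
            return True

    def read(self, key):
        with self._lock:
            return self._data.get(key)


class Store:
    """Routes each key to exactly one shard, so each key has one ordering point."""

    def __init__(self, num_shards=8):
        self._shards = [Shard() for _ in range(num_shards)]

    def _shard(self, key):
        return self._shards[crc32(key.encode("utf-8")) % len(self._shards)]

    def check_and_set(self, key, expected, new):
        return self._shard(key).check_and_set(key, expected, new)

    def read(self, key):
        return self._shard(key).read(key)
```

A URL shortener gets key uniqueness by calling `check_and_set(slug, None, url)`: the first writer wins and every later attempt on the same slug returns `False`, which is the guarantee an eventually consistent store can't give.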

  • This is great: Parse is releasing features now that only work on the open source Parse Server. Parse Server Goes Realtime with Live Queries. If you strike me down, I shall become more powerful than you can possibly imagine.

  • Here’s Everything Apple Announced Today. No new laptops. If you are Apple, flush with cash, yoking your release schedule to Intel's ability to produce new chips may be a good reason to DIY. But maybe they are already doing too much?

  • Auto-scaling and self-defensive services in Golang. The story of how a simple single queue processing worker evolves into an autoscaling beast, with specific golang peculiarities in mind. Attaching multiple consumers to the queue didn't work well because goroutines can't be killed externally when they become unresponsive. Next option was to spawn multiple processes. A master process starts by spinning up a goroutine that regularly determines the number of processes that should be running to handle the load. There are code examples and a discussion of how to deal with typical issues like interprocess communication, death detection, graceful shutdown, and other common problems.
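
The linked article has its own Go code. As a language-neutral illustration of the "regularly determine how many processes should be running" step, here is a small Python sketch. The sizing rule (queue depth divided by a per-worker capacity, clamped to a range) and all names and defaults are assumptions, not the article's logic.

```python
def desired_workers(queue_depth, jobs_per_worker, min_workers=1, max_workers=16):
    """How many worker processes the current backlog calls for."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = -(-queue_depth // jobs_per_worker)  # ceiling division on integers
    return max(min_workers, min(max_workers, needed))


def scaling_actions(current, desired):
    """Return (processes_to_start, processes_to_stop) to reach the target."""
    return max(0, desired - current), max(0, current - desired)
```

A master loop would call `desired_workers` on a timer, compare against the number of live processes, and start or gracefully stop the difference. The floor keeps one worker alive when the queue is empty, and the ceiling keeps a traffic spike from forking the box to death.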

  • A Computer With a Great Eye Is About to Transform Botany: software to do what the human eye cannot: identify families of leaves, in mere milliseconds. The software, which Wilf and his colleagues describe in detail in a recent issue of Proceedings of the National Academy of Sciences, combines computer vision and machine learning algorithms to identify patterns in leaves, linking them to families of leaves they potentially evolved from with 72 percent accuracy. In doing so, Wilf has designed a user-friendly solution to a once-laborious aspect of paleobotany. The program, he says, “is going to really change how we understand plant evolution.”

  • Down and dirty. Web Page Performance Death by a Thousand Tiny Cuts. Page Performance cuts: not using image sprites, not using a cookieless domain for images and other static content, not using a CDN, blocking javascript - worse if it’s in the head, not minifying your javascript and CSS, not optimizing caching policies, not gzipping your content. There's more good JavaScript micro optimization advice as well.

  • Non-Volatile Memory is its own thing and designs will have to figure out how to make use of its strengths. Integrating 3D Xpoint with DRAM: systems won’t be able to take advantage of NVM technology until their DRAM and disk I/O stacks are re-engineered for the specific advantages and quirks of NVM...the big win for new NVM technologies will be as an adjunct to DRAM, rather than as flash SSD replacements.

  • If learning to code were like learning to write…we’d start with words, first teaching children what a token is and how to read them. This is a thoughtful comparison between the process of learning writing versus coding. One standout difference (to me) is that individual words mean something in a human language. In programming only sentences and paragraphs of sentences have meaning. That's a hard jump to make.

  • Perhaps reducing products to have the simplest possible interface is the wrong way to go? Who are they actually good for? When U.S. air force discovered the flaw of averages: Using the size data he had gathered from 4,063 pilots, Daniels calculated the average of the 10 physical dimensions...the consensus among his fellow air force researchers was that the vast majority of pilots would be within the average range on most dimensions...out of 4,063 pilots, not a single airman fit within the average range on all 10 dimensions. See also: Simpson's paradox; Selection bias and bombers.
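
Some back-of-the-envelope arithmetic shows why that result is less surprising than it sounds. The sketch below assumes, purely for illustration, that the "average range" on each dimension covers a fixed share of pilots and that the dimensions are independent. Neither assumption comes from the article, and real body measurements are correlated, so treat the number as a rough estimate only. The function name and the 30% share are hypothetical.

```python
def expected_all_average(n_people, share_in_range, n_dimensions):
    """Expected number of people inside the average range on every dimension,
    assuming independent dimensions that each cover the same share of people."""
    return n_people * share_in_range ** n_dimensions
```

Calling `expected_all_average(4063, 0.30, 10)` gives a value well under one pilot. Each extra dimension multiplies the odds by the per-dimension share again, so a design built for the "average" user on ten axes can end up fitting almost nobody.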

  • Datacenters To Get A High Fiber Bandwidth Diet: So Facebook has worked with the industry to shorten the cable links to 500 meters or less in length, put on cheaper cable coatings that are fine in the datacenter even if they do shorten the lifespan a bit (theoretically), and at the same time forced cable makers to lower the temperature range of the CWDM4 transceiver so they didn’t overheat the datacenter. With these changes, the single mode fiber is suitable for connecting rows of infrastructure or different rooms inside datacenters together.

  • amark/gun: GUN is a realtime, distributed, offline-first, graph database engine. Lightweight and powerful, at just ~9KB gzipped.

  • OpenResty:  a full-fledged web platform by integrating the standard Nginx core, LuaJIT, many carefully written Lua libraries, lots of high quality 3rd-party Nginx modules, and most of their external dependencies. It is designed to help developers easily build scalable web applications, web services, and dynamic web gateways.

  • tensorflow.github.io/serving: a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. 

  • grpc: A high performance, open source, general RPC framework that puts mobile and HTTP/2 first.

  • DalmatinerDB: a no fluff purpose built metric database. Not a layer put on top of a general purpose database or datastore.

  • Efficient Queue Management for Cluster Scheduling: to the best of our knowledge, this is the first work to provide principled solutions to the above problems by introducing queue management techniques, such as appropriate queue sizing, prioritization of task execution via queue reordering, starvation freedom, and careful placement of tasks to queues. We instantiate our techniques by extending both a centralized (YARN) and a distributed (Mercury) scheduler, and evaluate their performance on a wide variety of synthetic and production workloads derived from Microsoft clusters. Our centralized implementation, Yaq-c, achieves 1.7x improvement on median job completion time compared to YARN, and our distributed one, Yaq-d, achieves 9.3x improvement over an implementation of Sparrow’s batch sampling on Mercury.

  • NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories: We present NOVA, a file system designed to maximize performance on hybrid memory systems while providing strong consistency guarantees. NOVA adapts conventional log-structured file system techniques to exploit the fast random access that NVMs provide. In particular, it maintains separate logs for each inode to improve concurrency, and stores file data outside the log to minimize log size and reduce garbage collection costs

Reader Comments (5)

Please stick to high scalability news, and not politics, gossip news, and rumors (ex. anything NSA related). Thank you. Love this weekly summary otherwise.

March 25, 2016 | Unregistered Commenter Reader

+ 1: agree with @Reader

March 26, 2016 | Unregistered Commenter Author

I, on the contrary, am interested in politics. Who are you to tell the author what should and shouldn't be in his posts?

Feel free to skip reading a paragraph if you are not comfortable with what you are reading.

March 27, 2016 | Unregistered Commenter Jordan

I'm a little confused about how the NSA can be considered political. It's straight up technical from my perspective.

March 28, 2016 | Registered Commenter Todd Hoff

Nice to read again about OpenResty. I've been using it for 2 years now and for small and medium complexity projects it's brilliant. Speed makes writing code effective and Lua makes the whole process shorter, so it's cheap, fast and easy to learn and maintain.

Great article as always, thank you!

March 28, 2016 | Unregistered Commenter Misiek
