Wednesday
Oct 05, 2016

Stuff The Internet Says On Scalability For October 7th, 2016

Hey, it's HighScalability time:

 

The world's oldest analog computer, from 87 BC, the otherworldly Antikythera mechanism.

 

If you like this sort of Stuff then please support me on Patreon.

  • 70 billion: facts in Google's knowledge graph; 80 million: monthly visitors to walmart.com; 50%: lower cost for sending a container from Shanghai to Europe; 6 billion: Docker Hub pulls per 6 weeks; 5x: impact reduction using new airbag helmet; 400: node Cassandra + Spark Cluster in Azure; 66%: loss of installs when apps > 100MB; 223GB: Udacity open sources self-driving car data; 

  • Quotable Quotes:
    • rfrey: The success of many companies, and probably all of the unicorns, has nothing to do with technology. The tech is necessary, of course, but so are desks and an accounting department. Internalizing that has been difficult for me as an engineer.
    • @mza: 72 new features/services released last month on #AWS. 706 so far this year (up 42.9% YoY).
    • Marc Andreessen: To me the problem is clear: The problem is insufficient technological adoption, innovation, and disruption in these high-escalating price sectors of the economy. My thesis is that we're not in a tech bubble — we’re in a tech bust. Our problem isn't too much technology or people being too excited about technology. The problem is we don't have nearly enough technology. These cartel-like legacy industries are way too hard to disrupt.
    • @mfdii: What did the NSA agent say when it got access to all the email? Yahoo!
    • Ben Thompson~ [Google's Pixel event] was a huge event, you rarely see a company changing business models
    • @kerryb: News just in: databases to be “named and shamed” if they use foreign keys without trying to train local British keys first.
    • kazagistar: The biggest use of REST in our system (and I suspect a lot of large newer systems) is not "web client to backend server" but "microservice to microservice". And for this, GraphQL is severely immature.
    • @amcafee: Tesla software update: good braking "even if a UFO were to land on the freeway in zero visibility conditions."
    • evanelias: Facebook uses MySQL for countless other critical OLTP use-cases, and (for better or worse) even a few OLAP use-cases. It's the primary store of Facebook, across the entire company. It's the storage layer for ad serving, payments, async task persistence, internal tooling, many many other things. Most of these use-cases make full use of SQL and the relational model.
    • @rakamaric: Deschutes Brewery using light-weight formal methods (white-box fuzzing) to find bugs in their code! #soarlab
    • @tottinge: "Crowdsourcing is the tyranny of the herd, not the wisdom of crowds" @snowded #lascot16
    • @pedrolopesme: @toddlmontgomery "Your API is a protocol. Treat it like one."  #qconnyc 2016
    • Rodrick Brown: A pattern today many use to accomplish this [logging] is using a kafka logging library that hooks into their microservice and use something like spark to consume the logs from Kafka into elasticsearch. We're doing hundreds of thousands of events/sec on a tiny ~8 node ES cluster.
    • @dominicad: "The way people make decisions is key to understanding company culture. Instead of system analysis, record decisions." @snowded #lascot16
    • Hugh E. Williams: Engineers irrationally avoid hash tables because of the worst-case O(n) search time. In practice, that means they’re worried that everything they search for will hash to the same value
    • @JoeEmison: That's just not accurate. I've spent the last year trying to run on GCP and keep going back to AWS. It's not just perception.
    • boulos: Where I do agree is networking egress. The big three providers all have metered bandwidth rates that are way above the "all inclusive" fee you pay to Hetzner, OVH, DO, and others. The cheapest way to host an ftp server that serves 20 TB per month is certainly on one of these (today). None of these providers will let you serve 1 PB / month this way, but if you're in their sweet spot and they can make it work out on average, it's a good fit.
    • @DDDBE: "If you have a magical genie, you still have the problem of trying to explain what you want. That is domain complexity." @malk_zameth
    • avitzurel: The networking on AWS needs to be better. I don't want the strongest machine just to have a better transfer rate. It makes complete sense to have a micro machine for some services, but if those services are accessed or access other HTTP/s services, it will be unnecessarily slow
    • Alan Huang: the number of [Internet] hops can be reduced by 2X by converting the network into a toroid. The number of hops can be further reduced by recasting the network into N-dimensional hypercube or into a multistage network, such as a Perfect Shuffle or Banyan.
    • @jessfraz: Can we go back to ncurses apps instead of these memory hogging bullshits?
    • Russ White: The reality is we shouldn’t need DevOps for configuration at all. This is a bit of a revolution in my thinking in the last two or three years, but what I’m trying to do is to simply make DevOps, as it’s currently constituted, obsolete. DevOps should be about understanding how the network is working and making the network work better

  • Software is eating the world, but software is also eating software. Laugh. Cry. Shake your head and then your fist, but it's a satire that's all true: How it feels to learn JavaScript in 2016. Epic. Once you wipe away the tears you may also realize this is a great tutorial on all the different frameworks and how they fit together. You won't find better. 

  • Videos are available for Full Stack Fest 2016, held in Barcelona, with topics ranging from Docker, IPFS & GraphQL to Reactive Programming, Immutable Interfaces & Virtual Reality. 

  • Great analogy by paulddraper on cloud pricing: "Restaurant prices are ridiculous ... made the comparison between groceries and menu offerings of McDonalds, Taco Bell, Burger King ... Olive Garden (SO EXPENSIVE) and you pay 5 times at a restaurant for the same." You're not paying for hardware. You're paying for hardware, expertise, services, and convenience. On-prem or colocation may be a good choice. But limiting your comparison to raw computing power mischaracterizes the decision.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Friday
Sep 30, 2016

Stuff The Internet Says On Scalability For September 30th, 2016

Hey, it's HighScalability time:

 

Everything is a network. Map showing the global genetic interaction network of a cell. 

 

  • 18: Google can now drink and drive in Washington DC.; $10 billion: cost of a Vision Quest to Mars; 620 Gbps: DDoS attack on KrebsOnSecurity; 1 Tbps: DDoS attack on OVH; $200,000: cost of a typical cyber incident; 8 million: video training dataset labeled with 4800 labels; 180: Amazon warehouses in the US; 10: bits of info per photon; 16: GPUs in new AI killer P2 instance type;

  • Quotable Quotes:
    • @markmccaughrean: 1,000,000 people to Mars in 100 yrs. 10 people/launch? That's 3 a day, every day, for a century. 1% failure rate? One explosion every month
    • @jeremiahg: Any sufficiently advanced exploit is indistinguishable from a 400lb hacker.
    • BrianKrebs: I suggested to Mr. Wright perhaps a better comparison was that ne’er-do-wells now have a virtually limitless supply of Stormtrooper clones that can be conscripted into an attack at a moment’s notice.
    • Sonia: Academia’s not-so-subtle distain for applied research does more than damage a few promising careers; it renders our field’s output useless, destined to collect dust on the shelves of Elsevier. 
    • Monica L. Smith: Nobody builds their own infrastructure. You don’t build your own highway, train line, water pipe, your own sewer. Those are things that connect you and your household to everybody else sequentially in your neighborhood, in your region, from the city out into the broader hinterlands.
    • @olesovhcom: This botnet with 145607 cameras/dvr (1-30Mbps per IP) is able to send >1.5Tbps DDoS. Type: tcp/ack, tcp/ack+psh, tcp/syn.
    • kenrose: We see this pattern at PagerDuty over the majority of our customers. There is a definite lull in alert volume over the weekends that picks up first thing Monday morning. It's led to my personal conclusion that most production issues are caused by people, not errant hardware or systems.
    • @rseroter: "We Crammed this Monolith Into a Container and Called it a Microservice"
    • @mweagle: I really don’t want to run my own k8s in AWS, but ECS is so opaque to debug that k8s seems like a good choice.
    • Werner Vogels~ We have this overarching goal which is customer centricity. Doing anything that benefits the customer gets priority above everything else. Working on eliminating all single points of failure in the company purely benefits the customer because it really improves the customer experience.
    • Cory Doctorow~ The thing open source software had going for it was the Ulysses Pact...the  irrevocable license, the failure mode of open source software, having founded an open source software company, I can tell you there are moments where it feels like your survival turns on being able to close the code you had opened when you were idealistic. There are moments of desperation when that happens. 
    • @lightbend: "We've been using #Akka in production for over two years, without a single crash." -@CruiseNorwegian |
    • @cloud_opinion: Monolithic -> Microservices -> "which container image?" -> "Screw it, lets do PaaS" ->  CF  or AWS?
    • Etsy: concurrency proved to be great for logical aggregation of components, and not so great for performance optimization. Better database access would be better for that.
    • Yaniv Nizan: the number of users actually contributing ad revenue in your app is a lot lower than 6.5% and much closer to the 1% or 2% that contribute revenue from In-app purchases. 
    • @reckless: Elon is basically putting on an Apple event, for going to Mars.
    • @potch: DRY: Don't Repeat Yourself / DAMP: Do Abstraction/Minimalism Pragmatically / MOIST: Maybe Only Innovate Some Times?
    • @dannysullivan: In the Facebook video metrics thing, spare a thought for the poor BuzzFeed watermelon, less viral than it thought :)
    • Addison Snell: If the promise of cloud computing is overblown, it's because of the amplification it gets from its loyal converts, enterprises who have found liberation and agility in outsourcing IT. 
    • @psaffo: In 1990, the size of the US software industry was $3.2 billion -- the same size as the gourmet popcorn industry in that same year.
    • David Rosenthal: [Storage] Revenues are flat or decreasing, profits are decreasing for both companies. These do not look like companies faced by insatiable demand for their products; they look like mature companies facing increasing difficulty in scaling their technology.
    • @legind: Let's Encrypt now the 3rd largest CA, after Comodo and Symantec, comprising over 13% of the SSL cert market share 
    • @stewartbrand: “In the long run, the technology driving activities in space will be biological.” Rousing essay by Freeman Dyson.
    • @jessitron: Constructing causal ordering at the generic level of "all messages received cause all future messages sent" is expensive and also less meaningful than a business-logic-aware, conscious causal ordering. This conscious causal ordering gives us external consistency, accurate legibility, and visibility into what we know to be causal.

  • In an article light on details and written with a marketing flourish, we still learn some interesting things about the infrastructure behind Pokemon Go. Bringing Pokémon GO to life on Google Cloud. It runs on Google Cloud, Kubernetes, Google Container Engine, HTTP/S Load Balancer, and Cloud Datastore. Keep in mind Alphabet is invested in Niantic and Ingress, the forerunner of Pokemon Go, ran on App Engine. So it sounds like a new backend implementation that had to scale from zero to the size of Twitter in a matter of weeks, with a much more complicated workload. Growth was explosive. Player traffic was 50x larger than initial estimates. An implication is that the problems experienced during launch were not infrastructure related. Google, in the form of Customer Reliability Engineering (CRE), worked closely with Niantic to make sure the infrastructure scaled. The problems must have been elsewhere in the application stack, which is perfectly understandable. That sort of load could not have been predicted. The design decisions you make for 5x expected traffic are very different than they are for 50x. Nobody will spend the money or take the time to build a system for 50x. Nobody. Lots of good comments on HackerNews. Good question by ksec: would Pokemon Go even be possible in a pre-cloud era? 

Wednesday
Sep 28, 2016

How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters 

If you are Uber and you need to store the location data that is sent out every 30 seconds by both driver and rider apps, what do you do? That’s a lot of real-time data that needs to be used in real-time.

Uber’s solution is comprehensive. They built their own system that runs Cassandra on top of Mesos. It’s all explained in a good talk by Abhishek Verma, Software Engineer at Uber: Cassandra on Mesos Across Multiple Datacenters at Uber (slides).

Is this something you should do too? That’s an interesting thought that comes to mind when listening to Abhishek’s talk.

Developers have a lot of difficult choices to make these days. Should we go all in on the cloud? Which one? Isn’t it too expensive? Do we worry about lock-in? Or should we try to have it both ways and craft brew a hybrid architecture? Or should we just do it all ourselves for fear of being cloud shamed by our board for not reaching 50 percent gross margins?

Uber decided to build their own. Or rather they decided to weld together their own system by fusing together two very capable open source components. What was needed was a way to make Cassandra and Mesos work together, and that’s what Uber built.

For Uber the decision is not all that hard. They are very well financed and have access to the top talent and resources needed to create, maintain, and update these kinds of complex systems.

Since Uber’s goal is for transportation to have 99.99% availability for everyone, everywhere, it really makes sense to want to be able to control your costs as you scale to infinity and beyond.

But as you listen to the talk you realize the staggering effort that goes into making these kinds of systems. Is this really something your average shop can do? No, not really. Keep this in mind if you are one of those cloud deniers who want everyone to build all their own code on top of the barest of bare metals.

Trading money for time is often a good deal. Trading money for skill is often absolutely necessary.

Given Uber’s goal of reliability, where out of 10,000 requests only one can fail, they need to run out of multiple datacenters. Since Cassandra is proven to handle huge loads and works across datacenters, it makes sense as the database choice.  
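To make the "one in 10,000" figure concrete, here is a small worked calculation. It is only a sketch: the function names are made up for illustration, and the downtime number assumes availability is measured over wall-clock time, which is an assumption of this example and not something the talk states.

```python
from fractions import Fraction

def failure_budget(availability_percent: str, requests: int) -> Fraction:
    """How many requests may fail out of `requests` at a given availability."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * requests

def downtime_minutes_per_year(availability_percent: str) -> Fraction:
    """Same budget expressed as minutes per (non-leap) year. Assumes time-based availability."""
    minutes_per_year = 365 * 24 * 60
    return (1 - Fraction(availability_percent) / 100) * minutes_per_year

print(failure_budget("99.99", 10_000))             # 1
print(float(downtime_minutes_per_year("99.99")))   # 52.56
```

Fractions are used instead of floats so the one-in-10,000 result comes out exact.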

And if you want to make transportation reliable for everyone, everywhere, you need to use your resources efficiently. That’s the idea behind using a datacenter OS like Mesos. By statistically multiplexing services on the same machines you need 30% fewer machines, which saves money. Mesos was chosen because at the time Mesos was the only product proven to work with cluster sizes of 10s of thousands of machines, which was an Uber requirement. Uber does things in the large.
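The statistical multiplexing argument can be sketched with a toy calculation. The load numbers below are invented for illustration and are not Uber's; the point is only that when services peak at different times, a shared cluster sized for the combined peak needs fewer machines than separate clusters each sized for their own peak.

```python
# Hypothetical demand (in machines) for three services over four time slots.
# All names and numbers are invented; the peaks deliberately fall in different slots.
demand = {
    "trips":     [40, 90, 60, 30],
    "analytics": [80, 30, 40, 70],
    "batch":     [30, 30, 90, 50],
}

def dedicated_machines(demand):
    """Each service gets its own cluster, sized for its own peak."""
    return sum(max(series) for series in demand.values())

def shared_machines(demand):
    """One shared cluster, sized for the peak of the combined load."""
    combined = [sum(slot) for slot in zip(*demand.values())]
    return max(combined)

dedicated = dedicated_machines(demand)   # 90 + 80 + 90 = 260
shared = shared_machines(demand)         # max(150, 150, 190, 150) = 190
savings_percent = round(100 * (dedicated - shared) / dedicated)
print(dedicated, shared, savings_percent)  # 260 190 27
```

How much you actually save depends entirely on how uncorrelated the peaks are. If every service peaks at the same moment, sharing machines saves nothing.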

What were some of the more interesting findings?

  • You can run stateful services in containers. Uber found there was hardly any difference, 5-10% overhead, between running Cassandra on bare metal versus running Cassandra in a container managed by Mesos.

  • Performance is good: mean read latency: 13 ms and write latency: 25 ms, and P99s look good.

  • For their largest clusters they are able to support more than a million writes/sec and ~100k reads/sec.

  • Agility is more important than performance. With this kind of architecture what Uber gets is agility. It’s very easy to create and run workloads across clusters.

Here’s my gloss of the talk:

In the Beginning

Click to read more ...

Tuesday
Sep 27, 2016

Sponsored Post: ScaleArc, Spotify, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

Who's Hiring?

  • Spotify is looking for individuals passionate in infrastructure to join our Site Reliability Engineering organization. Spotify SREs design, code, and operate tools and systems to reduce the amount of time and effort necessary for our engineers to scale the world’s best music streaming product to 40 million users. We are strong believers in engineering teams taking operational responsibility for their products and work hard to support them in this. We work closely with engineers to advocate sensible, scalable, systems design and share responsibility with them in diagnosing, resolving, and preventing production issues. We are looking for an SRE Engineering Manager in NYC and SREs in Boston and NYC.

  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.

Fun and Informative Events

  • Learn how Nielsen Marketing Cloud (NMC) leverages online machine learning and predictive personalization to drive its success in a live webinar on Tuesday, September 20 at 11 am PT / 2 pm ET. Hear from Nielsen’s Kevin Lyons, Senior VP of Data Science and Digital Technology, and Brent Keator, VP of Infrastructure, as well as from Brian Bulkowski, CTO and Co-Founder at Aerospike, as they describe the front-edge architecture and technical choices – including the Aerospike NoSQL database – that have led to NMC’s success. RSVP: https://goo.gl/xDQcu4

Cool Products and Services

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips.  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Click to read more ...

Sunday
Sep 18, 2016

Stuff The Internet Says On Scalability For September 23rd, 2016

Hey, it's HighScalability time:

 

Will Minority Report for developers really help us program better? (Primitive)

 

  • October 2017: ICANN changes the DNSSEC root keys; $2.91M: cost of running Let's Encrypt; 20%: Amazon convenience tax; 100%: increase in spam; 6.2 km: Quantum teleportation across a metropolitan fibre network; March 18, 1982: birth of containers; 6 months: how long a lightning bolt can power a 60 watt bulb; trillions: EV cache hits per day @ Netflix; 5x: Spark is faster than MapReduce; billions: HTTP, Git and SSH connections served per day at GitHub; 28: # of websites in North Korea; 

  • Quotable Quotes:
    • @vgcerf: It is time to admit after 18 years that the multistakeholder model of Internet operation works. #yestoIANA
    • @EricLathrop: Netflix found a 5x performance variation between AWS instances at the same price! They benchmark to avoid overpaying. @indirect #Strangeloop
    • @swardley: Perfectly reasonable @NigelBarron. Larry's statements are ludicrous, play is to milk existing customers whilst hoping to find a new future.
    • @BethanyMacri: Etsy is very anti-SOA. Monolith forever!
    • janfoeh: I've said it before here and I'll say it again: the JS ecosystem is moving in the wrong direction. Sometimes I feel that with Javascript, we developers have taken something that wasn't ours, and we're in the process of destroying the best thing there ever was about it. So here we are, the single <script> tag having been replaced with compilers, transpilers, five mutually incompatible build systems, three different module systems in God knows how many implementations, frameworks changing their API every ten minutes and five thousand lines of NPM module code to be installed for even the simplest of tasks.
    • marknadal: This is the way humans have been thinking for thousands of years. And guess what, I sat down with a large airline and had to warn them "we're not Strongly Consistent" and they laughed at me saying "you realize we've been booking seat reservations before there was internet, before you were born, and before there was cheap telephony. Seat reservation has never been strongly consistent - we used to have hundreds of travel agents booking seats and it would take 2 weeks before we would hear about it."
    • Jason Feifer: All I have to do is go to another website and see the price is different, and I don't. It's crazy. Like, why am I not doing that? We're the problem.
    • @cmeik: "The clock-free design paradigm I promote must eventually prevail. It fits Physics."
    • @gabrielgironda: mclaren and apple are a great fit. all the stability of apple's software combined with the reliability of british automobiles
    • Bryan Cantrill: The virtual machine is vestigial abstraction. We can not get to #serverless without getting rid of the VM.
    • @dchetwynd: The number of US households that only use cellular data has doubled from 10% to 20% between 2013 and 2016 #strangeloop
    • There are even more awesome Quotable Quotes in the full article.

  • Interesting results from a major architecture change at Netflix. Zuul 2 : The Netflix Journey to Asynchronous, Non-Blocking Systems. Netflix had a blocking, servlet-based, connectionless architecture and moved to a non-blocking, asynchronous, connection-based architecture. In general, from a latency, CPU, throughput, and capacity perspective the async version didn't perform much better than the old sync version. Netflix found "the less work a system actually does, the more efficiency we gain from async", which makes sense in terms of scheduling and IO. There was a big win however in the ability to scalably maintain over 83 million persistent connections, one for every client, back into their cloud infrastructure. The cost of a connection becomes a file descriptor instead of a thread, which is a lot cheaper. By using a persistent connection Netflix can reduce overall device requests, improve device performance, understand and debug the customer experience better, enable more real-time user experience innovations, and reduce overall cloud costs by replacing “chatty” device protocols today (which account for a significant portion of API traffic) with push notifications. Operations did take a hit. Sync systems are much easier to understand and debug. Also, making the migration was not easy. Changing sync code to async is not for the faint-hearted. 

  • This is hilarious. Read the whole thread. You won't be disappointed. @stef: You are in a startup. All around is a burning runway. There are exits to the North and East. You have a bootstrap. There is a VC here.
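Back to the Zuul 2 item for a moment: the claim that a connection costs a file descriptor instead of a thread is easy to see in miniature. The sketch below is not Netflix's code, and Zuul 2 is not written in Python. It is a toy asyncio echo server, with invented names and a made-up "device-N" message format, that serves 50 simultaneous connections while recording which threads the handlers run on.

```python
import asyncio
import threading

handler_threads = set()  # ids of every thread that ran a connection handler

async def handle(reader, writer):
    """Serve one connection: read a line, reply with it upper-cased."""
    handler_threads.add(threading.get_ident())
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def client(port, i):
    """Hypothetical device: open a connection, send one line, read the reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"device-{i}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()

async def run(n_clients):
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(*(client(port, i) for i in range(n_clients)))

replies = asyncio.run(run(50))
print(len(replies), len(handler_threads))  # 50 1
```

All 50 connections are handled on a single thread, each one costing a socket (a file descriptor) and a small coroutine. A thread-per-connection server would need 50 threads for the same work, which is the cost the item describes as becoming untenable at tens of millions of persistent connections.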

Friday
Sep 16, 2016

Stuff The Internet Says On Scalability For September 16th, 2016 

Hey, it's HighScalability time:

 

The struggle for life that kills. Stunning video of bacteria mutating to defeat antibiotics. 

 

  • 60%: time spent cleaning dirty dirty BigData; 10 million: that's a lot of Raspberry Pi; 365: days living in a Mars simulation; 100M: monthly League of Legends players; 1.75 billion: copyright takedowns by Google; 3.5 petabytes: data Evernote has to move to Google cloud; 11%: YoY growth in time spent on mobile apps; 4 hours: time between Lambda coldstarts; 

  • Quotable Quotes:
    • Camille Fournier: humans struggle to tangibly understand domains that are theoretically separate when they are presented as colocated by the source code.
    • @songcarver: The better example: iPhone 7 is showing 115% of 2016 Macbook single core performance, 88% of multi-core.
    • ex3ndr: We (actor.im) also moved from google cloud to our servers + k8s. Shared persistent storage is a huge pain. We eventually stopped to try to do this, will try again when PetSets will be in Beta and will be able to update it's images.
    • @mcclure111: "Well maybe you should get your spaceship working before you try to implant nanites in your brain, DUDE"
    • IOpipe: Organizations I’ve spoken to have expressed an average of 10x cost savings over microservices-based infrastructure for the code they’ve moved to AWS Lambda.
    • avitzurel: Kube is winning for the same reason React/Redux (and now Mobx) is winning and why Rails was winning at the time. Community.
    • @etherealmind: Evernote is moving to public cloud. A strong sign that its in financial trouble, or lacking product direction.
    • @codinghorror: In 8 years of colocating servers I have seen multiple spinning rust disks fail, and one PSU, but zero SSDs failed from 2013-on.
    • Caltech: Now, with the new simulation—which used a network of thousands of computers running in parallel for 700,000 central processing unit (CPU) hours—Caltech astronomers have created a galaxy that looks like the one we live in today, with the correct, smaller number of dwarf galaxies.
    • Andy Grove: Rust is gearing up to be particularly suitable for building scalable asynchronous io and getting Rust onto servers is a great way to drive adoption of the language. 
    • James Hamilton: We have long believed that 80% of operations issues originate in design and development… When systems fail, there is a natural tendency to look first to operations since that is where the problem actually took place. Most operations issues, however, either have their genesis in design and development or are best solved there.
    • Google: even the possibility of a future quantum computer is something that we should be thinking about today.
    • Alan Kay: This doesn’t mean that “objects are now hidden”, but that they should be part of the “modeling and designing of ideas and processes” that is the center of what programming needs to be.
    • Packet Pushers: In the future the world will be made of clouds and users. The user will be sitting in Starbucks and accessing the cloud and your network will be totally irrelevant.
    • StorageMojo: Our current system for the diffusion of knowledge is breaking down. How are we going to fix it?
    • Ron Miller: Flywheel Effect is the idea that once you have your core tech pieces in place, they have an energy of their own that drives other positive changes and innovations.
    • stonogo: Intel needs everything to be NUMA-aware. They're betting a lot of money on Xeon Phi, and once the self-booting KNL machines are out nobody will want to deal with the pcie cards any more.
    • @Fruzenshtein: It's strange to listen a talk about microservices when you have already heard about serverless architecture💩
    • MORGAN HOUSEL: There’s often a big gap between changing the world and convincing people that you changed the world.
    • @JoeEmison: Another under-reported aspect of moving from VMware to AWS: almost everyone is getting a massive performance improvement.
    • Dan Rayburn: Twitter’s NFL stream, taking place Thursday Sept 15th, will be delivered by Akamai and Level 3 and I do not expect it to have a large simultaneous audience. My estimate is under 2M simultaneous streams.
    • @johngirvin: Running serverless infrastructure this morning. In that the servers are all down.
    • Vlad Ilyushchenko: QuestdbWorker is pure worker implementation as far as worker consumers don't necessarily process same number of queue items. It is slower due to constant interaction with memory barriers and is at heavy disadvantage in this particular benchmark because it can't benefit from batching. Despite that workers can be useful when queue item processing cost is non-uniform.
    • ArkyBeagle: The path to concurrency is paved with a mix of finite state machines and event-driven programs. IMO, neither FP nor OO have all that much to say about that.
    • matt_oriordan: Having static servers handling load is only part of the problem in our experience. The true complexity and scalability of a system comes when you consider how it copes under load with unexpected failures (network, hardware), but more importantly expected maintenance such as regular deploys, scaling up and scaling down events
    • matthieum: I think my biggest complaint about the try-with-resources pattern is that... it just doesn't work. RAII just works, without effort on the client part, no matter how she uses the class.
    • Brandon Beck: I remember we had something like 20 folding chairs and, without knowing if anyone would watch, decided to stream the games. We ended up getting over 100,000 concurrent viewers, which just blew our minds. It was there we realized this was something League players loved and started to really take it seriously.
    • jandrewrogers: The weakness of GPU databases is that while they have fantastic internal bandwidth, their network to the rest of the hardware in a server system is over PCIe, which generally isn't going to be as good as what a CPU has and databases tend to be bandwidth bound. This is a real bottleneck and trying to work around it makes the entire software stack clunky.
    • @pmarca: 1 Software eats the world, 2 Every company becomes a software company, and 3 Software people run every company:
    • @benalexau: Benchmarked @mjpt777's Aeron w/ SBE and @grpcio for bulk xfers between JVMs. While different sweet spots, Aeron ~200 times higher throughput
    • Freeman Dyson: So, anyway, that’s sort of my view about the brain. That we won’t really understand the brain until we can make models of it which are analog rather than digital, which nobody seems to be trying very much.

  • Drivers and users turn out to be relatively price insensitive to Uber fares. As Uber approaches a monopoly position there's a lot of consumer surplus that can be turned into profits (if the war chest lasts). Why Uber Is an Economist’s Dream: if you extrapolate to the whole U.S., we found that the overall consumer surplus added up to almost $7 billion. So people spent about $4 billion on Ubers, but they actually would have been willing to spend about $11 billion.

  • Making money in Apple's app store ain't what it used to be, at least for developers. Here's a thoughtful discussion on the transition from a charge-up-front model to advertising-supported apps by longtime app developers David Smith and Marco Arment: Overcast trying ads, dark theme now free and Under the Radar #45. You may lament advertising as the go-to model allowing developers to make a decent living, but it turns out advertising within apps nicely aligns developer incentives with user goals in a way that doesn't happen for content. For content, the drive to increase page views encourages a race to the bottom. Click-bait dominates as CPMs tumble. For apps, the incentive is to provide a good user experience for every interaction. You want to encourage the user to use your app because that's when you get paid. Individually the payoff isn't so great that it warps the incentives to "encourage" a user to use your app, but over a whole installed base the more users use your app the more you get paid, so as a developer you have an incentive to keep developing features and making nice little improvements to the app. In the charge-up-front model the developer is disincentivized from making changes because there's always a well-founded fear that any changes won't be rewarded by increased sales. If efforts aren't rewarded there's no point in efforting...and make no mistake, programming does take a lot of effort. And if a user really doesn't want ads they can pay to have them removed; there are no app ad blockers. Everyone wins. This has been your moment of Zen.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Tuesday
Sep 13, 2016

If Traffic is an Iterated Prisoner's Dilemma Game Can Smart Cars Evolve Co-operative Behavior?

 

Can small tribes of cooperating smart cars improve overall traffic even if they are not in the majority? Sure, if every car was a self-driving car maybe traffic jams could dissolve like blood clots on anticoagulants, but what about that messy in-between period? It will be some time before smart cars rule the road. Until then can smart cars make traffic better?

Adoption is hard. This is a general problem in tech. You want people to join your social network, yet people won't join until enough people have already joined. What you really want is for that virtuous circle to develop, where as more people adopt a technology it causes even more people to adopt it. So startups spend their VC money fast and furiously in hopes of acquiring new customers, betting the lifetime value of a customer will be worth the investment. VC money is the corpse that feeds the rest of the ecosystem.

Traffic is already an example of a vicious cycle. Horrendous traffic jams are now the norm and "good" traffic windows are just tall tales texted to children. And it keeps on getting worse and not in a worse is better sort of way. Yet the incentives are still not enough for people to self-organize and batch themselves into cars. Cars are more of a synchronous streaming model. Traffic problems will need to be solved at a different level of abstraction. Human drivers are just so hopelessly human.

In some ways traffic is like an iterated game of Prisoner's Dilemma. So in an Evolution of Cooperation sense can overall flows improve if groups of self-driving cars cooperate together within a stream of muggle cars? If smart cars on the road choose to gang up together will that improve commute times in such a way that it will encourage more and more cars to join the gang, becoming part of the solution instead of the problem?
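To make the tipping-point question concrete, here is a minimal sketch of an iterated Prisoner's Dilemma, assuming the classic payoff values from Axelrod's tournaments (5, 3, 1, 0). The mapping onto traffic is purely illustrative: "smart" cars play tit-for-tat, "selfish" cars always defect, and the car counts and round count are made-up parameters, not measurements of real traffic.

```python
# Classic Prisoner's Dilemma payoffs: (my points, your points).
PAYOFF = {
    ("C", "C"): (3, 3),  # both cooperate
    ("C", "D"): (0, 5),  # I cooperate, you defect
    ("D", "C"): (5, 0),  # I defect, you cooperate
    ("D", "D"): (1, 1),  # both defect
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Never cooperate, whatever the opponent did."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return both total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        points_a, points_b = PAYOFF[(a, b)]
        score_a += points_a
        score_b += points_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


def average_scores(n_smart, n_selfish, rounds=10):
    """Round-robin: every car meets every other car once.

    Returns the average total score per car for each kind of car.
    """
    cars = [("smart", tit_for_tat)] * n_smart
    cars += [("selfish", always_defect)] * n_selfish
    totals = {"smart": 0, "selfish": 0}
    for i in range(len(cars)):
        for j in range(i + 1, len(cars)):
            (kind_a, strat_a), (kind_b, strat_b) = cars[i], cars[j]
            score_a, score_b = play(strat_a, strat_b, rounds)
            totals[kind_a] += score_a
            totals[kind_b] += score_b
    counts = {"smart": n_smart, "selfish": n_selfish}
    return {kind: totals[kind] / counts[kind] if counts[kind] else 0.0
            for kind in totals}


if __name__ == "__main__":
    for n_smart in (1, 2, 3):
        print(n_smart, average_scores(n_smart, 10 - n_smart))
```

In this toy setup a lone cooperating car does worse than the defectors around it, but a small tribe that keeps meeting its own members comes out ahead, which is the Evolution of Cooperation intuition behind the question. Whether real traffic has payoffs anything like these is exactly what remains open.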

But we have the social network problem. Cars currently are individual, kept in silos organized by manufacturer. Tesla, Uber, Google, etc. don't cooperate at a global traffic planning level. Even cars within a manufacturer don't yet have the ability to link themselves together in a self-driving conga line of traffic goodness.

Historically we know that after individual point solutions are created the next step is to add a scheduling layer. After running a program on an entire CPU we create an OS (Linux, Windows, etc.) to run multiple programs on the same CPU. After the container we create an OS (Swarm, Kubernetes, Mesos, etc.) to run multiple programs on the same boxes.

We'll need a TrafficOS so all the cars that want to can cooperate, you know, like XMPP before the walls went up. Plus we'll need ecosystem incentives to help drive adoption.

So many questions. Will drivers volunteer to be part of a smart car peloton even if it means their commute suffers in the short term? What's the tipping point? Will free riders ruin the whole thing? Like the fast lane, should incentives be created to encourage cooperating tribes of smart cars? Should traffic lights favor smart car trains? Should traffic laws allow bullet trains of smart cars to speed down the highway? Should insurance premiums be reduced for time spent protected in smart car convoys? Maybe smart car software should be seeded with altruism "genes" so they cooperate naturally? How can defectors be punished? Maybe we need a reputation system scoring for traffic reciprocity?

Unlike the weather, traffic is something we can do something about. Let's just try to do a better job than we did with social networks and IM systems. Traffic is actually important.


Tuesday
Sep 13, 2016

Sponsored Post: ScaleArc, Spotify, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

Who's Hiring?

  • Spotify is looking for individuals passionate about infrastructure to join our Site Reliability Engineering organization. Spotify SREs design, code, and operate tools and systems to reduce the amount of time and effort necessary for our engineers to scale the world’s best music streaming product to 40 million users. We are strong believers in engineering teams taking operational responsibility for their products and work hard to support them in this. We work closely with engineers to advocate sensible, scalable systems design and share responsibility with them in diagnosing, resolving, and preventing production issues. We are looking for an SRE Engineering Manager in NYC and SREs in Boston and NYC.

  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.

Fun and Informative Events

  • Learn how Nielsen Marketing Cloud (NMC) leverages online machine learning and predictive personalization to drive its success in a live webinar on Tuesday, September 20 at 11 am PT / 2 pm ET. Hear from Nielsen’s Kevin Lyons, Senior VP of Data Science and Digital Technology, and Brent Keator, VP of Infrastructure, as well as from Brian Bulkowski, CTO and Co-Founder at Aerospike, as they describe the front-edge architecture and technical choices – including the Aerospike NoSQL database – that have led to NMC’s success. RSVP: https://goo.gl/xDQcu4

Cool Products and Services

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below...

Click to read more ...

Tuesday
Sep 13, 2016

The Dollar Shave Club Architecture Unilever Bought for $1 Billion

This is a guest post by Jason Bosco, the Dollar Shave Club’s Director of Engineering, Core Platform & Infrastructure, on the infrastructure of its ecommerce technology.

With more than 3 million members, Dollar Shave Club will do over $200 million in revenue this year. Although most are familiar with the company’s marketing, this immense growth in just a few years since launch is largely due to its team of 45 engineers.

Dollar Shave Club engineering by the numbers:

Core Stats

Click to read more ...

Thursday
Sep 8, 2016

Stuff The Internet Says On Scalability For September 9th, 2016 

Hey, it's HighScalability time:

 

An alternate universe where Zeppelins rule the sky. 1929. (@AeroDork)

 

If you like this sort of Stuff then please support me on Patreon.
  • 15%: Facebook's reduction in latency using HTTP2's server push; 1.9x: nanotube transistors outperform silicon; 200: projectors used to film a "hologram"; 50%: of people fall for phishing attacks (it's OK to click); 5x: increased engagement using Google's Progressive Web Apps; 115,000+: Cassandra nodes at Apple; $500 million: Pokémon Go; $150M: Delta's cost for datacenter outage; 

  • Quotable Quotes: 
    • Dan Lyons: I wanted to write a book about what it’s like to be 50 and trying to reinvent yourself – that struggle. There are all these books and inspirational speakers talking about being a lifelong learner and it’s so great to reinvent yourself, the brand of you. And I wanted to say, you know, it’s not like that. It’s actually really painful.
    • Engineers & Coffee~ In modern application development everything is a stream now versus historically everything was a transaction. Make a request and then you're done. It's easier to write analytics on top of streams versus using Hive. It's cool that Kinesis is all real-time and has the power of SQL.
    • David Smith: The [iOS] market has been pulling me along towards advertising based apps, and I’ve found that the less I fight back with anachronistic ideas about how software “should” be sold, the more sustainable a business I have.
    • @tef_ebooks: (how do you keep a lisp user in suspense
    • @bodil: Use tests to verify your assumptions. Use a type checker to verify your implementations. Always.
    • tostitos1979: Here is a factoid for the youngins ... the Internet/Arpanet was created BEFORE the first microprocessor! In fact, Intel was originally founded to make RAM ICs. They only later created the first microprocessor (the 4004)!
    • gsubes: Our tests showed that even with larger messages (100k price ticks per request) pipes were still a magnitude slower [than Memory Mapping].
    • Quincy Larson: Did you know the average developer only gets two hours of uninterrupted work done a day? They spend the other 6 hours in varying states of distraction.
    • StorageMojo: Achieving lower-than-DRAM pricing requires volume, and that’s where NRAM has a competitive advantage over, say, 3D XPoint. Processing can be done on today’s flash, DRAM or logic lines. NRAM processing only needs spin coating and patterning – as well as carbon nanotubes – which modern fabs all support.
    • Xiao Mina: We’ve seen this story before: as cost of production and distribution go down, the range of creativity goes up.
    • @clarkkaren: Give humans a system and they'll game it. The End.
    • Jim Starkey: AmorphousDB is my modest effort to question everything database.
      The best way to think about Amorphous is to envision a relational database and mentally erase the boxes around the tables so all records free float in the same space – including data and metadata.
    • @jdub: On Reddit: “What is the use of Elastic IPs, if I can use ELB or an Auto Scaling Group instead?” STUDENT, YOU HAVE ACHIEVED ZEN OF CLOUD.
    • @BenedictEvans: A key premise for the next decade: it's easier for software to enter other industries than for other industries to hire software people
    • @jasongorman: To clarify, "dependency injection" literally just means passing an object's collaborators as constructor/method params. That's all it is.
    • jackpeterfletch: Grand solution to world hunger, available on Kindle!
    • @swardley: Optimise flow.  Often when you examine flows then you’ll find bottlenecks, inefficiencies and profitless flows.  There will be things that you’re doing that you just don’t need to. Be very careful here to consider not only efficiency but effectiveness. 
    • @PatrickMcFadin: #uber is fully replicated and active-active to make sure you never get stranded. #cassandrasummit
    • @FSVO: A monk named Chaitin found an algorithm for expressing the complexity of sutras. His master commented, “This monk could be shorter.”
    • Dotzler: We [Firefox] can learn from the competition [Chrome]. The way they implemented multi-process is RAM-intensive, it can get out of hand. We are learning from them and building an architecture that doesn’t eat all your RAM. 
    • @hichaelmart: Although CPU bound calculations [on OpenWhisk] seem about 4x slower than Lambda, so not too bad. Lambda still the winner so far though.
    • Shel Kaphan: Okay, I’m going to be building this website to run a bookstore [Amazon] and I haven’t done that before but it doesn’t sound so hard. When I’m done with that I’m not sure what I’ll do.
    • sixhobbits: "Our logger failed silently" "Shouldn't that have been recorded somewhere?" "I guess it's turtles all the way down"
    • @xmal: Trying to explain that CRDT causal contexts are a natural evolution of TCP sequence numbering and vector clocks in reliable causal broadcast
    • Joi Ito: Just like it is impossible to make another Silicon Valley somewhere else, although everyone tries—after spending four days in Shenzhen, I’m convinced that it’s impossible to reproduce this ecosystem anywhere else.
    • @adriancolyer: "My claim is that it is possible to write grand programs, noble programs, truly magnificent ones..." Knuth 1974
    • @Excellion: According to legend, if you say Blockchain three times fast, your databases will magically become immutable & your company a fintech leader.
    • bec0: The world has changed. Dennard scaling has mostly been replaced. The economic Moore's Law has morphed. It had to...we have all gotten used to its benefits.
    • @cloud_opinion: 5 stages of Cloud Grief: It's not secure / It's someone's computer / We do private cloud / Hybrid cloud  / Lambda is full of servers anyway
    • @DDD_Borat: "Why you not like framework annotations in your code?" - "Would you put bumper sticker on a Ferrari?" Rofl
    • @robert_winslow: Slow software is your fault. These are the real speed limits: billions of CPU instructions, GBs of RAM access, 100k+ SSD I/Os... per second.
    • Walter Bentley: I am proud to say, OpenStack held up to the torment. Did not experience not one single API request failure throughout my numerous load tests — yet another proof point that OpenStack is ready for enterprise/production use.
    • @xaprb: Let's fork it, say the people who have never put their heart and 5 years of their life into a product only to watch someone else fork it.
    • @adrianco: People asking Docker to slow down is like OpenStack folks asking AWS to standardize and slow down.
    • @amcafee: "In 1974, it was illegal for an airline to charge < $1,442 for a flight between New York City and Los Angeles."
    • Fairly Nerdy: For most real world scenarios, where you are betting against the house which has a house edge, f* becomes negative, which means that you shouldn’t be playing that game.  Truthfully it means that you should take the other side of the wager, become the house, and make them bet against you!
    • Judd Kaiser: Experience shows that good scalability can be achieved on 10 GigE networking provided that you stay above about 50,000 cells per core. That means, for example, that a 20 M cell problem shows good scaling up to about 400 cores; beyond that, interprocess communication latency begins to dominate and scaling degrades.
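The Fairly Nerdy quote above is about the Kelly criterion. As a minimal sketch, assuming the standard single-bet form f* = (b·p − q) / b, where p is the win probability, q = 1 − p, and b is the net odds paid on a win, a negative f* falls straight out of any bet with a house edge. The roulette and coin figures are illustrative choices, not taken from the quoted article.

```python
def kelly_fraction(p_win, net_odds):
    """Fraction of bankroll to stake under the standard Kelly formula.

    p_win    -- probability of winning the bet
    net_odds -- profit per unit staked on a win (1.0 for even money)
    """
    q_lose = 1.0 - p_win
    return (net_odds * p_win - q_lose) / net_odds


if __name__ == "__main__":
    # Illustrative: even-money bet on red in American roulette (18 of 38 pockets).
    print("roulette:", kelly_fraction(18 / 38, 1.0))
    # Illustrative: a coin biased 60/40 in your favor, paid at even money.
    print("biased coin:", kelly_fraction(0.6, 1.0))
```

The roulette bet comes out negative (stake nothing, or be the house), while the favorable coin comes out positive.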

  • Maybe the real reason Uber wants driverless cars is that hiring, er...onboarding drivers from across the globe is a really tough problem to solve. Each location has its own processes and that kills scalability. Screening processes and regulations vary, some countries have a very long list of required documents, and onboarding flows vary, so you can't use the same app everywhere. Here's the story: How Uber Engineering Massively Scaled Global Driver Onboarding. The solution, as it so often is, was to go meta and dynamic: an onboarding state machine (OSM) that makes it easy to configure a set of steps for each onboarding process in each country, state, city, or any other level of granularity, coupled with an event system that switches users from one step to another depending on their actions or input. The onboarding API can then query the OSM to know which step in the process a user is at. Clients are now stateless and responsible only for their UI, with 100% of the business logic in the shared back end. They went from Flask to Tornado and to a lighter version of their initial JSON schema architecture, where only data is passed to the client, not UI definitions.
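A rough sketch of what a configuration-driven onboarding state machine could look like, under heavy assumptions: the location keys, step names, event naming convention, and class names here are all hypothetical, and nothing below is taken from Uber's actual implementation. It only illustrates the shape of the idea: per-location step lists live in configuration, events move a user forward, and a stateless client simply asks which step to render.

```python
from dataclasses import dataclass, field

# Hypothetical per-location flows. Keys go from general to specific
# ("country" or "country/city"); every step name is made up.
FLOWS = {
    "default": ["signup", "upload_license", "background_check", "activated"],
    "country_a/city_1": [
        "signup", "upload_license", "upload_extra_permit",
        "background_check", "activated",
    ],
}


def flow_for(location):
    """Pick the most specific configured flow, falling back to the default."""
    parts = location.split("/")
    while parts:
        key = "/".join(parts)
        if key in FLOWS:
            return FLOWS[key]
        parts.pop()  # drop the most specific level and retry
    return FLOWS["default"]


@dataclass
class OnboardingStateMachine:
    # user_id -> (location, index into that location's flow)
    _state: dict = field(default_factory=dict)

    def start(self, user_id, location):
        """Register a user at the first step of their location's flow."""
        self._state[user_id] = (location, 0)

    def current_step(self, user_id):
        """What a stateless client would query to decide which UI to show."""
        location, index = self._state[user_id]
        return flow_for(location)[index]

    def handle_event(self, user_id, event):
        """Advance if the event completes the current step; ignore others."""
        location, index = self._state[user_id]
        steps = flow_for(location)
        if event == f"{steps[index]}_completed" and index < len(steps) - 1:
            self._state[user_id] = (location, index + 1)
        return self.current_step(user_id)


if __name__ == "__main__":
    osm = OnboardingStateMachine()
    osm.start("driver-1", "country_a/city_1")
    print(osm.current_step("driver-1"))
    print(osm.handle_event("driver-1", "signup_completed"))
    print(osm.handle_event("driver-1", "upload_license_completed"))
```

Adding a new city in this sketch means adding one entry to `FLOWS`, with no client change, which is the property the article credits for making onboarding scale across locations.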

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...