advertise
Monday
Nov072016

The QuickBooks Platform

This is a guest post by Siddharth Ram – Chief Architect, Small Business. Siddharth_ram@intuit.com.

The QuickBooks ecosystem is the largest small business SaaS product. The QuickBooks Platform supports bookkeeping, payroll and payment solutions for small businesses, their customers and accountants worldwide. Since QuickBooks is also a compliance & tax filing platform, consistency in reporting is extremely important.. Financial reporting requires flexibility in queries – a given report may have dozens of different dimensions that can be tweaked. Collaboration requires multiple edits by employees, Accountants and Business owners at the same time, leading to potential conflicts. All this leads to solving interesting scaling problems at Intuit.

Solving for scalability requires thinking on multiple time horizons and axes. Scaling is not just about scaling software – it is also about people scalability, process scalability and culture scalability. All these axes are actively worked on at Intuit. Our goal with employees is to create an atmosphere that allows them to do the best work of their lives.

Background

Click to read more ...

Thursday
Oct202016

Future Tidal Wave of Mobile Video

In this article I will examine the growing trends of Internet Mobile video and how consumer behaviour is rapidly adopting to a world of ‘always on content’ and discuss the impact on the underlying infrastructure.

Click to read more ...

Wednesday
Oct192016

Gone Fishin'

Well, not exactly Fishin', but I'll be on a month long vacation starting today. I won't be posting (much) new content, so we'll all have a break. Disappointing, I know. Please use this time for quiet contemplation and other inappropriate activities. See you on down the road...

Monday
Oct172016

Datanet: a New CRDT Database that Let's You Do Bad Bad Things to Distributed Data

 

We've had databases targeting consistency. These are your typical RDBMSs. We've had databases targeting availability. These are your typical NoSQL databases.

If you're using your CAP decoder ring you know what's next...what databases do we have that target making concurrency a first class feature? That promise to thrive and continue to function when network partitions occur?

No many, but we have a brand new concurrency oriented database: Datanet - a P2P replication system that utilizes CRDT algorithms to allow multiple concurrent actors to modify data and then automatically & sensibly resolve modification conflicts.

Datanet is the creation of Russell Sullivan. Russell spent over three years hidden away in his mad scientist layer researching, thinking, coding, refining, and testing Datanet. You may remember Russell. He has been involved with several articles on HighScalability and he wrote AlchemyDB, a NoSQL database, which was acquired by Aerospike.

So Russell has a feel for what's next. When he built AlchemyDB he was way ahead of the pack and now he thinks practical, programmer friendly CRDTs are what's next. Why?

Concurrency and data locality. To quote Russell:

Datanet lets you ship data to the spot where the action is happening. When the action happens it is processed locally, your system's reactivity is insanely quick. This is pretty much the opposite of the non-concurrent case where you need to go to a specific machine in the cloud to modify a piece of data regardless of where the action takes place. As your system grows, the concurrent approach is superior.

We have been slowly moving away from transactions towards NoSQL for reasons of scalability, availability, robustness, etc. Datanet continues this evolution by taking the next step and moving towards extreme distribution: supporting tons of concurrent writers.

The shift is to more distribution in computation. We went from one app-server & one DB to app-server-clusters and clustered-DBs, to geographically distributed data-centers, and now we are going much further with Datanet, data is distributed anywhere you need it to a local cache that functions as a database master.

How does Datanet work?

In Datanet, the same piece of data can simultaneously exist as a write-able entity in many many places in the stack. Datanet is a different way of looking at data: Datanet more closely resembles an internet routing protocol than a traditional client-server database ... and this mirrors the current realities that data is much more in flight than it used to be.

What bad bad things can you do to your distributed data? Here's an amazing video of how Datanet recovers quickly, predictably, and automatically from Chaos Monkey level extinction events. It's pretty slick. 

 

Here's an email interview I did with Russell. He goes into a lot more detail about Datanet and what it's all about. I think you will find it interesting. 

Let's start with your name and a little of your background?

Click to read more ...

Friday
Oct142016

Stuff The Internet Says On Scalability For October 14th, 2016

Hey, it's HighScalability time:

 

A pattern from the collective unconscious of the universe. Scott Kelly's brilliant Year in Space Photos.

 

If you like this sort of Stuff then please support me on Patreon.

  • $1.5 million: new iOS hack bug bounty; 120 Terabits per second: Google and Facebook's submarine cable between Los Angeles with Hong Kong; 142,000: IT jobs lost last month;  $17 billion: cost of recall to Samsung; $4.1 Billion: IRS detected identity theft tax fraud; 1956: first mention of P vs NP by Kurt Gödel to John von Neumann; 1 million HTTP requests per second: DDoS attacks coming from IoT cameras; 90 petaflops: capacity of volunteer computing; 500 msec: time it takes the brain to integrate all sensory data into consciousness;

  • Quotable Quotes:
    • @GreatDismal: Silicon Valley fantasy that our universe is a simulation is actually the fantasy that our universe is a *sucessful startup*
    • @gblache: Being POTUS must be like inheriting a 240 year old code base and being asked to fix it in 4 years while half your team tries to sandbag you.
    • chrissnell: I'm a huge believer in colocation/on-prem in the post-Kubernetes era. I manage technical operations at a SaaS company and we migrated out of public cloud and into our own private, dedicated gear almost two years ago. Kubernetes and--especially--CoreOS has been a game changer for us.
    • @BenedictEvans: You spend 50-100x more on your smartphone than Google or FB make from you in ad revenue. They pay for their clouds out of that ad revenue
    • @kevinmarks: #NextEconomy Urs Hölzle: training a large model is super computationally intensive - trillions of flops
    • Tim O'Reilly: we see huge amounts of capital sitting on the sidelines rather than being part of a city - how do we fix this?
    • old-gregg: When I was at Rackspace, I was trying to analyze the top reasons our startup customers would stop using some of our SaaS offerings. The most common one, unsurprisingly, was they'd run out of business. But another top one was "they got successful". As they got bigger and more successful (can't mention names) they'd bring more and more in-house, eventually getting to a point that the only products they were interested in were just servers and bandwidth.
    • Joel Spolsky: But developers don’t want to overhear conversations. That’s ideal for a trading floor, but developers need to concentrate
    • Werner Vogels: Fast Data is an emerging industry term for information that is arriving at high volume and incredible rates, faster than traditional databases can manage. 
    • mattmanser: Honestly mate, you're just talking about the same old, same old. Every framework is about componentization and encapsulation. You could take React out of your post and replace it with any framework name in the last 40 years and it would have made 'sense' at the time.
    • @danielbryantuk: "Traditional software dev was like farming. You bought your tool stack and got busy. Now we're more like foragers" @monkchips #jaxlondon
    • Prashant Deva: RethinkDB is a classic story of good engineers doing only 'cool' things, not understanding their business, and ignoring all the 'boring' things that actually make a business tick.
    • Ada Lovelace Day: Lovelace came up with a method for the Analytical Engine to repeat a series of instructions: the first documented loop in computing
    • Greg Sanders: Let's stop talking about the block size. Let's talk about weight, the weight of a transaction, the weight of a block, the externalities it puts on the system. Let's talk about throughput. We can put more information in small spaces, so let's look at these problems
    • James Ryan: A major hold-up has been memory issues. GTA can’t even keep a car in memory after it’s left the player’s field of view, so there’s been no room at all for maintaining something resembling a character’s inner world.
    • yummyfajitas: Paraphrasing this to data science: "Everybody wants to have software provide them insights from data, but no one wants to learn any math."
    • @hunterwalk: "YouTube has a 46% share [of online video market], MySpace has 23% & Google Video has 10%." @nytimes 10/9/06  Happy 10th anniversary YT acq
    • @datawireio: "Microservices should not be used if the organization isn't embracing DevOps principles" http://d6e.co/2dxp0vr  by @danielbryantuk
    • delinka: I'm a bit older than the author. Every time I feel like I'm "out of touch" with the hip new thing, I take a weekend to look into it. I tend to discover that the core principles are the same, this time someone has added another C to MVC; or the put their spin on an API for doing X; or you can tell they didn't learn from the previous solution and this new one misses the mark, but it'll be three years before anyone notices (because those with experience probably aren't touching it yet, and those without experience will discover the shortcomings in time.)
    • sonnytron: But that's never good enough for douche bags that have a Foosball table in the office. They want you to give up your lunch and your evenings and play foosball with them. And crush it bro. And kill it bro.
    • @tupshin: @cmeik at scale (for various axes of scale, such as geographic-induced latency) a totally ordered system is impractical due to ux concerns
    • Victor J. Blue: When we’re addicted to online life, every moment is fun and diverting, but the whole thing is profoundly unsatisfying.
    • Richard Evans: I looked through the code and it turned out that much much earlier in the game I’d been rude to a servant during dinner, and the servant had gone into the kitchen and told the people there what a jerk I’d been – one of those people was the doctor. He remembered that. This took me quite a long time to debug. This is an example of how emergence is exciting but it opens up questions about game design.

  • This is the old: We had a post about whether you need maths to program. My answer: You need this kind [discrete math]. This is the new: Foundations of Data Science: we have written this book to cover the theory likely to be useful in the next 40 years, just as an understanding of automata theory, algorithms and related topics gave students an advantage in the last 40 years. One of the major changes is the switch from discrete mathematics to more of an emphasis on probability, statistics, and numerical methods.

  • Unlocking Horizontal Scalability in Our Web Serving Tier. Using MySQL on AWS RDS, Airbnb ran into C10K problems (connection limitations) that manifested as query latency increases, increased requests queues, and error rate spikes. So they added a connection pooling feature to MaxScale, a database proxy that supports intelligent query routing in between client applications and a set of backend MySQL servers. To neutralize the extra network hop introduced by the proxy they implemented availability zone aware request routing in SmartStack. Result: we were able to scale the application server tier with the addition of more servers without an increase in MySQL server threads. More than 15 Airbnb MaxScale database proxy services are in production.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Oct122016

Lessons Learned from Scaling Uber to 2000 Engineers, 1000 Services, and 8000 Git repositories

For a visual of the growth Uber is experiencing take a look at the first few seconds of the above video. It will start in the right place. It's from an amazing talk given by Matt Ranney, Chief Systems Architect at Uber and Co-founder of Voxer: What I Wish I Had Known Before Scaling Uber to 1000 Services (slides).

It shows a ceaseless, rhythmic, undulating traffic grid of growth occurring in a few Chinese cities. This same pattern of explosive growth is happening in cities all over the world. In fact, Uber is now in 400 cities and 70 countries. They have over 6000 employees, 2000 of whom are engineers. Only a year and half a go there were just 200 engineers. Those engineers have produced over 1000 microservices which are stored in over 8000 git repositories.

That's crazy 10x growth in a crazy short period of time. Who has experienced that? Not many. And as you might expect that sort of unique, compressed, fast paced, high stakes experience has to teach you something new, something deeper than you understood before.

Matt is not new to this game. He was co-founder of Voxer, which experienced its own rapid growth, but this is different. You can tell while watching the video Matt is trying to come to terms with what they've accomplished.

Matt is a thoughtful guy and that comes through. In a recent interview he says:

And a lot of architecture talks at QCon and other events left me feeling inadequate; like other people- like Google for example - had it all figured out but not me.

This talk is Matt stepping outside of the maelstrom for a bit, trying to make sense of an experience, trying to figure it all out. And he succeeds. Wildly.

It's part wisdom talk and part confessional. "Lots of mistakes have been made along the way," Matt says, and those are where the lessons come from.

The scaffolding of the talk hangs on WIWIK (What I Wish I Had Known) device, which has become something of an Internet meme. It's advice he would give his naive, one and half year younger self, though of course, like all of us, he certainly would not listen.  

And he would not be alone. Lots of people have been critical of Uber (HackerNewsReddit). After all, those numbers are really crazy. Two thousand engineers? Eight thousand repositories? One thousand services? Something must be seriously wrong, isn't it?

Maybe. Matt is surprisingly non-judgemental about the whole thing. His mode of inquiry is more questioning and searching than finding absolutes. He himself seems bemused over the number of repositories, but he gives the pros and cons of more repositories versus having fewer repositories, without saying which is better, because given Uber's circumstances: how do you define better?

Uber is engaged in a pitched world-wide battle to build a planetary scale system capable of capturing a winner-takes-all market. That's the business model. Be the last service standing. What does better mean in that context?  

Winner-takes-all means you have to grow fast. You could go slow and appear more ordered, but if you go too slow you’ll lose. So you balance on the edge of chaos and dip your toes, or perhaps your whole body, into chaos, because that’s how you’ll scale to become the dominant world wide service. This isn’t a slow growth path. This a knock the gate down and take everything strategy. Think you could do better? Really?

Microservices are a perfect fit for what Uber is trying to accomplish. Plug your ears, but it's a Conway's Law thing, you get so many services because that's the only way so many people can be hired and become productive.

There's no technical reason for so many services. There's no technical reason for so many repositories. This is all about people. mranney sums it up nicely:

Scaling the traffic is not the issue. Scaling the team and the product feature release rate is the primary driver.

A consistent theme of the talk is this or that is great, but there are tradeoffs, often surprising tradeoffs that you really only experience at scale. Which leads to two of the biggest ideas I took from the talk:

  • Microservices are a way of replacing human communication with API coordination. Rather than people talking and dealing with team politics it's easier for teams to simply write new code. It reminds me of a book I read long ago, don't remember the name, where people lived inside a Dyson Sphere and because there was so much space and so much free energy available within the sphere that when any group had a conflict with another group they could just splinter off and settle into a new part of the sphere. Is this better? I don't know, but it does let a lot of work get done in parallel while avoiding lots of people overhead. 
  • Pure carrots, no sticks. This is a deep point about the role of command and control is such a large diverse group. You'll be tempted to mandate policy. Thou shalt log this way, for example. If you don't there will be consequences. That's the stick. Matt says don't do that. Use carrots instead. Any time the sticks come out it's bad. So no mandates. The way you want to handle it is provide tools that are so obvious and easy to use that people wouldn’t do it any other way.

This is one of those talks you have to really watch to understand because a lot is being communicated along dimensions other than text. Though of course I still encourage you to read my gloss of the talk :-)

Stats (April 2016)

Click to read more ...

Tuesday
Oct112016

Sponsored Post: ScaleArc, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

Who's Hiring?

  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.

Fun and Informative Events

  • Learn how Nielsen Marketing Cloud (NMC) leverages online machine learning and predictive personalization to drive its success in a live webinar on Tuesday, September 20 at 11 am PT / 2 pm ET. Hear from Nielsen’s Kevin Lyons, Senior VP of Data Science and Digital Technology, and Brent Keator, VP of Infrastructure, as well as from Brian Bulkowski, CTO and Co-Founder at Aerospike, as they describe the front-edge architecture and technical choices – including the Aerospike NoSQL database – that have led to NMC’s success. RSVP: https://goo.gl/xDQcu4

Cool Products and Services

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. .  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Click to read more ...

Wednesday
Oct052016

Stuff The Internet Says On Scalability For October 7th, 2016

Hey, it's HighScalability time:

 

The worlds oldest analog computer, from 87 BC, the otherworldly Antikythera mechanism.

 

If you like this sort of Stuff then please support me on Patreon.

  • 70 billion: facts in Google's knowledge graph; 80 million: monthly visitors to walmart.com; 50%: lower cost for sending a container from Shanghai to Europe; 6 billion: Docker Hub pulls per 6 weeks; 5x: impact reduction using new airbag helmet; 400: node Cassandra + Spark Cluster in Azure; 66%: loss of installs when apps > 100MB; 223GB: Udacity open sources self-driving car data; 

  • Quotable Quotes:
    • rfrey: The success of many companies, and probably all of the unicorns, has nothing to do with technology. The tech is necessary, of course, but so are desks and an accounting department. Internalizing that has been difficult for me as an engineer.
    • @mza: 72 new features/services released last month on #AWS. 706 so far this year (up 42.9% YoY).
    • Marc Andreessen: To me the problem is clear: The problem is insufficient technological adoption, innovation, and disruption in these high-escalating price sectors of the economy. My thesis is that we're not in a tech bubble — we’re in a tech bust. Our problem isn't too much technology or people being too excited about technology. The problem is we don't have nearly enough technology. These cartel-like legacy industries are way too hard to disrupt.
    • @mfdii: What did the NSA agent say when it got access to all the email? Yahoo!
    • Ben Thompson~ [Google's Pixel event] was a huge event, you rarely see a company changing business models
    • @kerryb: News just in: databases to be “named and shamed” if they use foreign keys without trying to train local British keys first.
    • kazagistar: The biggest use of REST in our system (and I suspect a lot of large newer systems) is not "web client to backend server" but "microservice to microservice". And for this, GraphQL is severely immature.
    • @amcafee: Tesla software update: good braking "even if a UFO were to land on the freeway in zero visibility conditions."
    • evanelias: Facebook uses MySQL for countless other critical OLTP use-cases, and (for better or worse) even a few OLAP use-cases. It's the primary store of Facebook, across the entire company. It's the storage layer for ad serving, payments, async task persistence, internal tooling, many many other things. Most of these use-cases make full use of SQL and the relational model.
    • @rakamaric: Deschutes Brewery using light-weight formal methods (white-box fuzzing) to find bugs in their code! #soarlab
    • @tottinge: "Crowdsourcing is the tyranny of the herd, not the wisdom of crowds" @snowded #lascot16
    • @pedrolopesme: @toddlmontgomery "Your API is a protocol. Treat it like one."  #qconnyc 2016
    • Rodrick Brown: A pattern today many use to accomplish this [logging] is using a kafka logging library that hooks into their microservice and use something like spark to consume the logs from Kafka into elasticsearch. We're doing hundreds of thousands of events/sec on a tiny ~8 node ES cluster.
    • @dominicad: "The way people make decisions is key to understanding company culture. Instead of system analysis, record decisions." @snowded #lascot16
    • Hugh E. Williams: Engineers irrationally avoid hash tables because of the worst-case O(n) search time. In practice, that means they’re worried that everything they search for will hash to the same value
    • @JoeEmison: That's just not accurate. I've spent the last year trying to run on GCP and keep going back to AWS. It's not just perception.
    • boulos: Where I do agree is networking egress. The big three providers all have metered bandwidth rates that are way above the "all inclusive" fee you pay to Hetzner, OVH, DO, and others. The cheapest way to host an ftp server that serves 20 TB per month is certainly on one of these (today). None of these providers will let you serve 1 PB / month this way, but if you're in their sweet spot and they can make it work out on average, it's a good fit.
    • @DDDBE: "If you have a magical genie, you still have the problem of trying to explain what you want. That is domain complexity." @malk_zameth
    • avitzurel: The networking on AWS needs to be better. I don't want the strongest machine just to have a better transfer rate. It makes complete sense to have a micro machine for some services, but if those services are accessed or access other HTTP/s services, it will be unnecessarily slow
    • Alan Huang: the number of [Internet] hops can be reduced by 2X by converting the network into a toroid. The number of hops can be further reduced by recasting the network into N-dimensional hypercube or into a multistage network, such as a Perfect Shuffle or Banyan.
    • @jessfraz: Can we go back to ncurses apps instead of these memory hogging bullshits?
    • Russ White: The reality is we shouldn’t need DevOps for configuration at all. This is a bit of a revolution in my thinking in the last two or three years, but what I’m trying to do is to simply make DevOps, as it’s currently constituted, obsolete. DevOps should be about understanding how the network is working and making the network work better

  • Software is eating the world, but software is also eating software. Laugh. Cry. Shake your head and then your fist, but it's a satire that's all true: How it feels to learn JavaScript in 2016. Epic. Once you wipe away the tears you may also realize this is a great tutorial on all the different frameworks and how they fit together. You won't find better. 

  • Videos are available for Full Stack Fest 2016, held in Barcelon, with topics ranging from Docker, IPFS & GraphQL to Reactive Programming, Immutable Interfaces & Virtual Reality. 

  • Great analogy by paulddraper on cloud pricing: "Restaurant prices are ridiculous ... made the comparison between groceries and menu offerings of McDonalds, Taco Bell, Burger King ... Olive Garden (SO EXPENSIVE) and you pay 5 times at a restaurant for the same." You're not paying for hardware. You're paying for hardware, expertise, services, and convenience. On-prem or colocation may be a good choice. But limiting your comparison to raw computing power mischaracterizes the decision.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Friday
Sep302016

Stuff The Internet Says On Scalability For September 30th, 2016

Hey, it's HighScalability time:

 

Everything is a network. Map showing the global genetic interaction network of a cell. 

 

If you like this sort of Stuff then please support me on Patreon.

  • 18: Google can now drink and drive in Washington DC.; $10 billion: cost of a Vision Quest to Mars; 620 Gbps: DDoS attack on KrebsOnSecurity; 1 Tbps: DDoS attack on OVH; $200,000: cost of a typical cyber incident; 8 million: video training dataset labeled with 4800 labels; 180: Amazon warehouses in the US; 10: bits of info per photon; 16: GPUs in new AI killer P2 instance type;

  • Quotable Quotes:
    • @markmccaughrean: 1,000,000 people to Mars in 100 yrs. 10 people/launch? That's 3 a day, every day, for a century. 1% failure rate? One explosion every month
    • @jeremiahg: Any sufficiently advanced exploit is indistinguishable from a 400lb hacker.
    • BrianKrebs: I suggested to Mr. Wright perhaps a better comparison was that ne’er-do-wells now have a virtually limitless supply of Stormtrooper clones that can be conscripted into an attack at a moment’s notice.
    • Sonia: Academia’s not-so-subtle distain for applied research does more than damage a few promising careers; it renders our field’s output useless, destined to collect dust on the shelves of Elsevier. 
    • Monica L. Smith: Nobody builds their own infrastructure. You don’t build your own highway, train line, water pipe, your own sewer. Those are things that connect you and your household to everybody else sequentially in your neighborhood, in your region, from the city out into the broader hinterlands.
    • @olesovhcom: This botnet with 145607 cameras/dvr (1-30Mbps per IP) is able to send >1.5Tbps DDoS. Type: tcp/ack, tcp/ack+psh, tcp/syn.
    • kenrose: We see this pattern at PagerDuty over the majority of our customers. There is a definite lull in alert volume over the weekends that picks up first thing Monday morning.It's led to my personal conclusion that most production issues are caused by people, not errant hardware or systems.
    • @rseroter: "We Crammed this Monolith Into a Container and Called it a Microservice"
    • @mweagle: I really don’t want to run my own k8s in AWS, but ECS is so opaque to debug that k8s seems like a good choice.
    • Werner Vogels~ We have this overarching goal which is customer centricity. Doing anything that benefits the customer gets priority above everything else. Working on eliminating all single points of failure in the company purely benefits the customer because it really improves the customer experience.
    • Cory Doctorow~ The thing open source software had going for it was the Ulysses Pact...the  irrevocable license, the failure mode of open source software, having founded an open source software company, I can tell you there are moments where it feels like your survival turns on being able to close the code you had opened when you were idealistic. There are moments of desperation when that happens. 
    • @lightbend: "We've been using #Akka in production for over two years, without a single crash." -@CruiseNorwegian |
    • @cloud_opinion: Monolithic -> Microservices -> "which container image?" -> "Screw it, lets do PaaS" ->  CF  or AWS?
    • Etsy: concurrency proved to be great for logical aggregation of components, and not so great for performance optimization. Better database access would be better for that.
    • Yaniv Nizan: the number of users actually contributing ad revenue in your app is a lot lower than 6.5% and much closer to the 1% or 2% that contribute revenue from In-app purchases. 
    • @reckless: Elon is basically putting on an Apple event, for going to Mars.
    • @potch: DRY: Don't Repeat Yourself / DAMP: Do Abstraction/Minimalism Pragmatically / MOIST: Maybe Only Innovate Some Times?
    • @dannysullivan: In the Facebook video metrics thing, spare a thought for the poor BuzzFeed watermelon, less viral than it thought :)
    • Addison Snell: If the promise of cloud computing is overblown, it because of the amplification it gets from its loyal converts, enterprises who have found liberation and agility in outsourcing IT. 
    • @psaffo: In 1990, the size of the US software industry was $3.2 billion -- the same size as the gourmet popcorn industry in that same year.
    • David Rosenthal: [Storage] Revenues are flat or decreasing, profits are decreasing for both companies. These do not look like companies faced by insatiable demand for their products; they look like mature companies facing increasing difficulty in scaling their technology.
    • @legind: Let's Encrypt now the 3rd largest CA, after Comodo and Symantec, comprising over 13% of the SSL cert market share 
    • @stewartbrand: “In the long run, the technology driving activities in space will be biological.” Rousing essay by Freeman Dyson.
    • @jessitron: Constructing causal ordering at the generic level of "all messages received cause all future messages sent" is expensive and also less meaningful than a business-logic-aware, conscious causal ordering. This conscious causal ordering gives us external consistency, accurate legibility, and visibility into what we know to be causal.

  • In an article light on details, written more with a marketing flourish, we still learn some interesting details on the infrastructure behind Pokemon Go. Bringing Pokémon GO to life on Google Cloud. It runs on Google Cloud, Kubernetes, Google Container Engine, HTTP/S Load Balancer, and Cloud Datastore. Keep in mind Alphabet is invested in Niantic and Ingress, the forerunner of Pokemon Go, ran on App Engine. So it sounds like a new backend implementation that had to scale from zero to the size of Twitter in a matter of weeks, with a much more complicated work load. Growth was explosive. Player traffic was 50x larger than initial estimates. An implication is the problems experienced during launch were not infrastructure related. Google, in the form of Customer Reliability Engineer (CRE), worked closely with Niantic to make sure the infrastructure scaled. The problems must have been elsewhere in the application stack, which is perfectly understandable. That sort of load could not have been predicted. The design decisions you make for 5x expected traffic are very different than they are for 50x. Nobody will spend the money or take the time to build a system for 50x. Nobody. Lots of good comments on HackerNews. Good question by ksec, would Poekemon Go even be possible in a pre-cloud era? 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Sep282016

How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters 

If you are Uber and you need to store the location data that is sent out every 30 seconds by both driver and rider apps, what do you do? That’s a lot of real-time data that needs to be used in real-time.

Uber’s solution is comprehensive. They built their own system that runs Cassandra on top of Mesos. It’s all explained in a good talk by Abhishek Verma, Software Engineer at Uber: Cassandra on Mesos Across Multiple Datacenters at Uber (slides).

Is this something you should do too? That’s an interesting thought that comes to mind when listening to Abhishek’s talk.

Developers have a lot of difficult choices to make these days. Should we go all in on the cloud? Which one? Isn’t it too expensive? Do we worry about lock-in? Or should we try to have it both ways and craft brew a hybrid architecture? Or should we just do it all ourselves for fear of being cloud shamed by our board for not reaching 50 percent gross margins?

Uber decided to build their own. Or rather they decided to weld together their own system by fusing together two very capable open source components. What was needed was a way to make Cassandra and Mesos work together, and that’s what Uber built.

For Uber the decision is not all that hard. They are very well financed and have access to the top talent and resources needed to create, maintain, and update these kind of complex systems.

Since Uber’s goal is for transportation to have 99.99% availability for everyone, everywhere, it really makes sense to want to be able to control your costs as you scale to infinity and beyond.

But as you listen to the talk you realize the staggering effort that goes into making these kind of systems. Is this really something your average shop can do? No, not really. Keep this in mind if you are one of those cloud deniers who want everyone to build all their own code on top of the barest of bare metals.

Trading money for time is often a good deal. Trading money for skill is often absolutely necessary.

Given Uber’s goal of reliability, where out of 10,000 requests only one can fail, they need to run out of multiple datacenters. Since Cassandra is proven to handle huge loads and works across datacenters, it makes sense as the database choice.  

And if you want to make transportation reliable for everyone, everywhere, you need to use your resources efficiently. That’s the idea behind using a datacenter OS like Mesos. By statistically multiplexing services on the same machines you need 30% fewer machines, which saves money. Mesos was chosen because at the time Mesos was the only product proven to work with cluster sizes of 10s of thousands of machines, which was an Uber requirement. Uber does things in the large.

What were some of the more interesting findings?

  • You can run stateful services in containers. Uber found there was hardly any difference, 5-10% overhead, between running Cassandra on bare metal versus running Cassandra in a container managed by Mesos.

  • Performance is good: mean read latency: 13 ms and write latency: 25 ms, and P99s look good.

  • For their largest clusters they are able to support more than a million writes/sec and ~100k reads/sec.

  • Agility is more important than performance. With this kind of architecture what Uber gets is agility. It’s very easy to create and run workloads across clusters.

Here’s my gloss of the talk:

In the Beginning

Click to read more ...

Page 1 ... 3 4 5 6 7 ... 210 Next 10 Entries »