
Stuff The Internet Says On Scalability For March 8, 2013

Hey, it's HighScalability time: 

  • Quotable Quotes:
    • @ibogost: Disabling features of SimCity due to ineffective central infrastructure is probably the most realistic simulation of the modern city.
    • antirez: The point is simply to show how SSDs can't be considered, currently, as a bit slower version of memory. Their performance characteristics are a lot more about, simply, "faster disks".
    • @jessenoller: I only use JavaScript so I can gain maximum scalability across multiple cores. Also unicorns. Paint thinner gingerbread
    • @liammclennan: high-scalability ruby. Why bother?
    • @scomma: Problem with BitCoin is not scalability, not even usability. It's whether someone will crack the algorithm and render BTC entirely useless.
    • @webclimber: Amazing how often I find myself explaining that scalability is not magical
    • @mvmsan: Flash as Primary Storage - Highest Cost, Lack of HA, scalability and management features #flash #SSD #CIO
    • @pneuman42: Game servers are the *worst* scalability problem. Most services start small and scale up over time, solving problems along the way
    • @jeffsussna: OH: "Amazon outages involve server auto-scaling failures. Microsoft outages involve credit card auto-renewal failures"
    • carlosthecharlie: Writing Map/Reduce jobs is like making debt payments on technical debt you don't yet owe
    • Thomas Fuchs: client-side processing and decoupling is detrimental to both the speed of development, and application performance
    • anonymous: *eye twitches* You maintain secondary indexes in dynamo db fields, managed in application code? Dude. DUDE!
  • LinkedIn: Secrecy Doesn't Scale. Winston Churchill: Truth is so precious that she should always be attended by a bodyguard of lies.

  • So eternal vigilance really can be crowdsourced: Bill Introduced to Re-Legalize Cellphone Unlocking.

  • Engaging discussion with George Dyson: Turing’s Cathedral and the Dawn of the Digital Universe. Template based addressing. DNA is searched by template. You don't have to know the exact location of a protein and the match doesn't have to be exact. Google is template searching for data. He thinks this template idea is a third revolution in computing. Much more flexible and robust. Because of errors you have to build architectures that are more flexible and can deal with ambiguity, which is what nature does. Google as an Oracle Machine. Alan Turing said machines will never be intelligent unless they are allowed to make mistakes. Deterministic computing is limited. A non-deterministic element, an Oracle, is required. Machines need to learn by making mistakes, tolerating mistakes, and learning from mistakes. Google is made up of deterministic machines. We humans are in Google's loop to act as the non-deterministic signal, as Oracle Machines.

  • What will we do with millions of 8+ core mobile devices? There's WebRTC, true peer-to-peer in the browser. Take a look at PeerJS and SimpleWebRTC.js. It seems to me we can start to think of a sustainable model of computing using cooperating mobile devices rather than expensive backend servers.

  • Ted Nyman (GitHub) - Scaling Happiness talk: no managers. Culture and freedom perks aren't enough. To change culture you have to change structure. The structure is the absence of structure. Culture arises naturally. 146 people work in 67 cities for GitHub. Without a mechanism for making decisions many good ideas will never happen at GitHub because consensus can't be reached. That's part of the tradeoff. Teams form naturally based on interests. Implication is you have to hire the right people. Technology creates order. Internal tooling creates structure and grooves process. You can't make anyone do anything. Consistency comes from things people want to use in the form of libraries and tooling. You have to accept mistakes. Autonomy is priceless. No one has quit in 5 years. Policies are set by everyone. Good things do die and that's the price you have to pay.

  • ukd1: Scale all the things: 'Setup a caching layer (Memcached, Redis, Varnish)'...if only it was just that easy. Prep for scaling should be something like: 1. testing to see when you'll need to; 2. monitoring so you know when; 3. planning, before you reach the date / time from #2, how you'll do it; 4. implement; 5. test you've done #4 properly; 6. release; 7. repeat the whole process. Work out a methodical way that makes sense. "Setup a caching layer" is not that.

  • Like the parallel in Disfluency where effort acts as a metacognitive alarm when something is more difficult than it should be. Works in programs too.

  • Sustainability in digital systems is interesting to think about...Scaling Up Systems to Make Cities More Sustainable: scaling up savings into an entire “ecosystem,” so that buildings could leverage each other’s savings, was the way to go; there are no linear patterns for a building’s energy behavior; the interconnectedness of the buildings was more critical than the age or height or other characteristic of a building; if we increase density by 50 percent, we could reduce energy use by 20 percent.

  • Speaking of SimCity, ask them if scaling issues are a good problem to have. DRM related scaling issues at launch are causing a very unsimulated unhappiness for users. They are trying to compensate for scaling issues by adding more servers and removing features, which doesn't seem to be working. 

  • Which brings up Complex systems, oh how might ye fail? Complex systems are intrinsically hazardous, Complex systems are heavily and successfully defended against failure, Catastrophe requires multiple failures – single point failures are not enough, Complex systems contain changing mixtures of failures latent within them, Complex systems run in degraded mode.

  • TechOps Pre-launch Checklist for Web Apps. A handy list of tasks to complete before releasing an application. The major categories are: prepare for disaster, run basic security checks, prep for scaling. 

  • This can't be good - Hard Power Off Is Dead But Not Buried: experimental results reveal that thirteen out of the fifteen tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, meta-data corruption, and total device failure.

  • The great leap forward - How Basecamp Next got to be so damn fast without using much client-side UI. Excellent indepth discussion of some advanced website acceleration techniques: temporarily caches each page you’ve visited and simply asks for a new version in the background when you go back to it, cache TO THE MAX, Thou shall share a cache between pages, Thou shall share a cache between people, use infinite scrolling pages to send smaller chunks.

  • Amir Salihefendic with a good talk about Redis and how it can be used to implement queues and other common programming tasks. Interesting is the use of bitmapist for advanced analytics, saving $2K a month.

  • Spent weeks tweaking JVM flags? This will ring so true...JVM performance optimization, Part 5: Is Java scalability an oxymoron?: it's JVM technology that limits the scalability of enterprise Java applications. 

  • There's a new O'Reilly book on Graph Databases. Written by Ian Robinson, Jim Webber, and Emil Eifrém. So far looks really good, as I would expect. And right now it's free!

  • Scaling Node.js Applications using the cluster module to create a network of processes which can share ports on one machine. Node-http-proxy or nginx are used to load balance across machines. Good explanation with example configuration files that are useful beyond node.js.

  • Ooyala on Scaling Cassandra for Big Data. They use it for fast Map/Reduce, Storm, resume enhancement, machine learning, and metrics.

  • High Availability at Braintree. Braintree is a payment gateway so uptime is important. World is divided into planned and unplanned failures. For planned failures: reduce maintenance windows, pre and post migrations, rolling deploys, PostgreSQL for fast DDL, and a proxy layer to pause traffic. Unplanned failures: built their own load balancer, LVS/IPVS, Pacemaker, BigBrother, LitmusPaper, BGP routes traffic through multiple ISPs and data centers, Pingdom to monitor, connect to many networks, ISP outages usually partial, use processor proxies so they can load balance over these proxies and route around ISP connection issues, Mallory, let the system heal and retry, automate everything. Nice presentation.

  • Solving Problems Using Data Structures: data structure can be used to encapsulate implementation details and lead to nice clean code...the main motivation for object oriented code in the first place is encapsulation.

  • Details of the February 22nd 2013 Windows Azure Storage Disruption. Lesson: keep those security certificates updated. It's always something. Also, Azure overtakes Amazon's cloud in performance test.

  • The cycle continues. Amazon is reducing DynamoDB pricing 85%. Release innovative product. Gain some experience. Drive down costs through economies of scale, hardware improvements, and algorithm improvements.  
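
The Basecamp Next item above describes caching each visited page and asking for a fresh version in the background when you return to it. A minimal Python sketch of that serve-stale-then-refresh idea follows. It is not Basecamp's code: the class name, the `fetch` callback, and the explicit `run_background_refresh` step (standing in for a real background request) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


class RevisitCache:
    """Hypothetical page cache: serve the cached copy at once, refresh it later."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                # assumed callback that loads a page
        self._pages: Dict[str, str] = {}   # url -> last rendered page
        self._pending: List[str] = []      # urls waiting for a refresh

    def get(self, url: str) -> str:
        if url in self._pages:
            # Revisit: answer immediately from cache and queue a refresh.
            if url not in self._pending:
                self._pending.append(url)
            return self._pages[url]
        # First visit: nothing cached yet, so fetch synchronously.
        page = self._fetch(url)
        self._pages[url] = page
        return page

    def run_background_refresh(self) -> None:
        # Stand-in for background work: re-fetch every queued page.
        while self._pending:
            url = self._pending.pop(0)
            self._pages[url] = self._fetch(url)
```

A revisit returns the old copy without calling `fetch`; only after `run_background_refresh()` has run does the next `get` see the newer version. A real deployment would presumably do the refresh on a worker thread or an asynchronous request rather than an explicit call.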

Reader Comments (7)

The other problem with Java scalability is that its in-memory representation is space inefficient, to the point that when you are scaling servers up to hundreds of gigabytes it pays to store the data off heap with an efficient representation that matches your workload.

When RAM was cheap, but not cheap enough to store large amounts of data, the memory efficiency of the JVM was acceptable because the heap was primarily configuration and scratch space. Now that we store hundreds of gigabytes it pays to explore other options. For instance, an off-heap red-black tree storing 4-byte keys and 4-byte values can hold 500 million keys with a 10-gig RSS, but java.util.Map can only store 164 million keys with a 12 gigabyte Java heap.

Even worse, the GC pause times go through the roof at 80 million keys with java.util.Map and the parallel STW collector. Not everything is an object graph connected by pointers and the JVM/Java need to accommodate that in some reasonable way. I don't know if Azul does anything to address space efficiency the way they have addressed pause times, but to me that is the next step to improving the JVM and GC.

March 8, 2013 | Unregistered Commenter Ariel Weisberg
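
As a rough check on the numbers in Ariel Weisberg's comment above, the per-entry cost implied by each figure can be worked out directly. This sketch assumes 1 gigabyte = 10^9 bytes and treats the whole RSS or heap as belonging to the map, neither of which the comment states; the function name is hypothetical.

```python
def bytes_per_entry(total_gigabytes: float, entries_millions: float) -> float:
    """Average bytes used per key/value entry (assumes 1 GB = 10**9 bytes)."""
    return (total_gigabytes * 1e9) / (entries_millions * 1e6)


PAYLOAD_BYTES = 4 + 4  # 4-byte key plus 4-byte value, as given in the comment

off_heap = bytes_per_entry(10, 500)  # off-heap red-black tree, 10 GB RSS
on_heap = bytes_per_entry(12, 164)   # java.util.Map, 12 GB Java heap

# Overhead relative to the 8 bytes of raw payload per entry.
off_heap_overhead = off_heap / PAYLOAD_BYTES
on_heap_overhead = on_heap / PAYLOAD_BYTES
```

Under those assumptions the off-heap tree comes to 20 bytes per entry and the Java heap to roughly 73, against 8 bytes of raw payload.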

"it's JVM technology that limits the scalability of enterprise Java applications." Nope, it's OOP that limits the scalability. I am not kidding. Heap fragmentation is a known issue even for C++ applications, and if your program speaks "objects" then you will eventually hit this wall. Amen.
Conclusion: C and only C will save enterprise world from inevitable demise.

March 8, 2013 | Unregistered Commenter Vladimir Rodionov

"It seems to me we can start to think of a sustainable model of computing using cooperating mobile devices rather than expensive backend servers."
Ever heard about Skype? Or Freenet? Or Syndie? Come on, you talk about stuff that has been reality for the last 10+ years, if you're starting to think about it now - it's just you.

March 9, 2013 | Unregistered Commenter-

I feel pity for that Dog in the picture

March 9, 2013 | Unregistered Commenter Rajib

No doubt peer-to-peer is not new, but that's not what I'm referencing. The key word is sustainable.

Skype uses ‘mega-supernodes’ in the cloud to improve reliability, so it's not a pure p2p play. Moreover, it's not a general platform. Programmers can not deploy programs over Skype. Skype is just for Skype. My understanding is syndie and freenet both move data, but we're talking about computation, running programs.

With JavaScript, WebRTC, and web workers, there is a real platform feel to the whole stack that shouldn't require a server to run.

As an analogy, the server/storage/bandwidth are the fertilizer/seeds/pesticides/equipment of the modern factory farmer. Huge dollars are spent on inputs to grow food. Even in a highly regulated industry farmers find it hard to make money because they spend so much money on supplies. That's because our food sources are based on annuals. They need to be replanted each year so they need all their nutrients supplied to them each year; they are starting over again each year.

In contrast a perennial based food supply doesn't require new inputs every year. This saves on labor and increases profits.

Sustainability is the idea that to provide your food supply you shouldn't have to bring in inputs. Input costs make it very difficult to make money. If you can use the sun, capture water in your land, and keep fertility on your land, you keep the cycle going. You are sustainable.

In computer systems we have nothing like the free resources of the sun and air to host our computation. So we need to bring in inputs. These inputs are expensive and make it very difficult to make a living without a very high margin product.

A computation fabric stitched together from ambient resources found on mobile platforms and linked together via ubiquitous web technologies begins to supply a sustainable application substrate. It's sustainable in the sense that it is subsidized for other uses, your mobile needs, and has a huge excess of capabilities. An application could run on 1000 nodes for very little money with very little impact on the devices it's running on. Thus an application could provide a lot of service at a low cost, which would help drive adoption. The same sort of application in a datacenter would make it unaffordable, which means new inputs would always be required. Startups often just burn resources and fail when they have to buy inputs. With a different deployment model that equation can change.

So that's my sense of sustainable and I'm not aware of anything like that currently.

March 9, 2013 | Registered Commenter Todd Hoff

"Skype uses ‘mega-supernodes’ in the cloud to improve reliability, so it's no a pure p2p play."
They added mega-supernodes when pure P2P failed for them. If they used WebRTC they would fail the same and apply the same fixes.
"Moreover, it's not a general platform. Programmers can not deploy programs over Skype. Skype is just for Skype. My understanding is syndie and freenet both move data, but we're talking about computation, running programs."
Yes, neither is a platform, they are apps. All these apps use P2P to distribute data and have client software do some things with the data.
They show that you could successfully write serverless apps (though Skype has never been fully serverless) for many years already. They are sustainable in the way you described, with the exception of Skype, which is pretty close anyway (though I included it only because everybody knows it ;) ). People have not just been thinking about it, they have been doing it successfully. WebRTC does not enable anything new. It just makes things easier...and maybe adds some visibility to P2P computing.

March 10, 2013 | Unregistered Commenter-

Easier is something new. It means ubiquity, which is a completely different opportunity space.

March 10, 2013 | Registered Commenter Todd Hoff
