Tuesday
Jan 6, 2015

Sponsored Post: Wikia, MemSQL, Campanja, Hypertable, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • DevOps Engineer for Wikia. Wikia is the go-to place for fan content that is created entirely by fans! As a Quantcast Top 20 site with over 120 million monthly uniques we are tackling very interesting problems at a scale you won't find at many other places. We embrace a DevOps culture and are looking to expand our team with people that are excited about working with just about every piece of our stack. You'll also partner with our platform team as they break down the monolith and move towards service oriented architecture. Please apply here.

  • Engineer Manager - Platform. At Wikia we're tackling interesting problems at a scale you won't find at many other places. We're a Quantcast Top 20 site with over 120 million monthly uniques. 100% of the content on our 400,000+ communities is user generated. That combination of scale and UGC creates some pretty compelling challenges and on top of that we're working on moving away from a monolithic architecture and actively working on finding the best technologies to best suit each individual piece of our platform. We're currently in search of an experienced Engineer Manager to help drive this process. Please apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology, and microservices; we hope to use PaaS when we can get away with it, but consume at the IaaS layer when we have to. Please apply here.

  • Performance and Scale Engineer. At Sprout Social, you will be like a physical trainer for the Sprout social media management platform: you will evaluate and make improvements to keep our large, diverse tech stack happy, healthy, and, most importantly, fast. You'll work up and down our back-end stack - from our RESTful API through to our myriad data systems and into the Java services and Hadoop clusters that feed them - searching for SPOFs, performance issues, and places where we can shore things up. Apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, a leader in next-generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend components of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses. Aerospike now offers two certified training courses, Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment. Find a training course near you: http://www.aerospike.com/aerospike-training/

Cool Products and Services

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike Hits 1M writes per second with 6x Fewer Servers than Cassandra. A new Google Compute Engine benchmark demonstrates how the Aerospike database hit 1 million writes per second with just 50 nodes - compared to Cassandra's 300 nodes. Read the benchmark: http://www.aerospike.com/blog/1m-wps-6x-fewer-servers-than-cassandra/

  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages: Premium 24/7, Enterprise 24/7, and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable - Blog.

  • FoundationDB 3.0. Version 3.0 makes the power of a multi-model, ACID transactional database available to a new set of connected-device apps that are generating data at previously unheard-of speed. It is the fastest, most scalable transactional database in the cloud - a 32-machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager: Monitor physical, virtual, and cloud applications.

  • www.site24x7.com: Monitor end-user experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...


Monday
Jan 5, 2015

Von Neumann had one piece of advice for us: not to originate anything.

I don't know about you, but when I read about the exploits of people like John von Neumann, Alan Turing, J. Robert Oppenheimer, and Kurt Gödel in Turing's Cathedral: The Origins of the Digital Universe by George Dyson, I can't help but flash back to the Age of Heroes, where the names are different--Achilles, Odysseus, Agamemnon, and Ajax--but the larger than life story they lived is familiar. Dyson's book is the Iliad of our times, telling the story of great battles of the human mind: the atomic bomb, Turing machines, programmable computers, weather prediction, genetic-modeling, Monte Carlo simulation, and cellular automata.

Which brings up another question I can't help but ponder: is it the age that makes the person, or the person that makes the age? Do we have these kinds of people today? Or can they only be forged in war?

Anyway, I found this advice from John von Neumann, as told by Julian Bigelow, about how to go about building the MANIAC  computer. This advice still echoes down project management halls today:

“Von Neumann had one piece of advice for us: not to originate anything.” This helped put the IAS project in the lead. “One of the reasons our group was successful, and got a big jump on others, was that we set up certain limited objectives, namely that we would not produce any new elementary components,” adds Bigelow. “We would try and use the ones which were available for standard communications purposes. We chose vacuum tubes which were in mass production, and very common types, so that we could hope to get reliable components, and not have to go into component research.”

They did innovate on architecture by making it possible to store and run programs. Some interesting quotes from the book around that development:


Friday
Jan 2, 2015

Stuff The Internet Says On Scalability For January 2nd, 2015

Hey, it's HighScalability time:


  • 53 kilobytes: total amount of RAM in the world in 1953; 180-200 million: daily transactions at The Weather Channel
  • Quotable Quotes
    • Enquist, Brian: Life operates over 21 orders of magnitude in size - From Unicells to Whales and Giant Sequoias 
    • George Dyson: Digital computers translate between these two forms of information—structure and sequence—according to definite rules. Bits that are embodied as structure (varying in space, invariant across time) we perceive as memory, and bits that are embodied as sequence (varying in time, invariant across space) we perceive as code. Gates are the intersections where bits span both worlds at the moments of transition
    • : what is “scaling”? In its most elemental form, it simply refers to how systems respond when their sizes change
    • @muratdemirbas: Eventual consistency should not come to mean "Only God can judge me".
    • Raffi Krikorian: Every Problem is a Scaling Problem
    • The High-Interest Credit Card of Technical Debt: Experience has shown that the external world is rarely stable.
    • @Apcera: "#HybridCloud ROI isn’t there, & the complexity is huge." via @stevesi @Recode http://ow.ly/Gspxq  Time for a new solution in 2015. #PaaS
    • Nathan Bronson: I believe that to tackle big problems one must factor complexity into pieces that can each fit in someone’s brain, and that the key to such factoring is to create abstractions that hide complexity behind a simple mental model.

  • A prediction for the new year: algorithm profilers will be a hot new job category. Optical Illusions That Fool Google-Style Image Recognition Algorithms. SEO and HFT are a kind of profiling, but with the spread of algorithms through the consumption of the world by software, the hacking of all sorts of algorithms for advantage will become a permanent fixture of modern life. One more layer to the game.

  • Interesting idea from Brett Slatkin. Our approach to manufacturing is as quaint as punchcards: You'd turn in your punch cards and hope to get the output a week later — sooner if you were lucky...3D printing is slow. Even though laser printing can produce precision parts like rocket engines, it doesn't scale...To build cars, cell phones, and soda cans you need to produce high volumes quickly...What we need is a way to click a button and launch a manufacturing process.

  • If you need to optimize your Rails app for concurrency, here's a good source: Heroku and Puma vs. Heroku and Unicorn. Puma was the winner, improving quality of service and reducing hosting costs. With Puma, many fewer dynos were needed. The comment section has a vigorous debate.

  • The Current State of the Blockchain: Bitcoin, in its current state, cannot act as a major transaction network. Because blocks are currently limited to 1 MB in size, Bitcoin can handle roughly 7 transactions per second. In comparison, thousands of credit card transactions happen per second across the world. < Good discussion on reddit. Also, The Blockchain is the New Database, Get Ready to Rewrite Everything
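The roughly-7-transactions-per-second ceiling is simple arithmetic. Here's a rough sketch of the derivation, assuming a ~250-byte average transaction and the ~10-minute block interval (both common ballpark figures, not taken from the post):

```python
# Rough derivation of Bitcoin's often-quoted ~7 transactions/sec ceiling.
# The 1 MB block limit is from the post; the 250-byte average transaction
# size and 10-minute block interval are assumed ballpark figures.
block_size_bytes = 1_000_000
avg_tx_bytes = 250          # assumed average transaction size
block_interval_sec = 600    # ~10 minutes between blocks

tx_per_block = block_size_bytes // avg_tx_bytes   # 4000 transactions per block
tx_per_sec = tx_per_block / block_interval_sec    # ~6.7 transactions per second

print(round(tx_per_sec, 1))
```

Smaller average transactions push the number a bit higher, larger ones lower, which is why quoted figures range from about 3 to 7.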

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Wednesday
Dec 31, 2014

Linus: The whole "parallel computing is the future" is a bunch of crock.

Linus Torvalds in his usual politically correct way made a typically understated statement about “pushing the whole parallelism snake-oil” that generated almost no response whatsoever.

Well, not quite. His comment on Avoiding ping pong has generated hundreds of responses, both on the original post and on Reddit.

The contention:

The whole "let's parallelize" thing is a huge waste of everybody's time. There's this huge body of "knowledge" that parallel is somehow more efficient, and that whole huge body is pure and utter garbage. Big caches are efficient. Parallel stupid small cores without caches are horrible unless you have a very specific load that is hugely regular (ie graphics).

Nobody is ever going to go backwards from where we are today. Those complex OoO [Out-of-order execution] cores aren't going away. Scaling isn't going to continue forever, and people want mobility, so the crazies talking about scaling to hundreds of cores are just that - crazy. Why give them an ounce of credibility?

Where the hell do you envision that those magical parallel algorithms would be used?

The only place where parallelism matters is in graphics or on the server side, where we already largely have it. Pushing it anywhere else is just pointless.

So give up on parallelism already. It's not going to happen. End users are fine with roughly on the order of four cores, and you can't fit any more anyway without using too much energy to be practical in that space. And nobody sane would make the cores smaller and weaker in order to fit more of them - the only reason to make them smaller and weaker is because you want to go even further down in power use, so you'd still not have lots of those weak cores.

Give it up. The whole "parallel computing is the future" is a bunch of crock.

An interesting question to ponder on the cusp of a new year. What will programs look like in the future? Very different than they look today? Or pretty much the same?

From the variety of replies to Linus it's obvious we are in no danger of arriving at a consensus. There was the usual discussion of the differences between distributed, parallel, concurrent, and multithreaded programming, with each succeeding explanation more confusing than the last. The general gist: how you describe a problem in code is not how it has to run. Which is why I was not surprised to see a mini-language war erupt.

The idea is that parallelization is a problem only because of the old-fashioned languages being used. Use a better language, and parallelization of the design can be separated from the runtime, and it will all just magically work. There are echoes here of how datacenter architectures now use schedulers like Mesos to treat entire datacenters as a programmable fabric.
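The separation of problem description from execution strategy can be sketched in a few lines. The example below is a minimal illustration, not an endorsement of any particular language: the same "apply f to every item" description runs sequentially or concurrently just by swapping the executor.

```python
# The same problem description -- "apply f to every item" -- can be
# executed sequentially or concurrently; only the executor changes,
# not the code's intent.
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x * x

items = list(range(10))

# Sequential execution: one thread, one item at a time.
sequential = list(map(f, items))

# Concurrent execution: same description, handed to a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    concurrent_result = list(pool.map(f, items))

assert concurrent_result == sequential  # identical result either way
```

Whether the runtime actually gains anything depends on the workload and the platform (CPython threads won't speed up CPU-bound `f`, for instance), which is exactly the gap between description and execution the thread was arguing about.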

One of the more interesting issues raised in the comments was confusion over what exactly a server is. Can a desktop machine that needs to run fast parallel builds be considered a server? An unsatisfying definition of a not-server may simply be a device that can comfortably run applications that aren't highly parallelized.

I pulled out some of the more representative comments from the threads for your enjoyment. The consensus? There is none, but it's quite an interesting discussion...


Tuesday
Dec 23, 2014

Sponsored Post: MemSQL, Campanja, Hypertable, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • DevOps Engineer for Wikia. Wikia is the go-to place for fan content that is created entirely by fans! As a Quantcast Top 20 site with over 120 million monthly uniques we are tackling very interesting problems at a scale you won't find at many other places. We embrace a DevOps culture and are looking to expand our team with people that are excited about working with just about every piece of our stack. You'll also partner with our platform team as they break down the monolith and move towards service oriented architecture. Please apply here.

  • Engineer Manager - Platform. At Wikia we're tackling interesting problems at a scale you won't find at many other places. We're a Quantcast Top 20 site with over 120 million monthly uniques. 100% of the content on our 400,000+ communities is user generated. That combination of scale and UGC creates some pretty compelling challenges and on top of that we're working on moving away from a monolithic architecture and actively working on finding the best technologies to best suit each individual piece of our platform. We're currently in search of an experienced Engineer Manager to help drive this process. Please apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology, and microservices; we hope to use PaaS when we can get away with it, but consume at the IaaS layer when we have to. Please apply here.

  • Performance and Scale Engineer. At Sprout Social, you will be like a physical trainer for the Sprout social media management platform: you will evaluate and make improvements to keep our large, diverse tech stack happy, healthy, and, most importantly, fast. You'll work up and down our back-end stack - from our RESTful API through to our myriad data systems and into the Java services and Hadoop clusters that feed them - searching for SPOFs, performance issues, and places where we can shore things up. Apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, a leader in next-generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend components of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses. Aerospike now offers two certified training courses, Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment. Find a training course near you: http://www.aerospike.com/aerospike-training/

Cool Products and Services

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike Hits 1M writes per second with 6x Fewer Servers than Cassandra. A new Google Compute Engine benchmark demonstrates how the Aerospike database hit 1 million writes per second with just 50 nodes - compared to Cassandra's 300 nodes. Read the benchmark: http://www.aerospike.com/blog/1m-wps-6x-fewer-servers-than-cassandra/

  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages: Premium 24/7, Enterprise 24/7, and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable - Blog.

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key-Value Store will have free access to SQL Layer. SQL Layer is also open source; you can get started with it on GitHub as well.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager: Monitor physical, virtual, and cloud applications.

  • www.site24x7.com: Monitor end-user experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...


Monday
Dec 22, 2014

Scalability as a Service

This is a guest post by Thierry Schellenbach, CEO GetStream.io and author of the open source Stream-Framework, which enables you to build scalable newsfeeds using Cassandra or Redis.

We first wrote about our newsfeed architecture on High Scalability in October 2013. Since then our open source Stream-Framework grew to be the most used package for building scalable newsfeeds. We’re very grateful to the High Scalability community for all the support.

In this article I want to highlight the current trend in our industry of moving towards externally hosted components. We're going to compare the hosted solutions for search, newsfeeds, and realtime functionality to their open source alternatives. This move towards hosted components means you can add scalable components to your app at a fraction of the effort it took just a few years ago.

1.) Search servers


Friday
Dec 19, 2014

Stuff The Internet Says On Scalability For December 19th, 2014

Hey, it's HighScalability time:


Brilliant & hilarious keynote to finish the day at #yow14 (Matt)

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Wednesday
Dec 17, 2014

The Big Problem is Medium Data

This is a guest post by Matt Hunt, who leads open source projects for Bloomberg LP R&D. 

“Big Data” systems continue to attract substantial funding, attention, and excitement. As with many new technologies, they are neither a panacea, nor even a good fit for many common uses. Yet they also hold great promise. The question is, can systems originally designed to serve hundreds of millions of requests for something like web pages also work for requests that are computationally expensive and have tight tolerances?

Modern era big data technologies are a solution to an economics problem faced by Google and other Internet giants a decade ago. Storing, indexing, and responding to searches against all web pages required tremendous amounts of disk space and computer power. Very powerful machines, fast SAN storage, and data center space were prohibitively expensive. The solution was to pack cheap commodity machines as tightly together as possible with local disks.

This addressed the space and hardware cost problem, but introduced a software challenge. Writing distributed code is hard, and with many machines comes many failures. So a framework was also required to take care of such problems automatically for the system to be viable.

Hadoop

Right now the computing industry is in a transition phase that began with the arrival of Hadoop and its community in 2004. Understanding why and how these systems were created also offers insight into some of their weaknesses.

At Bloomberg, we don't have a big data problem. What we have is a “medium data” problem -- and so does everyone else. In general, systems such as Hadoop and Spark are less efficient and less mature for these typical low-latency enterprise uses. High core counts, SSDs, and large RAM footprints are common today - but many of the commodity platforms have yet to take full advantage of them, and challenges remain. A number of distributed components are further hampered by Java, which creates its own complications for low-latency performance.

A practical use case


Tuesday
Dec 16, 2014

Multithreaded Programming has Really Gone to the Dogs

Taken from Multithreaded programming - theory and practice on reddit, which also has some very funny comments. If anything this is way too organized. 

 What's not shown? All the little messes that have to be cleaned up after...

Tuesday
Dec 16, 2014

The Machine: HP's New Memristor Based Datacenter Scale Computer - Still Changing Everything

The end of Moore’s law is the best thing that’s happened to computing in the last 50 years. Moore’s law has been a tyranny of comfort. You were assured your chips would see a constant improvement. Everyone knew what was coming and when it was coming. The entire semiconductor industry was held captive to delivering on Moore’s law. There was no new invention allowed in the entire process. Just plod along on the treadmill and do what was expected. We are finally breaking free of these shackles and entering what is the most exciting age of computing that we’ve seen since the late 1940s. Finally we are in a stage where people can invent and those new things will be tried out and worked on and find their way into the market. We’re finally going to do things differently and smarter.

-- Stanley Williams (paraphrased)

HP has been working on a radically new type of computer, enigmatically called The Machine (not this machine). The Machine is perhaps the largest R&D project in the history of HP. It’s a complete rebuild of both hardware and software from the ground up. A massive effort. HP hopes to have a small version of their datacenter scale product up and running in two years.

The story began when we first met HP’s Stanley Williams about four years ago in How Will Memristors Change Everything? In the latest chapter of the memristor story, Mr. Williams gives another incredible talk: The Machine: The HP Memristor Solution for Computing Big Data, revealing more about how The Machine works.

The goal of The Machine is to collapse the memory/storage hierarchy. Computation today is energy inefficient. Eighty percent of the energy and vast amounts of time are spent moving bits between hard disks, memory, processors, and multiple layers of cache. Customers end up spending more money on power bills than on the machines themselves. So the machine has no hard disks, DRAM, or flash. Data is held in power efficient memristors, an ion based nonvolatile memory, and data is moved over a photonic network, another very power efficient technology. When a bit of information leaves a core it leaves as a pulse of light.

On graph processing benchmarks The Machine reportedly performs 2-3 orders of magnitude better based on energy efficiency and one order of magnitude better based on time. There are no details on these benchmarks, but that’s the gist of it.

The Machine puts data first. The concept is to build a system around nonvolatile memory with processors sprinkled liberally throughout the memory. When you want to run a program you send the program to a processor near the memory, do the computation locally, and send the results back. Computation uses a wide range of heterogeneous multicore processors. By only transmitting the bits required for the program and the results the savings is enormous when compared to moving terabytes or petabytes of data around.
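A toy model makes the ship-code-not-data argument concrete. All sizes below are illustrative assumptions for the sake of the sketch, not HP's figures:

```python
# Toy comparison of bytes crossing the interconnect when you ship data
# to the code versus shipping code to the data (The Machine's approach).
# All sizes are illustrative assumptions, not HP's numbers.
dataset_bytes = 10 * 2**40      # 10 TiB held in nonvolatile memory
program_bytes = 1 * 2**20       # a 1 MiB program sent to a nearby processor
result_bytes = 64 * 2**10       # a 64 KiB query result sent back

ship_data_to_code = dataset_bytes                 # classic: move the data
ship_code_to_data = program_bytes + result_bytes  # move only code + result

savings = ship_data_to_code / ship_code_to_data
print(f"{savings:,.0f}x fewer bytes moved")
```

Since moving bits dominates both the energy and the time budget, a millions-to-one reduction in bytes moved is the whole point of sprinkling processors through the memory.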

The Machine is not targeted at standard HPC workloads. It's not a LINPACK buster. The problem HP is trying to solve is one where a customer wants to perform a query and figure out the answer by searching through a gigantic pile of data: problems that need to store lots of data and analyze it in real time as new data comes in.

Why is a very different architecture needed for building a computer? Computer systems can't keep up with the flood of data that's coming in. HP is hearing from their customers that they need the ability to handle ever greater amounts of data. The number of bits being collected is growing exponentially faster than the rate at which transistors are being manufactured. Information collection is also growing faster than the rate at which hard disks are being manufactured. HP estimates there are 250 trillion DVDs worth of data that people really want to do something with. Vast amounts of data being collected in the world are never even looked at.

So something new is needed. That’s at least the bet HP is making. While it’s easy to get excited about the technology HP is developing, it won’t be for you and me, at least until the end of the decade. These will not be commercial products for quite a while. HP intends to use them for their own enterprise products, internally consuming everything that’s made. The idea is we are still very early in the tech cycle, so high cost systems are built first, then as volumes grow and processes improve, the technology will be ready for commercial deployment. Eventually costs will come down enough that smaller form factors can be sold.

What is interesting is HP is essentially building its own cloud infrastructure, but instead of leveraging commodity hardware and software, they are building their own best of breed custom hardware and software. A cloud typically makes available vast pools of memory, disk, and CPU, organized around instance types which are connected by fast networks. Recently there’s a move to treat these resource pools as independent of the underlying instances. So we are seeing high level scheduling software like Kubernetes and Mesos becoming bigger forces in the industry. HP has to build all this software themselves, solving many of the same problems, along with the opportunities provided by specialized chips. You can imagine programmers programming very specialized applications to eke out every ounce of performance from The Machine, but what is more likely is HP will have to create a very sophisticated scheduling system to optimize how programs run on top of The Machine. What's next in software is the evolution of a kind of Holographic Application Architecture, where function is fluid in both time and space, and identity arises at run-time from a two-dimensional structure. Schedule optimization is the next frontier being explored on the cloud.

The talk is organized in two broad sections: hardware and software. Two-thirds of the project is software, but Mr. Williams is a hardware guy, so hardware makes up the majority of the talk.  The hardware section is based around the idea of optimizing the various functions around the physics that is available: electrons compute; ions store; photons communicate.

Here is my gloss on Mr. Williams' talk. As usual with such a complex subject, much can be missed. Also, Mr. Williams tosses huge, interesting ideas around like pancakes, so viewing the talk is highly recommended. But until then, let's see The Machine HP thinks will be the future of computing….
