advertise
Wednesday
Jan072015

The Ultimate Guide: 5 Methods for Debugging Production Servers at Scale

This a guest post by Alex Zhitnitsky, an engineer working at Takipi, who is on a mission to help Java and Scala developers solve bugs in production and rid the world of buggy software.

How to approach the production debugging conundrum?

All sorts of wild things happen when your code leaves the safe and warm development environment. Unlike the comfort of the debugger in your favorite IDE, when errors happen on a live server - you better come prepared. No more breakpoints, step over, or step into, and you can forget about adding that quick line of code to help you understand what just happened. In production, bad things happen first and then you have to figure out what exactly went wrong. To be able to debug in this kind of environment we first need to switch our debugging mindset to plan ahead. If you’re not prepared with good practices in advance, roaming around aimlessly through the logs wouldn’t be too effective.

And that’s not all. With high scalability architectures, enter high scalability errors. In many cases we find transactions that originate on one machine or microservice and break something on another. Together with Continuous Delivery practices and constant code changes, errors find their way to production with an increasing rate. The biggest problem we’re facing here is capturing the exact state which led to the error, what were the variable values, which thread are we in, and what was this piece of code even trying to do?

Let’s take a look at 5 methods that can help us answer just that. Distributed logging, advanced jstack techniques, BTrace and other custom JVM agents:

1. Distributed Logging

Click to read more ...

Tuesday
Jan062015

Sponsored Post: Wikia, MemSQL, Campanja, Hypertable, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • DevOps Engineer for Wikia. Wikia is the go-to place for fan content that is created entirely by fans! As a Quantcast Top 20 site with over 120 million monthly uniques we are tackling very interesting problems at a scale you won't find at many other places. We embrace a DevOps culture and are looking to expand our team with people that are excited about working with just about every piece of our stack. You'll also partner with our platform team as they break down the monolith and move towards service oriented architecture. Please apply here.

  • Engineer Manager - Platform. At Wikia we're tackling interesting problems at a scale you won't find at many other places. We're a Quantcast Top 20 site with over 120 million monthly uniques. 100% of the content on our 400,000+ communities is user generated. That combination of scale and UGC creates some pretty compelling challenges and on top of that we're working on moving away from a monolithic architecture and actively working on finding the best technologies to best suit each individual piece of our platform. We're currently in search of an experienced Engineer Manager to help drive this process. Please apply here.

  • Campanja is an Internet advertising optimization company born in the cloud and today we are one of the nordics bigger AWS consumers, the time has come for us to the embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology and micro services, we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • Performance and Scale EngineerSprout Social, will be like a physical trainer for the Sprout social media management platform: you will evaluate and make improvements to keep our large, diverse tech stack happy, healthy, and, most importantly, fast. You'll work up and down our back-end stack - from our RESTful API through to our myriad data systems and into the Java services and Hadoop clusters that feed them - searching for SPOFs, performance issues, and places where we can shore things up. Apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/

Cool Products and Services

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike Hits 1M writes per second with 6x Fewer Servers than Cassandra. A new Google Compute Engine benchmark demonstrates how the Aerospike database hit 1 million writes per second with just 50 nodes - compared to Cassandra's 300 nodes. Read the benchmark: http://www.aerospike.com/blog/1m-wps-6x-fewer-servers-than-cassandra/

  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages – Premium 24/7, Enterprise 24/7 and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable--Blog.

  • FoundationDB 3.0. 3.0 makes the power of a multi-model, ACID transactional database available to a set of new connected device apps that are generating data at previously unheard of speed. It is the fastest, most scalable, transactional database in the cloud - A 32 machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Jan052015

Von Neumann had one piece of advice for us: not to originate anything.

I don't know about you, but when I read about the exploits of people like John von Neumann, Alan Turing, J. Robert Oppenheimer, and Kurt Gödel in Turing's Cathedral: The Origins of the Digital Universe by George Dyson, I can't help but flash back to the Age of Heroes, where the names are different--Achilles, Odysseus, Agamemnon, and Ajax--but the larger than life story they lived is familiar. Dyson's book is the Iliad of our times, telling the story of great battles of the human mind: the atomic bomb, Turing machines, programmable computers, weather prediction, genetic-modeling, Monte Carlo simulation, and cellular automata.

Which brings up another question I can't help but ponder: is it the age that makes the person or is it the person that makes the age? Do we have these kind of people today? Or can they only be forged in war?

Anyway, I found this advice from John von Neumann, as told by Julian Bigelow, about how to go about building the MANIAC  computer. This advice still echoes down project management halls today:

“Von Neumann had one piece of advice for us: not to originate anything.” This helped put the IAS project in the lead. “One of the reasons our group was successful, and got a big jump on others, was that we set up certain limited objectives, namely that we would not produce any new elementary components,” adds Bigelow. “We would try and use the ones which were available for standard communications purposes. We chose vacuum tubes which were in mass production, and very common types, so that we could hope to get reliable components, and not have to go into component research.”

They did innovate on architecture by making it possible to store and run programs. Some interesting quotes from the book around that development:

Click to read more ...

Friday
Jan022015

Stuff The Internet Says On Scalability For January 2nd, 2015

Hey, it's HighScalability time:

 

  • 53 kilobytes: total amount of RAM in the world in 1953; 180-200 million: daily transactions at The Weather Channel; 
  • Quotable Quotes
    • Enquist, Brian: Life operates over 21 orders of magnitude in size - From Unicells to Whales and Giant Sequoias 
    • George Dyson: Digital computers translate between these two forms of information—structure and sequence—according to definite rules. Bits that are embodied as structure (varying in space, invariant across time) we perceive as memory, and bits that are embodied as sequence (varying in time, invariant across space) we perceive as code. Gates are the intersections where bits span both worlds at the moments of transition
    • : what is “scaling”? In its most elemental form, it simply refers to how systems respond when their sizes change
    • @muratdemirbas: Eventual consistency should not come to mean "Only God can judge me".
    • Raffi Krikorian: Every Problem is a Scaling Problem
    • The High-Interest Credit Card of Technical Debt: Experience has shown that the external world is rarely stable.
    • @Apcera: "#HybridCloud ROI isn’t there, & the complexity is huge." via @stevesi @Recode http://ow.ly/Gspxq  Time for a new solution in 2015. #PaaS
    • Nathan Bronson: I believe that to tackle big problems one must factor complexity into pieces that can each fit in someone’s brain, and that the key to such factoring is to create abstractions that hide complexity behind a simple mental model.

  • A prediction for the new year: algorithm profilers will be a hot new job category. Optical Illusions That Fool Google-Style Image Recognition Algorithms. SEO and HFT are a kind of profiling, but with the spread of algorithms through the consumption of the world by software, the hacking of all sorts of algorithms for advantage will become a permanent fixture of modern life. One more layer to the game.

  • Interesting idea from Brett Slatkin. Our approach to manufacturing is as quaint as punchcards: You'd turn in your punch cards and hope to get the output a week later — sooner if you were lucky...3D printing is slow. Even though laser printing can produce precision parts like rocket engines, it doesn't scale...To build cars, cell phones, and soda cans you need to produce high volumes quickly...What we need is a way to click a button and launch a manufacturing process.

  • If you need to optimize your Rails App for concurrency here's a good source: Heroku and Puma vs. Heroku and Unicorn. Puma was the winner, improving quality of service and reducing hosting costs. With Puman many fewer dynos were needed. The comment section has a vigorous debate.

  • The Current State of the Blockchain: Bitcoin, in its current state, cannot act as a major transaction network. Because blocks are current limited to be 1 MB in size, Bitcoin is limited to handle roughly 7 transactions per second. In comparison, thousands of credit card transactions happen per second across the world. < Good discussion on reddit. Also, The Blockchain is the New Database, Get Ready to Rewrite Everything

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Dec312014

Linus: The whole "parallel computing is the future" is a bunch of crock.

Linus Torvalds in his usual politically correct way made a typically understated statement about “pushing the whole parallelism snake-oil” that generated almost no response whatsoever.

Well, not quite. His comment on Avoiding ping pong has generated hundreds of responses, both on the original post and on Reddit.

The contention:

The whole "let's parallelize" thing is a huge waste of everybody's time. There's this huge body of "knowledge" that parallel is somehow more efficient, and that whole huge body is pure and utter garbage. Big caches are efficient. Parallel stupid small cores without caches are horrible unless you have a very specific load that is hugely regular (ie graphics).

Nobody is ever going to go backwards from where we are today. Those complex OoO [Out-of-order execution] cores aren't going away. Scaling isn't going to continue forever, and people want mobility, so the crazies talking about scaling to hundreds of cores are just that - crazy. Why give them an ounce of credibility?

Where the hell do you envision that those magical parallel algorithms would be used?

The only place where parallelism matters is in graphics or on the server side, where we already largely have it. Pushing it anywhere else is just pointless.

So give up on parallelism already. It's not going to happen. End users are fine with roughly on the order of four cores, and you can't fit any more anyway without using too much energy to be practical in that space. And nobody sane would make the cores smaller and weaker in order to fit more of them - the only reason to make them smaller and weaker is because you want to go even further down in power use, so you'd still not have lots of those weak cores.

Give it up. The whole "parallel computing is the future" is a bunch of crock.

An interesting question to ponder on the cusp of a new year. What will programs look like in the future? Very different than they look today? Or pretty much the same?

From the variety of replies to Linus it's obvious we are in no danger of arriving at consensus. There was the usual discussion of the differences between distributed, parallel, concurrent, and multithreading, with each succeeding explanation more confusing than the next. The general gist being that how you describe a problem in code is not how it has to run.  Which is why I was not surprised to see a mini-language war erupt. 

The idea is parallelization is a problem only because of the old fashioned languages that are used. Use a better language and parallelization of the design can be separated from the runtime and it will all just magically work. There are echoes here of how datacenter architectures are now utilizing schedulers like Mesos to treat entire datacenters as a programmable fabric. 

One of the more interesting issues raised in the comments was a confusion over what exactly is a server? Can a desktop machine that needs to run fast parallel builds be considered a server? An unsatisfying definition of a not-server may simply be a device that can comfortably run applications that aren't highly parallelized. 

I pulled out some of the more representative comments from the threads for your enjoyment. The consensus? There is none, but it's quite an interesting discussion...

Click to read more ...

Tuesday
Dec232014

Sponsored Post: MemSQL, Campanja, Hypertable, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • DevOps Engineer for Wikia. Wikia is the go-to place for fan content that is created entirely by fans! As a Quantcast Top 20 site with over 120 million monthly uniques we are tackling very interesting problems at a scale you won't find at many other places. We embrace a DevOps culture and are looking to expand our team with people that are excited about working with just about every piece of our stack. You'll also partner with our platform team as they break down the monolith and move towards service oriented architecture. Please apply here.

  • Engineer Manager - Platform. At Wikia we're tackling interesting problems at a scale you won't find at many other places. We're a Quantcast Top 20 site with over 120 million monthly uniques. 100% of the content on our 400,000+ communities is user generated. That combination of scale and UGC creates some pretty compelling challenges and on top of that we're working on moving away from a monolithic architecture and actively working on finding the best technologies to best suit each individual piece of our platform. We're currently in search of an experienced Engineer Manager to help drive this process. Please apply here.

  • Campanja is an Internet advertising optimization company born in the cloud and today we are one of the nordics bigger AWS consumers, the time has come for us to the embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology and micro services, we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • Performance and Scale EngineerSprout Social, will be like a physical trainer for the Sprout social media management platform: you will evaluate and make improvements to keep our large, diverse tech stack happy, healthy, and, most importantly, fast. You'll work up and down our back-end stack - from our RESTful API through to our myriad data systems and into the Java services and Hadoop clusters that feed them - searching for SPOFs, performance issues, and places where we can shore things up. Apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/

Cool Products and Services

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike Hits 1M writes per second with 6x Fewer Servers than Cassandra. A new Google Compute Engine benchmark demonstrates how the Aerospike database hit 1 million writes per second with just 50 nodes - compared to Cassandra's 300 nodes. Read the benchmark: http://www.aerospike.com/blog/1m-wps-6x-fewer-servers-than-cassandra/

  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages – Premium 24/7, Enterprise 24/7 and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable--Blog.

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Dec222014

Scalability as a Service

This is a guest post by Thierry Schellenbach, CEO GetStream.io and author of the open source Stream-Framework, which enables you to build scalable newsfeeds using Cassandra or Redis.

We first wrote about our newsfeed architecture on High Scalability in October 2013. Since then our open source Stream-Framework grew to be the most used package for building scalable newsfeeds. We’re very grateful to the High Scalability community for all the support.

In this article I want to highlight the current trend in our industry of moving  towards externally hosted components. We’re going to compare the hosted solutions for search, newsfeeds and realtime functionality to their open source alternative. This move towards hosted components means you can add scalable components to your app at a fraction of the effort it took just a few years ago.

1.) Search servers

Click to read more ...

Friday
Dec192014

Stuff The Internet Says On Scalability For December 19th, 2014

Hey, it's HighScalability time:


Brilliant & hilarious keynote to finish the day at #yow14 (Matt)

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Dec172014

The Big Problem is Medium Data

This is a guest post by Matt Hunt, who leads open source projects for Bloomberg LP R&D. 

“Big Data” systems continue to attract substantial funding, attention, and excitement. As with many new technologies, they are neither a panacea, nor even a good fit for many common uses. Yet they also hold great promise. The question is, can systems originally designed to serve hundreds of millions of requests for something like web pages also work for requests that are computationally expensive and have tight tolerances?

Modern era big data technologies are a solution to an economics problem faced by Google and other Internet giants a decade ago. Storing, indexing, and responding to searches against all web pages required tremendous amounts of disk space and computer power. Very powerful machines, fast SAN storage, and data center space were prohibitively expensive. The solution was to pack cheap commodity machines as tightly together as possible with local disks.

This addressed the space and hardware cost problem, but introduced a software challenge. Writing distributed code is hard, and with many machines comes many failures. So a framework was also required to take care of such problems automatically for the system to be viable.

Hadoop

Right now, we’re in a transition phase in the industry in computing built from the entrance of Hadoop and its community starting in 2004. Understanding why and how these systems were created also offers insight into some of their weaknesses.  

At Bloomberg that we don’t have a big data problem. What we have is a “medium data” problem -- and so does everyone else.   Systems such as Hadoop and Spark are less efficient and mature for these typical low latency enterprise uses in general. High core counts, SSDs, and large RAM footprints are common today - but many of the commodity platforms have yet to take full advantage of them, and challenges remain.  A number of distributed components are further hampered by Java, which creates its own complications for low latency performance.

A practical use case

Click to read more ...

Tuesday
Dec162014

Multithreaded Programming has Really Gone to the Dogs

Taken from Multithreaded programming - theory and practice on reddit, which also has some very funny comments. If anything this is way too organized. 

 What's not shown? All the little messes that have to be cleaned up after...