Friday, Jan 30, 2015

Stuff The Internet Says On Scalability For January 30th, 2015

Hey, it's HighScalability time:


It's a strange world...exotic, gigantic molecules fit inside each other like Russian nesting dolls
  • 1.39 billion: Facebook monthly active users; $18 billion profit: Apple in 3 months; 200 million: Kik users; 11.2 billion years: age of the oldest known solar system; 3 billion: videos viewed per day on Facebook
  • Quotable Quotes:
    • @kevinroose: This dude wins SF bingo. RT @caro: An Uber driver is Airbnb'ing the trunk of his Tesla for $85/night.
    • @BenedictEvans: Only 16% of Facebook DAUs aren't using it on mobile
    • @rezendi: Yo's Law: "in the 21st century tech industry, satire and reality are not merely indistinguishable but actually interchangeable."
    • Brent Ozar: I recommend that people back up data, not servers.
    • @AnnaPawlicka: "Shared State is the Root of All Evil"
    • Peter Lawrey: micro-day - about 1/12 of a second. micro-century - 51.3 minutes. femto-parsec - about 30 metres.
    • TapirLiu: OH: docker is like a condom to protect your computer from Node.
    • @DigitCurator: "The Next Decade In Storage": Resistive RAM promises better scaling, efficiency, and 1000x endurance of flash memory 
    • @BenedictEvans: At the end of 2014 Apple had ~650-675m live iOS devices. With zero unit sales growth, 700-720m by end 2015. Consumer PCs in use - 7-800m
    • @MailChimp: We sent 14.1 billion emails in December, including 741 million on Cyber Monday.
    • @mjpt777:  That's in the past. We can now do 20 million per second :-) per stream.
    • @bradwilson: Conclusions: 1. Ethernet over power does not perform as well as WiFi (??) 2. Ethernet over power hates being shared among multiple PCs
    • @mjpt777: Specialized Evolution of the General-Purpose CPU  - note that performance per watt is approx doubling per generation. 
    • @nighitingale: "The Earth is 4.6 billion years old. Scaling to 46 years, humans have been here 4 hours, the industrial..."
    • Joseph Campbell: The hero’s journey always begins with the call. One way or another, a guide must come to say, “Look, you’re in Sleepy Land. Wake. Come on a trip."
    • Frank Herbert: the most persistent principles of the universe were accident and error.

  • Will Facebook ever figure out this mobile thing? Not long ago that was the big question. We have an answer. In the fourth quarter, the percentage of its advertising revenue from mobile devices increased to 69%, up from 66% in the third quarter and 53% a year earlier. Mobile daily active users were 745 million on average for December 2014, an increase of 34 percent year-over-year.

  • The power of smart: Facebook’s Powerful Ad Tools Grew Its Revenue 25X Faster Than User Count. Facebook might be running out of people, but they aren't running out of ways of monetizing those people. Math grows faster than users.

  • The Cathedral of Computation by Ian Bogost. Agree in part. There does seem to be an uncritical acceptance of algorithms, as if because they enliven machines they are somehow pure and objective, when the opposite is the case. Algorithms are made for human purposes by teams of humans and show the biases and hubris of their makers. And like all creatures, algorithms should be subject to skepticism, law, and review.

  • We have many long-running debates in tech. Server-side vs. client-side rendering is just one of them. A thoughtful analysis: Tradeoffs in server side and client side rendering by Malte Ubl. Brett Slatkin boldly claims: Experimentally verified: "Why client-side templating is wrong". He concludes: I hope never to render anything server-side ever again. I feel more comfortable in making that choice than ever thanks to all this data. I see rare occasions when server-side rendering could make sense for performance, but I don't expect to encounter many of those situations in the future.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Wednesday, Jan 28, 2015

Instagram Strategy to Radically Reduce Traffic: Kill all the spambots!

RIP to my fallen robot followers on Instagram, if there's a heaven for robot instagram users, you guys are in there

— alldaychubbyboy (@Allday)

How do you scale to handle increased user traffic? Have less traffic. No, this is not a koan. The best way to deal with traffic is not to have it. 

In a two-day span Instagram's own account lost 18.9 million followers, more than 29 percent of its total. Justin Bieber lost 3.5 million followers (15 percent), Kim Kardashian lost 1.3 million followers (5.5 percent), and Rihanna lost 1.2 million followers.

Instagram explains this dramatic reckoning was achieved by "removing deactivated spam accounts and accounts that violated its community guidelines." 

In an age when high user counts and tantalizing engagement metrics are more valuable than bitcoins, this can't have been an easy decision, but it was made after Instagram was bought by Facebook.

Why? Gabe Madway, an Instagram spokesman, tells us why: We totally get that it’s uncomfortable for people. The overall goal is we want it to be perceived that the people following you are real.

Uncomfortable is an understatement. A BuzzFeed article nicely captured some of the anger, here's just one example (could be NSFW):


Monday, Jan 26, 2015

Paper: Immutability Changes Everything by Pat Helland

I was excited to see that Pat Helland has published another thought-provoking paper: Immutability Changes Everything. If video is more your style, Pat gave a wonderful talk on the same subject at RICON2012 (video, slides).

It's fun to see how Pat's thinking is evolving over time as he has worked at Tandem Computers (Transaction Monitoring Facility), Amazon, Microsoft (Microsoft Transaction Server and SQL Service Broker), and now Salesforce.

You might have enjoyed some of Pat's other visionary papers: Life beyond Distributed Transactions: an Apostate’s Opinion; The End of an Architectural Era (It's Time for a Complete Rewrite); and Idempotence Is Not a Medical Condition.

This new paper is a high-level overview of two claims. First, immutability (the idea that destructive updates are not allowed) is a huge architectural win. Second, cheaper disk, RAM, and compute now make it financially feasible to keep all the things. The key insight is that without data updates, coordination in a distributed system becomes a much simpler problem to solve.

Immutability is an architectural concept that's been gaining steam on several fronts. Facebook is using a declarative immutable programming model in both the model and the view. We are seeing the idea of immutable infrastructure rise in DevOps. Aeron is a new messaging system that uses a persistent log to good advantage. The Lambda Architecture makes use of immutability. Datomic is a database that treats data as a time-ordered series of immutable objects.
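
To make the idea concrete, here is a minimal Python sketch of an append-only, time-ordered store. It is written in the spirit of what the paper describes and loosely after Datomic's model. It is not Datomic's API, and the class and method names (`AppendOnlyStore`, `assert_fact`, `value_as_of`) are hypothetical. An "update" becomes a new fact, and older versions stay readable by transaction number.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Fact:
    """One immutable assertion: entity.attribute = value as of transaction tx."""
    tx: int
    entity: str
    attribute: str
    value: Any


class AppendOnlyStore:
    """Toy store with no destructive updates: every write appends a new Fact."""

    def __init__(self) -> None:
        self._log: list[Fact] = []  # kept in transaction order
        self._next_tx = 1

    def assert_fact(self, entity: str, attribute: str, value: Any) -> int:
        """Append a new fact and return its transaction number."""
        tx = self._next_tx
        self._next_tx += 1
        self._log.append(Fact(tx, entity, attribute, value))
        return tx

    def value_as_of(self, entity: str, attribute: str,
                    tx: Optional[int] = None) -> Any:
        """Latest value at or before transaction `tx` (default: newest)."""
        result = None
        for fact in self._log:
            if tx is not None and fact.tx > tx:
                break
            if fact.entity == entity and fact.attribute == attribute:
                result = fact.value
        return result


store = AppendOnlyStore()
t1 = store.assert_fact("user:42", "email", "old@example.com")
store.assert_fact("user:42", "email", "new@example.com")

print(store.value_as_of("user:42", "email"))      # new@example.com
print(store.value_as_of("user:42", "email", t1))  # old@example.com
```

A fact never changes once written. A reader pinned to an older transaction number therefore sees a stable snapshot without coordinating with writers. The cost is that storage grows with history, and that is the trade-off the paper argues cheap disk and RAM now make affordable.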

If that's of interest, then you'll like the paper.

Overview:


Friday, Jan 23, 2015

Stuff The Internet Says On Scalability For January 23rd, 2015

Hey, it's HighScalability time:


Elon Musk: The universe is really, really big  [Gigapixels of Andromeda [4K]]
  • 90: the new 50 for women designers; $656.8 million: 3 months of Uber payouts; $10 billion: all it takes to build the Internet in space; 1 billion: registered WeChat users
  • Quotable Quotes:
    • @antirez: Tech stacks, more replaceable than ever: hardware is better, startups get $$ (few nodes + or - who cares), alternatives countless.
    • Olivio Sarikas: If every Star in this Image was a 2 millimeter Sandcorn you would end up with 1110 kg of Sand!!!!!!!!!
    • Chad Cipoletti: In even simpler terms, we see brands as people.
    • @timoreilly: Love it: “We need a stack, not a pile” says @michalmigurski.
    • @neha: I would be very happy to never again see a distributed systems paper eval on a workload that would fit on one machine.
    • @etherealmind: OH: "oh yeah, the extra 4 PB of storage is being installed today. Its about 4 racks of gear".
    • @lintool: Andrew Moore: Google's ecommerce platform ingests 100K-200K events per second continuously. 

  • Programming as myth building. Myths to Live By: The true symbol does not merely point to something else. It contains in itself a structure which awakens our consciousness to a new awareness of the inner meaning of life and of reality itself. A true symbol takes us to the center of the circle, not to another point on the circumference.

  • Not shocking at all: "We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code...A majority (77%) of the failures require more than one input event to manifest, but most of the failures (90%) require no more than 3." Really, who has the time? More on human nature in Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems.

  • Let simplicity fail before climbing the complexity ladder. Scalability! But at what COST?: "Big data systems may scale well, but this can often be just because they introduce a lot of overhead. Rather than making your computation go faster, the systems introduce substantial overheads which can require large compute clusters just to bring under control. In many cases, you’d be better off running the same computation on your laptop." But notice the kicker: "it took some work for parallel union-find." Replacing smart work with brute force is often the greater win. What are a few machine cycles between friends?

  • Programming is the ultimate team sport, so Why are Some Teams Smarter Than Others? The smartest teams were distinguished by three characteristics. First, their members contributed more equally to the team’s discussions. Second, their members were better at reading complex emotional states. Third, teams with more women outperformed teams with more men.

  • WhatsApp doesn't understand the web. Interesting design and discussions. Using proprietary Chrome APIs is a tough call, but this is more perplexing: "Your phone needs to stay connected to the internet for our web client to work." Is this for consistency reasons? To make sure the phone and the web stay in sync? Is it for monetization reasons? It does create a closed proxy that effectively prevents monetization leaks. It's tough to judge a solution without understanding the requirements, but there must be something compelling to impose so many limitations.

  • Roman Leventov's analysis of Redis data structures, in which Salvatore 'antirez' Sanfilippo addresses, point by point, criticisms of Redis' implementation. People love Redis, and part of that love has to come from what a good guy antirez is. Here he doesn't go all black diamond alpha nerd in the face of a challenge. He admits where things can be improved. He explains design decisions in detail. He advances the discussion with grace, humility, and smarts. A worthy model to emulate.
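
The COST item's kicker is about union-find. As a rough illustration of the single-machine approach McSherry contrasts with distributed graph systems, here is a minimal Python sketch that computes connected components using union-find with path halving. The function name and the example graph are hypothetical, and this is not the code from his post.

```python
def connected_components(num_nodes: int, edges: list[tuple[int, int]]) -> list[int]:
    """Return, for each node, the root label of its connected component."""
    parent = list(range(num_nodes))  # every node starts as its own root

    def find(x: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: merge the two components

    return [find(x) for x in range(num_nodes)]


print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```

Each edge costs only a couple of cheap `find` calls, which is why a single core can go a long way here. Making this structure parallel is the hard part, and that is the kicker quoted in the item.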

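The failure-study item says most catastrophic failures could have been prevented by simple testing of error-handling code. Here is a minimal Python sketch of what that can look like. It assumes a hypothetical `replicate` function that takes its transport as a parameter, so a test can inject faults. None of these names come from the paper.

```python
from typing import Callable


class ReplicaUnavailable(Exception):
    """Hypothetical error raised when a replica cannot be reached."""


def replicate(record: bytes, replicas: list[str],
              send: Callable[[str, bytes], None]) -> list[str]:
    """Send `record` to every replica and return the ones that failed.

    The except branch is the error-handling code worth testing: it must
    record the failure and keep going, not swallow it or abort everything.
    """
    failed = []
    for replica in replicas:
        try:
            send(replica, record)
        except ReplicaUnavailable:
            failed.append(replica)
    return failed


def failing_on(*down: str) -> Callable[[str, bytes], None]:
    """Build a fake transport that raises for the named replicas (fault injection)."""
    def send(replica: str, record: bytes) -> None:
        if replica in down:
            raise ReplicaUnavailable(replica)
    return send


def test_partial_failure_is_reported() -> None:
    assert replicate(b"x", ["a", "b", "c"], failing_on("b")) == ["b"]


def test_total_failure_is_reported() -> None:
    assert replicate(b"x", ["a", "b"], failing_on("a", "b")) == ["a", "b"]


test_partial_failure_is_reported()
test_total_failure_is_reported()
```

A happy-path test never enters the `except` branch. Injecting a failing transport is a cheap way to make sure the error path actually runs.
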
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Thursday, Jan 22, 2015

As a DBA Expert, which database would you choose?

This is a guest post by Jenny Richards, a professional database administrator who is currently employed at Remote DBA.

In the world of databases, there is no single silver bullet fitting for every gun. How you select the database to use is very dependent on every other factor of your work: 

  • Who are you and what do you do? 
  • What is your end goal – what are you working to achieve?
  • How much data do you intend to store?
  • On what language and OS platforms do your applications run?
  • What is your budget?
  • Will you also require data warehousing, decision support systems and/or BI?

Background information


Wednesday, Jan 21, 2015

Learn from my pain - 5 Lessons from Ello's Adventures in Rapid Scaling 

Within one week Ello went from thousands of sessions a day to a few million sessions a day. Mike Pack wrote a great article sharing what they’ve learned: 5 Early Lessons from Rapid, High Availability Scaling with Rails.

Some of their scaling challenges: quantity of data, team size, DNS, bot prevention, responding to users, inappropriate content, and other forms of caching. What did they learn?

  1. Move the graph. User relationships were implemented on a standard Rails stack using Heroku and Postgres. The relationships table became the bottleneck. Solution: denormalize the social graph and move hot data into Redis. Redis is used for speed and Postgres is used for durability. Lesson: know the core pillar that supports your core offering and make it work.

  2. Create indexes early, or you're screwed. There's a camp that says only create indexes when they are needed. They are wrong. The lack of btree indexes kills query performance. Forget a unique index and your data becomes corrupted. Once the damage is done it's hard to add unique indexes later. The data has to be cleaned up and indexes take a long time to build when there's a lot of data.

  3. Sharding is cool, but not that cool. Shard all the things only after you've tried vertically scaling as much as possible. Sharding caused a lot of pain. Creating a covering index from the start and adding more RAM so data could be served from memory, not from disk, would have saved a lot of time and stress as the system scaled.

  4. Don't create bottlenecks, or do. Every new user automatically followed a system user that was used for announcements, etc. Scaling problems that would have been months down the road hit quickly as any write to the system user caused a write amplification of millions of records. The lesson here is not what you may think. While scaling to meet the challenge of the system user was a pain, it made them stay ahead of the scaling challenge. Lesson: self-inflict problems early and often.

  5. It always takes 10 times longer. All the solutions mentioned take much longer to implement than you might think. Early estimates of a couple of days soon give way to the reality of much longer timelines. Simply moving large amounts of data can take days. Adding indexes to large amounts of data takes time. And with large datasets, problems tend to surface only once you reach the larger sizes, which means you need to apply a fix and start over.
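
Lessons 2 and 3 are easy to see in miniature. This sketch uses Python's built-in `sqlite3` as a stand-in for Postgres, which is what Ello actually used. The `relationships` table and the index names are hypothetical. Postgres has the same ideas (unique indexes, and index-only scans for covering indexes), but its syntax details and planner output differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships ("
    " follower_id INTEGER NOT NULL,"
    " followed_id INTEGER NOT NULL,"
    " created_at TEXT NOT NULL)"
)

# Lesson 2: a unique index from day one lets the database reject duplicates.
conn.execute("CREATE UNIQUE INDEX idx_rel_pair "
             "ON relationships (follower_id, followed_id)")
conn.execute("INSERT INTO relationships VALUES (1, 2, '2015-01-21')")
try:
    conn.execute("INSERT INTO relationships VALUES (1, 2, '2015-01-22')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lesson 3: a covering index holds every column the hot query needs, so
# "who follows user 2?" is answered from the index without reading the table.
conn.execute("CREATE INDEX idx_rel_followed "
             "ON relationships (followed_id, follower_id)")
plan = " ".join(
    row[-1] for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT follower_id FROM relationships WHERE followed_id = ?", (2,))
)
print(duplicate_rejected)  # True
print(plan)  # the plan detail mentions COVERING INDEX idx_rel_followed
```

Adding either index after the fact means first cleaning up duplicates and then waiting for the build on a large table. That is the pain the lessons warn about.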

This full article is excellent and is filled with much more detail that makes it well worth reading.
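
Lesson 4's write amplification is easy to reproduce in a toy model. The source doesn't say how Ello stores feeds. This sketch assumes a fan-out-on-write (push) design, in which each post is copied into every follower's timeline. The class and account names are hypothetical.

```python
from collections import defaultdict


class FanOutOnWriteFeed:
    """Push model: each post is copied into every follower's timeline."""

    def __init__(self) -> None:
        self.followers: dict[str, set[str]] = defaultdict(set)
        self.timelines: dict[str, list[str]] = defaultdict(list)
        self.timeline_writes = 0  # total copies written, i.e. write amplification

    def follow(self, follower: str, followed: str) -> None:
        self.followers[followed].add(follower)

    def post(self, author: str, message: str) -> None:
        for follower in self.followers[author]:
            self.timelines[follower].append(message)
            self.timeline_writes += 1


feed = FanOutOnWriteFeed()
# Every new user auto-follows a hypothetical "system" account.
for i in range(10_000):
    feed.follow(f"user{i}", "system")
feed.follow("user1", "alice")

feed.post("alice", "hi")              # 1 timeline write
feed.post("system", "announcement")   # 10,000 timeline writes
print(feed.timeline_writes)           # 10001
```

One post by a widely followed account costs as many writes as that account has followers. A common mitigation is to read such an account's posts at query time instead of pushing them. That trades write amplification for extra read work.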

Tuesday, Jan 20, 2015

Sponsored Post: Couchbase, VividCortex, Internap, SocialRadar, Campanja, Transversal, MemSQL, Hypertable, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Senior DevOps Engineer: SocialRadar. We are a VC funded startup based in Washington, D.C. operated like our West Coast brethren. We specialize in location-based technology. Since we are rapidly consuming large amounts of location data and monitoring all social networks for location events, we have systems that consume vast amounts of data that need to scale. As our Senior DevOps Engineer you’ll take ownership over that infrastructure and, with your expertise, help us grow and scale both our systems and our team as our adoption continues its rapid growth. Full description and application here.

  • Linux Web Server Systems Engineer: Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology, and microservices; we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • UI Engineer: AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data: AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses. Aerospike now offers two certified training courses, Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment. Find a training course near you. http://www.aerospike.com/aerospike-training/

Cool Products and Services

  • See How PayPal Manages 1B Documents & 10TB Data with Couchbase. This presentation showcases PayPal's usage of Couchbase within its architecture, highlighting Linear scalability, Availability, Flexibility & Extensibility.

  • VividCortex is a hosted (SaaS) database performance management platform that provides unparalleled insight and query-level analysis for both MySQL and PostgreSQL servers at micro-second detail. It's not just another tool to draw time-series charts from status counters. It's deep analysis of every metric, every process, and every query on your systems, stitched together with statistics and data visualization. Start a free trial today with our famous 15-second installation.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike demonstrates RAM-like performance with Google Compute Engine Local SSDs.
    After scaling to 1 M Writes/Second with 6x fewer servers than Cassandra on Google Compute Engine, we certified Google’s new Local SSDs using the Aerospike Certification Tool for SSDs (ACT) and found RAM-like performance and 15x storage cost savings. Read more.

  • FoundationDB 3.0. 3.0 makes the power of a multi-model, ACID transactional database available to a set of new connected device apps that are generating data at previously unheard of speed. It is the fastest, most scalable, transactional database in the cloud - A 32 machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...


Friday, Jan 16, 2015

Stuff The Internet Says On Scalability For January 16th, 2015

Hey, it's HighScalability time:


First people to free-climb the Dawn Wall of El Capitan using nothing but stone knives and bearskins (pics). 
  • $3.3 trillion: mobile revenue in 2014; ~10%: the difference between a good SpaceX landing and a crash; 6: hours for which quantum memory was held stable 
  • Quotable Quotes:
    • @stevesi: "'If you had bought the computing power found inside an iPhone 5S in 1991, it would have cost you $3.56 million.'"
    • @imgurAPI: Where do you buy shares in data structures? The Stack Exchange
    • @postwait: @xaprb agreed. @circonus does per-second monitoring, but *retain* one minute for 7 years; that plus histograms provides magical insight.
    • @iamaaronheld: A single @awscloud datacenter consumes enough electricity to send 24 DeLoreans back in time
    • @rstraub46: "We are becoming aware that the major questions regarding technology are not technical but human questions" - Peter Drucker, 1967
    • @Noahpinion: Behavioral economics IS the economics of information. via @CFCamerer 
    • @sheeshee: "decentralize all the things" (guess what everybody did in the early 90ies & why we happily flocked to "services". ;)
    • New Clues: The Internet is no-thing at all. At its base the Internet is a set of agreements, which the geeky among us (long may their names be hallowed) call "protocols," but which we might, in the temper of the day, call "commandments."

  • Can't agree with this. We Suck at HTTP. HTTP is just a transport. It should only deliver transport related error codes. Application errors belong in application messages, not spread all over the stack. 

  • Apple has lost the functional high ground. It's funny: microservices are hot, and one of their wins is the independent evolution of services, yet Apple's software releases now tie everything together. It's a strategy tax. The watch just extends the rigidity of the structure. But this is a huge upgrade. Apple is moving to a cloud multi-device sync model, which is a complete revolution. It will take a while for all this to shake out.

  • This is so cool; I'd never heard of Cornelis Drebbel (1620s) or his amazing accomplishments. The Vulgar Mechanic and His Magical Oven: His oven is one of the earliest devices that gave human control away to a machine and thus can be seen as a forerunner of the smart machine, the self-deciding automaton, the thinking robot.

  • Do you think there's a DevOps identity crisis, as Baron Schwartz suggests? Does DevOps have a messaging and positioning problem? Is DevOps just old wine in a new skin? Is DevOps made up of echo chambers? I don't know, but it's an interesting analysis by Baron.

  • How does Hyper-threading double your CPU throughput?: So if you are optimizing for higher throughput – that may be fine. But if you are optimizing for response time, then you may consider running with HT turned off.

  • Underdog.io shares what's inside Datadog’s tech stack: Python, JavaScript, and Go; the front end is built with D3 and React; data stores are Kafka, Redis, Cassandra, S3, Elasticsearch, and PostgreSQL; DevOps tooling is Chef, Capistrano, Jenkins, Hubot, and others.
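
Here is an illustration of the position in the HTTP item. This minimal Python sketch assumes a hypothetical money-transfer endpoint. The HTTP status reports only whether the message itself was usable, and every application outcome travels in a JSON envelope. Many REST APIs deliberately do the opposite and map application failures onto 4xx codes, so treat this as one convention, not a standard.

```python
import json
from http import HTTPStatus

# Hypothetical application state for the example.
BALANCES = {"alice": 100, "bob": 20}


def transfer(src: str, dst: str, amount: int) -> dict:
    """Application logic: outcomes are described in the returned envelope."""
    if src not in BALANCES or dst not in BALANCES:
        return {"ok": False, "error": {"code": "unknown_account"}}
    if BALANCES[src] < amount:
        return {"ok": False, "error": {"code": "insufficient_funds"}}
    BALANCES[src] -= amount
    BALANCES[dst] += amount
    return {"ok": True, "result": {"balance": BALANCES[src]}}


def handle(raw_body: str) -> tuple[int, str]:
    """Map a request body to (HTTP status, response body)."""
    try:
        req = json.loads(raw_body)
        src, dst, amount = req["from"], req["to"], int(req["amount"])
    except (ValueError, KeyError, TypeError):
        # The message itself is unusable: report that at the HTTP layer.
        return HTTPStatus.BAD_REQUEST, ""
    # The message arrived and was understood, so HTTP says 200; success or
    # failure of the transfer is an application fact carried in the body.
    return HTTPStatus.OK, json.dumps(transfer(src, dst, amount))
```

With this split, a client checks the status code only for delivery problems and then reads `ok` and `error.code` for business outcomes.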

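The hyper-threading item's trade-off comes down to simple arithmetic. This back-of-envelope Python sketch makes two assumptions. First, two busy hardware threads share a core evenly. Second, together they deliver 1.3x the throughput of one thread. The 1.3 figure is purely hypothetical, not from the linked article, and real gains vary by workload.

```python
def ht_tradeoff(ht_gain: float, service_time_ms: float) -> tuple[float, float]:
    """Return (throughput multiplier, per-request time in ms) with HT on.

    Assumes both hardware threads on a core are busy and split the core's
    combined throughput `ht_gain` evenly between them.
    """
    per_thread_speed = ht_gain / 2  # each thread runs at this fraction of a full core
    return ht_gain, service_time_ms / per_thread_speed


throughput, latency_ms = ht_tradeoff(ht_gain=1.3, service_time_ms=10.0)
print(f"{throughput:.1f}x throughput, {latency_ms:.1f} ms per request (vs 10.0 ms)")
```

Under these assumptions the core completes 30% more requests, but each request takes roughly 15.4 ms instead of 10 ms. That is the throughput-versus-response-time tension the article describes.
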
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Wednesday, Jan 14, 2015

StackExchange's Performance Dashboard

StackExchange created a very cool performance dashboard that looks to be updated from real system metrics. Wouldn't it be fascinating if every site had a similar dashboard?

The dashboard shows, for example, 560 million page views per month, 260,000 sustained connections, 34 TB of data transferred per month, and 9 web servers with 48GB of RAM handling 185 req/s at 15% CPU usage. There are 4 SQL servers, 2 Redis servers, 3 tag engine servers, 3 Elasticsearch servers, and 2 HAProxy servers, along with stats on each.
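
Monthly totals are easier to reason about as per-second rates. This Python sketch converts two of the dashboard figures, assuming a 30-day month and decimal units (1 TB = 10^12 bytes). Those conventions are assumptions, not something the dashboard states.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assume a 30-day month

page_views_per_month = 560e6
bytes_per_month = 34e12  # 34 TB, taking 1 TB = 10**12 bytes

avg_page_views_per_sec = page_views_per_month / SECONDS_PER_MONTH
avg_mbit_per_sec = bytes_per_month * 8 / SECONDS_PER_MONTH / 1e6

print(f"{avg_page_views_per_sec:.0f} page views/s, {avg_mbit_per_sec:.0f} Mbit/s")
# → 216 page views/s, 105 Mbit/s
```

These are averages over the whole month. Peak-hour traffic will be well above them, and capacity has to be planned for the peaks.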

There's also an excellent discussion thread on reddit that goes into more interesting details, with questions being answered by folks from StackExchange. 

StackExchange is still doing innovative work and is very much an example worth learning from. They've always danced to their own tune and it's a catchy tune at that. More at StackOverflow Update: 560M Pageviews A Month, 25 Servers, And It's All About Performance.

Monday, Jan 12, 2015

The Stunning Scale of AWS and What it Means for the Future of the Cloud

James Hamilton, VP and Distinguished Engineer at Amazon and longtime blogger of interesting stuff, gave an enthusiastic talk at AWS re:Invent 2014 on AWS Innovation at Scale. He’s clearly proud of the work they are doing and it shows.

James shared a few eye popping stats about AWS:

  • 1 million active customers
  • All 14 other cloud providers combined have 1/5th the aggregate capacity of AWS (estimate by Gartner in 2013)
  • 449 new services and major features released in 2014
  • Every day, AWS adds enough new server capacity to support all of Amazon’s global infrastructure when it was a $7B annual revenue enterprise (in 2004).
  • S3 has 132% year-over-year growth in data transfer
  • 102 Tbps of network capacity into a datacenter.

The major theme of the talk is that the cloud is a different world. It’s a special environment that allows AWS to do great things at scale, things you can’t do, which is why the transition from on-premises x86 servers to the public cloud is happening at a blistering pace. With so many scale-driven benefits to the public cloud, it's a transition that can't be stopped. The cloud will keep getting more reliable, more functional, and cheaper at a rate that you can't begin to match with your limited resources, generalist gear, bloated software stacks, slow supply chains, and outdated innovation paradigms.

That's the PR message at least. But one thing you can say about Amazon is that they are living it. They are making it real. So some skepticism is healthy, but extrapolating out the lines of fate would also be wise.

One advantage the fickle finger of fate has handed AWS is resources. At one million customers they have the scale to keep the engine of expansion and improvement going. Profits aren't being taken out; money is being reinvested. This is perhaps the most important advantage of scale.

But money without smarts is simply waste. Amazon wants you to know they have the smarts. We've heard how Google and Facebook build their own gear; Amazon does too. They build their own networking gear, networking software, and racks, and they work with Intel to get faster versions of processors than are available on the market. The key is that they know everything and control everything about their environment, so they can build simpler gear that does exactly what they want, which turns out to be cheaper and more reliable in the end.

Complete control allows quality metrics to be built into everything. Metrics drive a constant quality increase in all parts of the system, which is why against all odds AWS is getting more reliable as the pace of innovation quickens. Great pools of actionable data turned into knowledge is another huge advantage of scale.

Another thing AWS can do that you can't is the Availability Zone architecture itself. Each AZ is its own datacenter and AZs within a region are located very close together. This reduces messaging latencies, which means state can be synchronously replicated between AZs, which greatly improves availability compared to the typical approach where redundant datacenters are very far apart. 
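
A small latency model shows why AZ proximity matters for synchronous replication. This Python sketch illustrates the point under stated assumptions and is not how AWS implements replication. All round-trip times are hypothetical. Replicas are contacted in parallel, and a fully synchronous commit waits for every acknowledgment, whereas a quorum scheme would wait for only some of them.

```python
from typing import Optional, Sequence


def commit_latency_ms(replica_rtts_ms: Sequence[float],
                      acks_required: Optional[int] = None) -> float:
    """Time for a synchronous write to commit, ignoring local disk time.

    The write goes to all replicas in parallel and commits once
    `acks_required` of them have acknowledged (default: all of them).
    """
    rtts = sorted(replica_rtts_ms)
    k = len(rtts) if acks_required is None else acks_required
    if not 1 <= k <= len(rtts):
        raise ValueError("acks_required must be between 1 and the replica count")
    return rtts[k - 1]  # the k-th fastest acknowledgment gates the commit


nearby_azs = [0.8, 1.2]       # hypothetical round trips to nearby AZs, ms
far_datacenter = [0.8, 70.0]  # hypothetical: one replica across a continent, ms

for name, rtts in [("nearby AZs", nearby_azs), ("distant DC", far_datacenter)]:
    latency = commit_latency_ms(rtts)
    print(f"{name}: {latency} ms per commit, ~{1000 / latency:.0f} serial commits/s")
```

Because a synchronous commit is gated by its slowest required replica, one distant datacenter dominates the latency. Keeping AZs close makes "replicate synchronously" affordable.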

It's a talk rich with information and...well, spunk. The real meta-theme of the talk is how Amazon consciously uses scale to their competitive advantage. For Amazon scale isn't just an expense to be dealt with, scale is a resource to exploit, if you know how.

Here's my gloss of James Hamilton's incredible talk...

Everything in the Talk has a Foundation in Scale
