Wednesday
Jan 28, 2015

Instagram Strategy to Radically Reduce Traffic: Kill all the spambots!

RIP to my fallen robot followers on Instagram, if there's a heaven for robot instagram users, you guys are in there

— alldaychubbyboy (@Allday)

How do you scale to handle increased user traffic? Have less traffic. No, this is not a koan. The best way to deal with traffic is not to have it. 

In a two day span Instagram disappeared 18.9 million users or more than 29 percent of their "followers." Justin Bieber lost 3.5 million followers (15 percent), Kim Kardashian lost 1.3 million followers (5.5 percent), Rihanna lost 1.2 million followers.

Instagram explains this dramatic reckoning was achieved by "removing deactivated spam accounts and accounts that violated its community guidelines." 

In an age when high user counts and tantalizing engagement metrics are more valuable than bitcoins, this can't have been an easy decision, but it was made after being bought by Facebook.

Why? Gabe Madway, an Instagram spokesman, explains: "We totally get that it’s uncomfortable for people. The overall goal is we want it to be perceived that the people following you are real."

Uncomfortable is an understatement. A BuzzFeed article nicely captured some of the anger; here's just one example (could be NSFW):

Click to read more ...

Monday
Jan 26, 2015

Paper: Immutability Changes Everything by Pat Helland

I was excited to see that Pat Helland has published another thought-provoking paper: Immutability Changes Everything. If video is more your style, Pat gave a wonderful talk on the same subject at RICON2012 (video, slides).

It's fun to see how Pat's thinking is evolving over time as he's worked at Tandem Computers (Transaction Monitoring Facility), Amazon, Microsoft (Microsoft Transaction Server and SQL Service Broker), and now Salesforce.

You might have enjoyed some of Pat's other visionary papers: Life beyond Distributed Transactions: an Apostate’s Opinion, The end of an architectural era: (it's time for a complete rewrite), and Idempotence Is Not a Medical Condition.

This new paper is a high-level overview of why immutability, the idea that destructive updates are not allowed, is a huge architectural win. Because disk, RAM, and compute are cheaper, it's now financially feasible to keep all the things. The key insight is that without data updates, coordination in a distributed system becomes a much simpler problem to solve.
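A minimal sketch of that insight, not taken from the paper: records are only ever appended, and current state is derived by reading the log. The names `Entry`, `AppendOnlyLog`, and `balance` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: a record cannot be modified after creation
class Entry:
    seq: int
    account: str
    delta: int


class AppendOnlyLog:
    """Hypothetical log that supports appends and reads, never updates."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, account: str, delta: int) -> Entry:
        entry = Entry(seq=len(self._entries), account=account, delta=delta)
        self._entries.append(entry)
        return entry

    def snapshot(self) -> Tuple[Entry, ...]:
        # Readers get an immutable view; it can be copied or cached freely
        # because no entry in it will ever change.
        return tuple(self._entries)


def balance(entries: Tuple[Entry, ...], account: str) -> int:
    # State is derived from the log rather than updated in place.
    return sum(e.delta for e in entries if e.account == account)


log = AppendOnlyLog()
log.append("alice", 100)
old_view = log.snapshot()
log.append("alice", -30)   # a "change" is a new fact, not an overwrite
new_view = log.snapshot()
```

Because `old_view` never changes, a replica holding it is merely behind, not wrong, which is the sense in which coordination gets simpler.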

Immutability is an architectural concept that's been gaining steam on several fronts. Facebook is using a declarative immutable programming model in both the model and the view. We are seeing the idea of immutable infrastructure rise in DevOps. Aeron is a new messaging system that uses a persistent log to good advantage. The Lambda Architecture makes use of immutability. Datomic is a database that treats data as a time-ordered series of immutable objects.

If that's of interest, then you'll like the paper.

Overview:

Click to read more ...

Friday
Jan 23, 2015

Stuff The Internet Says On Scalability For January 23rd, 2015

Hey, it's HighScalability time:


Elon Musk: The universe is really, really big  [Gigapixels of Andromeda [4K]]
  • 90: is the new 50 for woman designer; $656.8 million: 3 months of Uber payouts; $10 billion: all it takes to build the Internet in space; 1 billion: registered WeChat users
  • Quotable Quotes:
    • @antirez: Tech stacks, more replaceable than ever: hardware is better, startups get $$ (few nodes + or - who cares), alternatives countless.
    • Olivio Sarikas: If every Star in this Image was a 2 millimeter Sandcorn you would end up with 1110 kg of Sand!!!!!!!!!
    • Chad Cipoletti: In even simpler terms, we see brands as people.
    • @timoreilly: Love it: “We need a stack, not a pile” says @michalmigurski.
    • @neha: I would be very happy to never again see a distributed systems paper eval on a workload that would fit on one machine.
    • @etherealmind: OH: "oh yeah, the extra 4 PB of storage is being installed today. Its about 4 racks of gear".
    • @lintool: Andrew Moore: Google's ecommerce platform ingests 100K-200K events per second continuously. 

  • Programming as myth building. Myths to Live By: The true symbol does not merely point to something else. It contains in itself a structure which awakens our consciousness to a new awareness of the inner meaning of life and of reality itself. A true symbol takes us to the center of the circle, not to another point on the circumference.

  • Not shocking at all: "We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code...A majority (77%) of the failures require more than one input event to manifest, but most of the failures(90%) require no more than 3." Really, who has the time? More on human nature in Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems.

  • Let simplicity fail before climbing the complexity ladder. Scalability! But at what COST?: "Big data systems may scale well, but this can often be just because they introduce a lot of overhead. Rather than making your computation go faster, the systems introduce substantial overheads which can require large compute clusters just to bring under control. In many cases, you’d be better off running the same computation on your laptop." But notice the kicker: "it took some work for parallel union-find." Replacing smart work with brute force is often the greater win. What are a few machine cycles between friends?

  • Programming is the ultimate team sport, so Why are Some Teams Smarter Than Others? The smartest teams were distinguished by three characteristics. First, their members contributed more equally to the team’s discussions. Second, their members can better read complex emotional states. Third, teams with more women outperformed teams with more men.

  • WhatsApp doesn't understand the web. Interesting design and discussions. Using proprietary Chrome APIs is a tough call, but this is more perplexing: "Your phone needs to stay connected to the internet for our web client to work." Is this for consistency reasons? To make sure the phone and the web stay in sync? Is it for monetization reasons? It does create a closed proxy that effectively prevents monetization leaks. It's tough to judge a solution without understanding the requirements, but there must be something compelling to impose so many limitations.

  • Roman Leventov's analysis of Redis data structures. In which Salvatore 'antirez' Sanfilippo addresses point by point criticisms of Redis' implementation. People love Redis, and part of that love has to come from what a good guy antirez is. Here he doesn't go all black diamond alpha nerd in the face of a challenge. He admits where things can be improved. He explains design decisions in detail. He advances the discussion with grace, humility, and smarts. A worthy model to emulate.
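The error-handling finding quoted in the list above suggests a cheap habit: force the failure path to run in a test. Here is a minimal sketch, assuming a hypothetical `save_with_retry` function and a hypothetical `FlakyStore` test double; neither comes from the paper.

```python
class FlakyStore:
    """Hypothetical test double that fails a fixed number of times."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.saved = []

    def write(self, item: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise IOError("simulated write failure")
        self.saved.append(item)


def save_with_retry(store: FlakyStore, item: str, attempts: int = 3) -> bool:
    """Try the write up to `attempts` times; report failure instead of raising."""
    for _ in range(attempts):
        try:
            store.write(item)
            return True
        except IOError:
            continue  # the error path that rarely runs in production
    return False


# Exercise the error-handling code on purpose.
recovers = save_with_retry(FlakyStore(failures=2), "event")
gives_up = save_with_retry(FlakyStore(failures=5), "event")
```

Two injected failures are enough to cover both the recovery path and the give-up path here, which fits the paper's observation that most failures need no more than three input events.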

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Thursday
Jan 22, 2015

As a DBA Expert, which database would you choose?

This is a guest post by Jenny Richards, a professional database administrator who is currently employed at Remote DBA.

In the world of databases, there is no single silver bullet fitting for every gun. How you select the database to use is very dependent on every other factor of your work: 

  • Who are you and what do you do? 
  • What is your end goal – what are you working to achieve?
  • How much data do you intend to store?
  • On what language and OS platforms do your applications run?
  • What is your budget?
  • Will you also require data warehousing, decision support systems and/or BI?

Background information

Click to read more ...

Wednesday
Jan 21, 2015

Learn from my pain - 5 Lessons from Ello's Adventures in Rapid Scaling 

Within one week Ello went from thousands of sessions a day to a few million sessions a day. Mike Pack wrote a great article sharing what they’ve learned: 5 Early Lessons from Rapid, High Availability Scaling with Rails.

Some of their scaling challenges: quantity of data, team size, DNS, bot prevention, responding to users, inappropriate content, and caching in its various forms. What did they learn?

  1. Move the graph. User relationships were implemented on a standard Rails stack using Heroku and Postgres. The relationships table became the bottleneck. Solution: denormalize the social graph and move hot data into Redis. Redis is used for speed and Postgres is used for durability. Lesson: know the core pillar that supports your core offering and make it work.

  2. Create indexes early, or you're screwed. There's a camp that says only create indexes when they are needed. They are wrong. The lack of btree indexes kills query performance. Forget a unique index and your data becomes corrupted. Once the damage is done it's hard to add unique indexes later. The data has to be cleaned up and indexes take a long time to build when there's a lot of data.

  3. Sharding is cool, but not that cool. Shard all the things only after you've tried vertically scaling as much as possible. Sharding caused a lot of pain. Creating a covering index from the start and adding more RAM so data could be served from memory, not from disk, would have saved a lot of time and stress as the system scaled.

  4. Don't create bottlenecks, or do. Every new user automatically followed a system user that was used for announcements, etc. Scaling problems that would have been months down the road hit quickly as any write to the system user caused a write amplification of millions of records. The lesson here is not what you may think. While scaling to meet the challenge of the system user was a pain, it made them stay ahead of the scaling challenge. Lesson: self-inflict problems early and often.

  5. It always takes 10 times longer. All the solutions mentioned take much longer to implement than you might think. Early estimates of a couple of days soon give way to the reality of much longer time hits. Simply moving large amounts of data can take days. Adding indexes to large amounts of data takes time. And problems tend to surface only as the data gets large, which means you need to apply a fix and start over.
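Lesson 2 can be sketched with Python's built-in `sqlite3` module standing in for Postgres (an assumption made so the example is self-contained; Ello used Postgres). The `relationships` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (follower_id INTEGER, followed_id INTEGER)")

# Without a unique index, duplicate rows slip in silently.
conn.execute("INSERT INTO relationships VALUES (1, 2)")
conn.execute("INSERT INTO relationships VALUES (1, 2)")

# Adding the unique index later fails until the data is cleaned up.
try:
    conn.execute(
        "CREATE UNIQUE INDEX idx_rel ON relationships (follower_id, followed_id)"
    )
    late_index_ok = True
except sqlite3.IntegrityError:
    late_index_ok = False

# Remove duplicates (keep the oldest row of each pair), then build the index.
conn.execute(
    "DELETE FROM relationships WHERE rowid NOT IN ("
    "SELECT MIN(rowid) FROM relationships GROUP BY follower_id, followed_id)"
)
conn.execute("CREATE UNIQUE INDEX idx_rel ON relationships (follower_id, followed_id)")

# With the index in place, the database itself rejects the duplicate.
try:
    conn.execute("INSERT INTO relationships VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

On two rows the cleanup is instant; on a large table the same delete and index build are the slow, painful steps the lesson warns about.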

The full article is excellent and filled with much more detail, which makes it well worth reading.
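The write amplification in lesson 4 can be sketched as fan-out on write: one post by a widely followed account becomes one feed write per follower. This is a minimal in-memory sketch; `followers`, `feeds`, `follow`, and `publish` are hypothetical names, and the summary above does not describe Ello's actual feed implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set

followers: Dict[str, Set[str]] = defaultdict(set)   # author -> follower ids
feeds: Dict[str, List[str]] = defaultdict(list)     # user -> post ids in feed


def follow(user: str, author: str) -> None:
    followers[author].add(user)


def publish(author: str, post_id: str) -> int:
    """Fan out on write: copy the post into every follower's feed.

    Returns the number of feed writes caused by this single post.
    """
    writes = 0
    for user in followers[author]:
        feeds[user].append(post_id)
        writes += 1
    return writes


# Every new user follows the system account, as in the lesson.
for n in range(1000):
    follow(f"user{n}", "system")
follow("user1", "user2")

system_writes = publish("system", "announcement-1")   # one write per user
normal_writes = publish("user2", "hello")             # one write in total
```

The cost of a post grows with the follower count, so an account that everyone follows hits the scaling limit long before ordinary accounts do.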

Tuesday
Jan 20, 2015

Sponsored Post: Couchbase, VividCortex, Internap, SocialRadar, Campanja, Transversal, MemSQL, Hypertable, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Senior DevOps Engineer, SocialRadar. We are a VC funded startup based in Washington, D.C. operated like our West Coast brethren. We specialize in location-based technology. Since we are rapidly consuming large amounts of location data and monitoring all social networks for location events, we have systems that consume vast amounts of data that need to scale. As our Senior DevOps Engineer you’ll take ownership over that infrastructure and, with your expertise, help us grow and scale both our systems and our team as our adoption continues its rapid growth. Full description and application here.

  • Linux Web Server Systems Engineer, Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology, and microservices; we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/

Cool Products and Services

  • See How PayPal Manages 1B Documents & 10TB Data with Couchbase. This presentation showcases PayPal's usage of Couchbase within its architecture, highlighting Linear scalability, Availability, Flexibility & Extensibility.

  • VividCortex is a hosted (SaaS) database performance management platform that provides unparalleled insight and query-level analysis for both MySQL and PostgreSQL servers at micro-second detail. It's not just another tool to draw time-series charts from status counters. It's deep analysis of every metric, every process, and every query on your systems, stitched together with statistics and data visualization. Start a free trial today with our famous 15-second installation.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike demonstrates RAM-like performance with Google Compute Engine Local SSDs.
    After scaling to 1 M Writes/Second with 6x fewer servers than Cassandra on Google Compute Engine, we certified Google’s new Local SSDs using the Aerospike Certification Tool for SSDs (ACT) and found RAM-like performance and 15x storage cost savings. Read more.

  • FoundationDB 3.0. 3.0 makes the power of a multi-model, ACID transactional database available to a set of new connected device apps that are generating data at previously unheard of speed. It is the fastest, most scalable, transactional database in the cloud - A 32 machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Friday
Jan 16, 2015

Stuff The Internet Says On Scalability For January 16th, 2015

Hey, it's HighScalability time:


First people to free-climb the Dawn Wall of El Capitan using nothing but stone knives and bearskins (pics). 
  • $3.3 trillion: mobile revenue in 2014; ~10%: the difference between a good SpaceX landing and a crash; 6: hours for which quantum memory was held stable 
  • Quotable Quotes:
    • @stevesi: "'If you had bought the computing power found inside an iPhone 5S in 1991, it would have cost you $3.56 million.'"
    • @imgurAPI: Where do you buy shares in data structures? The Stack Exchange
    • @postwait: @xaprb agreed. @circonus does per-second monitoring, but *retain* one minute for 7 years; that plus histograms provides magical insight.
    • @iamaaronheld: A single @awscloud datacenter consumes enough electricity to send 24 DeLoreans back in time
    • @rstraub46: "We are becoming aware that the major questions regarding technology are not technical but human questions" - Peter Drucker, 1967
    • @Noahpinion: Behavioral economics IS the economics of information. via @CFCamerer 
    • @sheeshee: "decentralize all the things" (guess what everybody did in the early 90ies & why we happily flocked to "services". ;)
    • New Clues: The Internet is no-thing at all. At its base the Internet is a set of agreements, which the geeky among us (long may their names be hallowed) call "protocols," but which we might, in the temper of the day, call "commandments."

  • Can't agree with this. We Suck at HTTP. HTTP is just a transport. It should only deliver transport related error codes. Application errors belong in application messages, not spread all over the stack. 

  • Apple has lost the functional high ground. It's funny how microservices are hot and one of its wins is the independent evolution of services. Apple's software releases now make everything tied together. It's a strategy tax. The watch just extends the rigidity of the structure. But this is a huge upgrade. Apple is moving to a cloud multi-device sync model, which is a complete revolution. It will take a while for all this to shake out. 

  • This is so cool, I've never heard of Cornelis Drebbel (1620s) before or about his amazing accomplishments. The Vulgar Mechanic and His Magical Oven: His oven is one of the earliest devices that gave human control away to a machine and thus can be seen as a forerunner of the smart machine, the self-deciding automaton, the thinking robot.

  • Do you think there's a DevOps identity crisis, as Baron Schwartz suggests? Does DevOps have a messaging and positioning problem? Is DevOps just old wine in a new skin? Is DevOps made up of echo chambers? I don't know, but an interesting analysis by Baron.

  • How does Hyper-threading double your CPU throughput?: So if you are optimizing for higher throughput – that may be fine. But if you are optimizing for response time, then you may consider running with HT turned off.

  • Underdog.io shares what's Inside Datadog’s Tech Stack: Python, JavaScript, and Go; the front-end happens in D3 and React; databases are Kafka, Redis, Cassandra, S3, ElasticSearch, PostgreSQL; DevOps is Chef, Capistrano, Jenkins, Hubot, and others.
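The position taken in the HTTP item above (transport errors in HTTP, application errors in application messages) can be sketched as a response envelope. This is only one way to read that position; the linked article argues the opposite. The `envelope` and `handle_response` functions and the `INSUFFICIENT_FUNDS` code are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple


def envelope(result: Optional[Any] = None,
             error_code: Optional[str] = None,
             error_message: Optional[str] = None) -> Tuple[int, str]:
    """Return (http_status, body) for a request the server handled.

    The HTTP status reports only that transport worked; the outcome of
    the application operation lives in the JSON body.
    """
    if error_code is None:
        body: Dict[str, Any] = {"ok": True, "result": result}
    else:
        body = {"ok": False,
                "error": {"code": error_code, "message": error_message}}
    return 200, json.dumps(body)


def handle_response(http_status: int, body: str) -> Any:
    if http_status != 200:
        # In this sketch, any other status is treated as a transport problem.
        raise ConnectionError(f"transport failure: HTTP {http_status}")
    message = json.loads(body)
    if not message["ok"]:
        # Application-level problem, reported in the application message.
        raise ValueError(message["error"]["code"])
    return message["result"]


status, body = envelope(error_code="INSUFFICIENT_FUNDS",
                        error_message="balance too low")
```

The trade-off is that generic HTTP tooling (caches, proxies, monitoring) sees a success for every handled request, so application failures have to be tracked separately.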

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Jan 14, 2015

StackExchange's Performance Dashboard

StackExchange created a very cool performance dashboard that looks to be updated from real system metrics. Wouldn't it be fascinating if every site had a similar dashboard?

The dashboard reports 560 million page views per month, 260,000 sustained connections, 34 TB of data transferred per month, and 9 web servers with 48GB of RAM handling 185 req/s at 15% CPU usage. There are also 4 SQL servers, 2 redis servers, 3 tag engine servers, 3 elasticsearch servers, and 2 HAProxy servers, along with stats on each.
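A quick back-of-the-envelope check puts those monthly totals on a per-second scale. The sketch assumes a 30-day month and decimal terabytes, neither of which the dashboard states, and it gives averages only, so peak load will be higher.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

page_views_per_month = 560_000_000
bytes_per_month = 34 * 10**12          # assumes decimal terabytes

avg_page_views_per_second = page_views_per_month / SECONDS_PER_MONTH
avg_bytes_per_page_view = bytes_per_month / page_views_per_month

print(f"{avg_page_views_per_second:.0f} page views/s on average")   # 216
print(f"{avg_bytes_per_page_view / 1000:.1f} KB per page view")     # 60.7
```

The 185 req/s figure is not directly comparable, since the dashboard numbers quoted here don't say whether it is per server or in total, or how many requests make up one page view.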

There's also an excellent discussion thread on reddit that goes into more interesting details, with questions being answered by folks from StackExchange. 

StackExchange is still doing innovative work and is very much an example worth learning from. They've always danced to their own tune and it's a catchy tune at that. More at StackOverflow Update: 560M Pageviews A Month, 25 Servers, And It's All About Performance.

Monday
Jan 12, 2015

The Stunning Scale of AWS and What it Means for the Future of the Cloud

James Hamilton, VP and Distinguished Engineer at Amazon, and long time blogger of interesting stuff, gave an enthusiastic talk at AWS re:Invent 2014 on AWS Innovation at Scale. He’s clearly proud of the work they are doing and it shows.

James shared a few eye popping stats about AWS:

  • 1 million active customers
  • All 14 other cloud providers combined have 1/5th the aggregate capacity of AWS (estimate by Gartner in 2013)
  • 449 new services and major features released in 2014
  • Every day, AWS adds enough new server capacity to support all of Amazon’s global infrastructure when it was a $7B annual revenue enterprise (in 2004).
  • S3 has 132% year-over-year growth in data transfer
  • 102Tbps network capacity into a datacenter.

The major theme of the talk is the cloud is a different world. It’s a special environment that allows AWS to do great things at scale, things you can’t do, which is why the transition from on premise x86 servers to the public cloud is happening at a blistering pace. With so many scale driven benefits to the public cloud, it's a transition that can't be stopped. The cloud will keep getting more reliable, more functional, and cheaper at a rate that you can't begin to match with your limited resources, generalist gear, bloated software stacks, slow supply chains, and outdated innovation paradigms.

That's the PR message at least. But one thing you can say about Amazon is they are living it. They are making it real. So some doubt is healthy, but extrapolating out the lines of fate would also be wise.

One of the fickle finger of fate advantages AWS has is resources. At one million customers they have the scale to keep the engine of expansion and improvement going. Profits aren't being taken out, money is being reinvested. This is perhaps the most important advantage of scale.

But money without smarts is simply waste. Amazon wants you to know they have the smarts. We've heard how Google and Facebook build their own gear; Amazon does too. They build their own networking gear, networking software, and racks, and they work with Intel to get faster versions of processors than are available on the market. The key is they know everything and control everything about their environment, so they can build simpler gear that does exactly what they want, which turns out to be cheaper and more reliable in the end.

Complete control allows quality metrics to be built into everything. Metrics drive a constant quality increase in all parts of the system, which is why against all odds AWS is getting more reliable as the pace of innovation quickens. Great pools of actionable data turned into knowledge is another huge advantage of scale.

Another thing AWS can do that you can't is the Availability Zone architecture itself. Each AZ is its own datacenter and AZs within a region are located very close together. This reduces messaging latencies, which means state can be synchronously replicated between AZs, which greatly improves availability compared to the typical approach where redundant datacenters are very far apart. 
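The link between distance and synchronous replication can be made concrete with propagation delay alone. This sketch assumes light travels roughly 200 km per millisecond in optical fiber; the two distances are hypothetical and are not figures from the talk.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per ms


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_serial_sync_commits_per_second(distance_km: float) -> float:
    """Upper bound on strictly serialized commits that each wait one round trip."""
    return 1000.0 / min_round_trip_ms(distance_km)


nearby_ms = min_round_trip_ms(50)     # hypothetical nearby sites: 0.5 ms
far_ms = min_round_trip_ms(4000)      # hypothetical distant sites: 40 ms
```

Real round trips are longer once switching and software overhead are added, but the gap between sub-millisecond and tens of milliseconds is what makes waiting on a remote acknowledgement practical in one case and costly in the other.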

It's a talk rich with information and...well, spunk. The real meta-theme of the talk is how Amazon consciously uses scale to their competitive advantage. For Amazon scale isn't just an expense to be dealt with, scale is a resource to exploit, if you know how.

Here's my gloss of James Hamilton's incredible talk...

Everything in the Talk has a Foundation in Scale

Click to read more ...

Friday
Jan 9, 2015

Stuff The Internet Says On Scalability For January 9th, 2015

Hey, it's HighScalability time:


UFOs or Floating Solar Balloon power stations? You decide.

 

  • 700 Million: WhatsApp active monthly users; 17 million: comments on Stack Exchange in 2014
  • Quotable Quotes
    • John von Neumann: It is easier to write a new code than to understand an old one.
    • @BenedictEvans: Gross revenue on Apple & Google's app stores was a little over $20bn in 2014. Bigger than recorded music, FWIW.
    • Julian Bigelow: Absence of a signal should never be used as a signal. 
    • Bigelow ~ separate signal from noise at every stage of the process—in this case, at the transfer of every single bit—rather than allowing noise to accumulate along the way
    • cgb_: One of the things I've found interesting about rapidly popular opensource solutions in the last 1-2 years is how quickly venture cap funding comes in and drives the direction of future development.
    • @miostaffin: "If Amazon wants to test 5,000 users to use a feature, they just need to turn it on for 45 seconds." -@jmspool #uxdc
    • Roberta Ness: Amazing possibility on the one hand and frustrating inaction on the other—that is the yin and yang of modern science. Invention generates ever more gizmos and gadgets, but imagination is not providing clues to solving the scientific puzzles that threaten our very existence.

  • Can HTTPS really be faster than HTTP? Yes, it can. Take the test for yourself. The secret: SPDY. More at Why we don’t use a CDN: A story about SPDY and SSL

  • A fascinating and well told tale of the unexpected at Facebook. Solving the Mystery of Link Imbalance: A Metastable Failure State at Scale: The most literal conclusion to draw from this story is that MRU connection pools shouldn’t be used for connections that traverse aggregated links. At a meta-level, the next time you are debugging emergent behavior, you might try thinking of the components as agents colluding via covert channels. At an organizational level, this investigation is a great example of why we say that nothing at Facebook is somebody else’s problem.

  • Everything old is new again. Facebook on disaggregation vs. hyperconvergence: Just when everyone agreed that scale-out infrastructure with commodity nodes of tightly-coupled CPU, memory and storage is the way to go, Facebook’s Jeff Qin, a capacity management engineer – in a talk at Storage Visions 2015 – offers an opposing vision: disaggregated racks. One rack for computes, another for memory and a third – and fourth – for storage.

  • Why Instagram Worked. Instagram was the result of a pivot away from a not popular enough social networking site to a stripped down app that allowed people to document their world in pictures. Though the source article is short on the why, there's a good discussion on Hacker News. Some interesting reasons: Instagram worked because it algorithmically hides flaws in photographs so everyone's pictures look "good"; Snapping a photo is easy and revolves around a moment -- something easier to recognize when it's worthy of sharing; Startups need lucky breaks, but connections with the right people increase the odds considerably; Instagram worked because it was at the right place at the right time; It worked because it's a simple, quick, ultra-low friction way of sharing photos.

  • Atheists, it's not what you think. The God Login. The incomparable Jeff Atwood does a deep dive on the design of a common everyday object: the Login page. The title was inspired by one of Jeff's teachers, who asked what was the "God Algorithm" for a problem, that is, if God solved a problem what would the solution look like? While you may not agree with the proposed solution to the Login page problem, you may at least come away believing that one may or may not exist.
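The MRU connection pool mentioned in the Facebook item above is easy to picture with a toy simulation. The sketch below shows only the basic tendency of an MRU (most recently used) pool to concentrate traffic on one connection when requests arrive one at a time; it does not model the feedback loop from the Facebook post, and `simulate` is a hypothetical helper.

```python
from collections import Counter, deque
from typing import Deque


def simulate(policy: str, connections: int, requests: int) -> Counter:
    """Send requests one at a time through a pool and count use per connection."""
    idle: Deque[int] = deque(range(connections))
    usage: Counter = Counter()
    for _ in range(requests):
        # MRU takes the connection returned most recently (right end);
        # LRU takes the one that has been idle longest (left end).
        conn = idle.pop() if policy == "mru" else idle.popleft()
        usage[conn] += 1
        idle.append(conn)  # request done, connection goes back to the pool
    return usage


mru = simulate("mru", connections=4, requests=100)
lru = simulate("lru", connections=4, requests=100)
```

With MRU all 100 requests land on a single connection, while LRU spreads them 25 each across the four. If connections map to different physical links, that concentration becomes a link imbalance.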

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...