Wednesday
Feb 11, 2015

Rescuing an Outsourced Project from Collapse: 8 Problems Found and 8 Lessons Learned

If you are one of those people who think most of the products featured on HighScalability use way too many servers, then you'll love this story: 130 VMs serving fewer than 10,000 users daily were chopped down to just one machine.

Here's the setup. A smallish website was having problems. Users were unhappy. In the balance was not only the product, but the company. The site was built using Angular, Symfony2, Postgres, Redis, CentOS, 8 HP blades with 128 GB of RAM each, two racks, a very large HP 3PAR storage array, a 1 Gbps uplink, and VMware.

More than enough power for the task at hand. Yet the system couldn't handle the load. What would you do?

That's the story Jacques Mattheij tells in his very entertaining and educational Saving a Project and a Company article.

Jacques says much was right about the website, but time pressure and mismanagement created big problems at the system level. "A single clueless person in a position of trust with non technical management, an outsourced project and a huge budget, what could possibly go wrong?" Sound familiar? 

Problem 1: Virtualization Gone Crazy


Monday
Feb 9, 2015

Vinted Architecture: Keeping a busy portal stable by deploying several hundred times per day

This is a guest post by Nerijus Bendžiūnas and Tomas Varaneckas of Vinted.

Vinted is a peer-to-peer marketplace to sell, buy and swap clothes. It allows members to communicate directly and has the features of a social networking service.

Started in 2008 as a small community for Lithuanian girls, it developed into a worldwide project that serves over 7 million users in 8 different countries, and is growing non-stop, handling over 200 M requests per day.

Stats


Friday
Feb 6, 2015

Stuff The Internet Says On Scalability For February 6th, 2015

Hey, it's HighScalability time:


What a beautiful example of Moore's law visualized through the evolution of Lara Croft! (from @silenok)
  • $1 million: per day gross of Clash of Clans
  • Quotable Quotes:
    • @dancow: In 45 minutes, the largest trader in U.S. equities went bankrupt because of bad devops
    • @bmdhacks: How to be a 10x engineer: Incur technical debt fast enough to appear 10x as productive as the ten engineers tasked with cleaning it up.
    • @CompSciFact: Scaling poorly: Performance degrades with problem size
      Poorly scaled: Things change far more rapidly in one direction than others
    • @mikiobraun: Before scaling out, a machine learning person would always try some approximation shortcut to achieve speed up. #cheating #orisit
    • @cshirky: 3/4 If your organization has ever made a significant and unpleasant change based on something you measured, you can probably use more data.
    • @PatrickMcFadin: Service Discovery Overview: ZooKeeper vs. Consul vs. Etcd vs. Eureka 
    • @jaykreps: TIL: Dequeuing a single item in RabbitMQ requires traversing every single item in the queue. Oh my.
    • @Carnage4Life: No single recipe 4 success. Great companies had bad habits; Apple micromanagement, Google random side projects & Facebook used fricking PHP
    • Stubbornly Persistent: although life would persist in the absence of microbes, both the quantity and quality of life would be reduced drastically.

  • At inflection points change the world must. Netflix: In the early days of Netflix streaming, circa 2008, we manually tracked hundreds of metrics, relying on humans to detect problems.  Our approach worked for tens of servers and thousands of devices, but not for the thousands of servers and millions of devices that were in our future.  Complexity and human-reliant approaches don’t scale; simplicity and algorithm-driven approaches do.

  • IBM is turning Watson into a platform, offering 5 new services: Speech to Text, Text to Speech, Visual Recognition, Concept Insights, Tradeoff Analytics. GA probable next month. Good discussion on Hacker News. Most of the services allow for training through feedback. Some question the quality of the services, but it's early days. Pricing is not set. Hopefully it won't suffer from what these next gen deep learning services tend to suffer from: expensivitis. Who can afford $1.00 per 1000 API calls for a mobile app that needs to acquire users? IBM, make it cheap, try for ubiquity. Cool stuff will happen.

  • Looking for that next step in distributed reliability? Look at TLA+. Murat has several articles on TLA+ and is using it in teaching his distributed systems class. Oh, TLA stands for Temporal Logic of Actions. Leslie Lamport has many papers on TLA. James Hamilton wrote up their experiences at Amazon using TLA+: Challenges in Designing at Scale: Formal Methods in Building Robust Distributed Systems: TLA+, a formal specification language invented by ACM Turing award winner, Leslie Lamport. TLA+ is based on simple discrete math, basic set theory and predicates with which all engineers are quite familiar. A TLA+ specification simply describes the set of all possible legal behaviors (execution traces) of a system.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Wednesday
Feb 4, 2015

Matt Cutts: 10 Lessons Learned from the Early Days of Google


I mainly know of Matt Cutts, longtime Google employee (since 2000) and currently head of Google's Webspam team, from his appearances on TWiT with Leo Laporte. On TWiT Matt always comes off as smart, thoughtful, and a really nice guy. This you might expect.

What I didn’t expect from this talk he gave, Lessons learned from the early days of Google, is that Matt also turns out to be quite funny and a good storyteller. The stories he’s telling are about Matt’s early days at Google. He puts a very human face on Google. When you think everything Google does is a calculation made by some behind-the-scenes AI, Matt reminds us that it’s humans making these decisions and they generally just do the best they can.

The primary theme of the talk is innovation and problem solving through creativity. When you are caught between a rock and a hard place you need to get creative. Question your assumptions. Maybe there’s a creative way to solve your problem?

The talk is short and well worth watching. There are lots of those fun little details that only someone with experience and perspective can give. And there’s lots of wisdom here too. Here’s my gloss on Matt’s talk:

1. Sometimes creativity makes a big difference.


Tuesday
Feb 3, 2015

Sponsored Post: Apple, Couchbase, Farmerswife, VividCortex, Internap, SocialRadar, Campanja, Transversal, MemSQL, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Apple is hiring an Application Security Engineer. Apple’s Gift Card Engineering group is looking for a software engineer passionate about application security for web applications and REST services. Be part of a team working on challenging and fast-paced projects supporting Apple's business by delivering high-volume, high-performance, and high-availability distributed transaction processing systems. Please apply here.

  • Want to be the leader and manager of a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join farmerswife on the beautiful island of Mallorca and help create the next generation of project management tools. Please apply here.

  • Senior DevOps Engineer, SocialRadar. We are a VC-funded startup based in Washington, D.C. operated like our West Coast brethren. We specialize in location-based technology. Since we are rapidly consuming large amounts of location data and monitoring all social networks for location events, we have systems that consume vast amounts of data that need to scale. As our Senior DevOps Engineer you’ll take ownership over that infrastructure and, with your expertise, help us grow and scale both our systems and our team as our adoption continues its rapid growth. Full description and application here.

  • Linux Web Server Systems Engineer, Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology, and microservices. We hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses. Aerospike now offers two certified training courses, Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment. Find a training course near you. http://www.aerospike.com/aerospike-training/

Cool Products and Services

  • See how LinkedIn uses Couchbase to help power its “Follow” service for 300M+ global users, 24x7. 

  • VividCortex is a hosted (SaaS) database performance management platform that provides unparalleled insight and query-level analysis for both MySQL and PostgreSQL servers at micro-second detail. It's not just another tool to draw time-series charts from status counters. It's deep analysis of every metric, every process, and every query on your systems, stitched together with statistics and data visualization. Start a free trial today with our famous 15-second installation.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike demonstrates RAM-like performance with Google Compute Engine Local SSDs. After scaling to 1 M Writes/Second with 6x fewer servers than Cassandra on Google Compute Engine, we certified Google’s new Local SSDs using the Aerospike Certification Tool for SSDs (ACT) and found RAM-like performance and 15x storage cost savings. Read more.

  • FoundationDB 3.0. 3.0 makes the power of a multi-model, ACID transactional database available to a set of new connected device apps that are generating data at previously unheard of speed. It is the fastest, most scalable, transactional database in the cloud - A 32 machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager: Monitor physical, virtual, and cloud applications.

  • www.site24x7.com: Monitor end user experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...


Monday
Feb 2, 2015

Marco Arment Uses Go Instead of PHP and Saves Money by Cutting the Number of Servers in Half

On the excellent Accidental Tech Podcast there's a running conversation about Marco Arment's (Tumblr, Instapaper) switch to Go, from a much loved PHP, to implement feed crawling for Overcast, his popular podcasting app for the iPhone.

In Episode 101 (at about 1:10) Marco said he halved the number of servers used for crawling feeds by switching to Go. The total savings was a few hundred dollars a month in server costs.

Why? Feed crawling requires lots of parallel networking requests and PHP is bad at that sort of thing, while Go is good at it. 
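To make the point about parallel network requests concrete, here is a minimal Go sketch of a bounded worker pool that fetches many feeds at once. This is not Marco's code: the names `CrawlAll`, `FetchFunc`, `Result`, and `httpFetch`, the worker count, and the feed names are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

// FetchFunc downloads one feed and returns its size in bytes (hypothetical type).
type FetchFunc func(url string) (int, error)

// Result records the outcome of crawling a single feed.
type Result struct {
	URL   string
	Bytes int
	Err   error
}

// httpFetch is one possible FetchFunc, built on net/http with a timeout.
func httpFetch(url string) (int, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return len(body), err
}

// CrawlAll fetches every URL using at most `workers` goroutines at a time.
// Results come back in the same order as the input URLs.
func CrawlAll(urls []string, workers int, fetch FetchFunc) []Result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make([]Result, len(urls))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes never collide.
			for i := range jobs {
				n, err := fetch(urls[i])
				results[i] = Result{URL: urls[i], Bytes: n, Err: err}
			}
		}()
	}
	for i := range urls {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Stub fetcher so the sketch runs offline; swap in httpFetch for real feeds.
	fake := func(url string) (int, error) {
		time.Sleep(50 * time.Millisecond) // simulated network latency
		return len(url), nil
	}
	urls := []string{"feed-a", "feed-b", "feed-c", "feed-d"}
	start := time.Now()
	for _, r := range CrawlAll(urls, 4, fake) {
		fmt.Println(r.URL, r.Bytes, r.Err)
	}
	fmt.Println("elapsed:", time.Since(start).Round(10*time.Millisecond))
}
```

With four workers the four simulated fetches overlap, so the elapsed time stays close to one fetch rather than four. Cheap goroutines plus a channel are what make this pattern easy to write in Go.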

Amazingly, Marco wrote an article on how much Overcast earned in 2014. It earned $164,000 after Apple's 30%, but before other expenses. At this revenue level the savings, while not huge in absolute terms given the traffic of some other products Marco has worked on, was a good return on programming effort. 

How much effort? It took about two months to rewrite and debug the feed crawlers. In addition, lots of supporting infrastructure that tied into the crawling system had to be created, like the logging infrastructure, the infrastructure that says when a feed was last crawled, monitoring delays, knowing if there's queue congestion, and forcing a feed to be crawled immediately.

So while the development costs were high up front, as Overcast grows the savings will also grow over time as efficient code on fast servers can absorb more load without spinning up more servers.

Lots of good lessons here, especially for the lone developer:


Friday
Jan 30, 2015

Stuff The Internet Says On Scalability For January 30th, 2015

Hey, it's HighScalability time:


It's a strange world...exotic, gigantic molecules Fit Inside Each Other like Russian nesting dolls
  • 1.39 billion: Facebook Monthly Active Users; $18 billion profit: Apple in 3 months; 200 million: Kik users; 11.2 billion: age of the oldest known solar system; 3 billion: videos viewed per day on Facebook
  • Quotable Quotes:
    • @kevinroose: This dude wins SF bingo. RT @caro: An Uber driver is Airbnb'ing the trunk of his Tesla for $85/night.
    • @BenedictEvans: Only 16% of Facebook DAUs aren't using it on mobile
    • @rezendi: Yo's Law: "in the 21st century tech industry, satire and reality are not merely indistinguishable but actually interchangeable."
    • Brent Ozar: I recommend that people back up data, not servers.
    • @AnnaPawlicka: "Shared State is the Root of All Evil"
    • Peter Lawrey: micro-day - about 1/12 of a second. micro-century - 51.3 minutes. femto-parsec - about 30 metres.
    • TapirLiu: OH: docker is like a condom to protect your computer from Node.
    • @DigitCurator: "The Next Decade In Storage": Resistive RAM promises better scaling, efficiency, and 1000x endurance of flash memory 
    • @BenedictEvans: At the end of 2014 Apple had ~650-675m live iOS devices. With zero unit sales growth, 700-720m by end 2015. Consumer PCs in use - 7-800m
    • @MailChimp: We sent 14.1 billion emails in December, including 741 million on Cyber Monday.
    • @mjpt777:  That's in the past. We can now do 20 million per second :-) per stream.
    • @bradwilson: Conclusions: 1. Ethernet over power does not perform as well as WiFi (??) 2. Ethernet over power hates being shared among multiple PCs
    • @mjpt777: Specialized Evolution of the General-Purpose CPU  - note that performance per watt is approx doubling per generation. 
    • @nighitingale: "The Earth is 4.6 billion years old. Scaling to 46 years, humans have been here 4 hours, the industrial..."
    • Joseph Campbell: The hero’s journey always begins with the call. One way or another, a guide must come to say, “Look, you’re in Sleepy Land. Wake. Come on a trip."
    • Frank Herbert: the most persistent principles of the universe were accident and error.

  • Will Facebook ever figure out this mobile thing? Not long ago that was the big question. We have an answer. In the fourth quarter, the percentage of its advertising revenue from mobile devices increased to 69%, up from 66% in the third quarter and 53% a year earlier. Mobile daily active users were 745 million on average for December 2014, an increase of 34 percent year-over-year.

  • The power of smart: Facebook’s Powerful Ad Tools Grew Its Revenue 25X Faster Than User Count. Facebook might be running out of people, but they aren't running out of ways of monetizing those people. Math grows faster than users.

  • The Cathedral of Computation by Ian Bogost. Agree in part. There does seem to be an uncritical acceptance of algorithms, as if because they enliven machines they are somehow pure and objective, when the opposite is the case. Algorithms are made for human purposes by teams of humans and show the biases and hubris of their makers. And like all creatures, algorithms should be subject to skepticism, law, and review.

  • We have many long-running debates in tech. Server side vs client side rendering is just one of them. A thoughtful analysis: Tradeoffs in server side and client side rendering by Malte Ubl. Brett Slatkin boldly claims: Experimentally verified: "Why client-side templating is wrong". He concludes: I hope never to render anything server-side ever again. I feel more comfortable in making that choice than ever thanks to all this data. I see rare occasions when server-side rendering could make sense for performance, but I don't expect to encounter many of those situations in the future.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Wednesday
Jan 28, 2015

Instagram Strategy to Radically Reduce Traffic: Kill all the spambots!

RIP to my fallen robot followers on Instagram, if there's a heaven for robot instagram users, you guys are in there

— alldaychubbyboy (@Allday)

How do you scale to handle increased user traffic? Have less traffic. No, this is not a koan. The best way to deal with traffic is not to have it. 

In a two day span Instagram disappeared 18.9 million users or more than 29 percent of their "followers." Justin Bieber lost 3.5 million followers (15 percent), Kim Kardashian lost 1.3 million followers (5.5 percent), Rihanna lost 1.2 million followers.

Instagram explains this dramatic reckoning was achieved by "removing deactivated spam accounts and accounts that violated its community guidelines." 

In an age when high user counts and tantalizing engagement metrics are more valuable than bitcoins, this can't have been an easy decision, but it was made after being bought by Facebook.

Why? Gabe Madway, an Instagram spokesman, explains: We totally get that it’s uncomfortable for people. The overall goal is we want it to be perceived that the people following you are real.

Uncomfortable is an understatement. A BuzzFeed article nicely captured some of the anger. Here's just one example (could be NSFW):


Monday
Jan 26, 2015

Paper: Immutability Changes Everything by Pat Helland

I was excited to see that Pat Helland has published another thought-provoking paper: Immutability Changes Everything. If video is more your style, Pat gave a wonderful talk on the same subject at RICON2012 (video, slides).

It's fun to see how Pat's thinking is evolving over time as he's worked at Tandem Computers (Transaction Monitoring Facility), Amazon, Microsoft (Microsoft Transaction Server and SQL Service Broker), and now Salesforce.

You might have enjoyed some of Pat's other visionary papers: Life beyond Distributed Transactions: an Apostate’s Opinion, The end of an architectural era: (it's time for a complete rewrite), and Idempotence Is Not a Medical Condition.

This new paper is a high-level overview of why immutability, the idea that destructive updates are not allowed, is a huge architectural win. Because of cheaper disk, RAM, and compute, it's now financially feasible to keep all the things. The key insight is that without data updates, coordination in a distributed system becomes a much simpler problem to solve.
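As a rough illustration of "no destructive updates," here is a minimal Go sketch of an append-only store in which every write adds a new version and old versions stay readable. It is not taken from the paper; `AppendOnlyStore`, `Version`, `Put`, and `GetAsOf` are hypothetical names, and a single in-process map stands in for whatever storage a real system would use.

```go
package main

import (
	"fmt"
	"sync"
)

// Version is one immutable fact: the value a key had as of a sequence number.
type Version struct {
	Seq   int
	Value string
}

// AppendOnlyStore never overwrites; every write appends a new Version.
type AppendOnlyStore struct {
	mu       sync.RWMutex
	seq      int
	versions map[string][]Version
}

// NewStore returns an empty store.
func NewStore() *AppendOnlyStore {
	return &AppendOnlyStore{versions: make(map[string][]Version)}
}

// Put appends a new version for key and returns its sequence number.
func (s *AppendOnlyStore) Put(key, value string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seq++
	s.versions[key] = append(s.versions[key], Version{Seq: s.seq, Value: value})
	return s.seq
}

// GetAsOf returns the value key had at sequence number asOf.
// The answer for a given (key, asOf) pair never changes once asOf has passed.
func (s *AppendOnlyStore) GetAsOf(key string, asOf int) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	vs := s.versions[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].Seq <= asOf {
			return vs[i].Value, true
		}
	}
	return "", false
}

func main() {
	s := NewStore()
	first := s.Put("cart", "1 item")
	second := s.Put("cart", "2 items")
	old, _ := s.GetAsOf("cart", first)
	cur, _ := s.GetAsOf("cart", second)
	fmt.Println(old, "|", cur) // prints "1 item | 2 items"
}
```

Because a version is never modified after it is written, a reader pinned to a sequence number can cache or copy what it reads without worrying about later writers.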

Immutability is an architectural concept that's been gaining steam on several fronts. Facebook is using a declarative immutable programming model in both the model and the view. We are seeing the idea of immutable infrastructure rise in DevOps. Aeron is a new messaging system that uses a persistent log to good advantage. The Lambda Architecture makes use of immutability. Datomic is a database that treats data as a time-ordered series of immutable objects.

If that's of interest, then you'll like the paper.

Overview:


Friday
Jan 23, 2015

Stuff The Internet Says On Scalability For January 23rd, 2015

Hey, it's HighScalability time:


Elon Musk: The universe is really, really big  [Gigapixels of Andromeda [4K]]
  • 90: is the new 50 for woman designer; $656.8 million: 3 months of Uber payouts; $10 billion: all it takes to build the Internet in space; 1 billion: registered WeChat users
  • Quotable Quotes:
    • @antirez: Tech stacks, more replaceable than ever: hardware is better, startups get $$ (few nodes + or - who cares), alternatives countless.
    • Olivio Sarikas: If every Star in this Image was a 2 millimeter Sandcorn you would end up with 1110 kg of Sand!!!!!!!!!
    • Chad Cipoletti: In even simpler terms, we see brands as people.
    • @timoreilly: Love it: “We need a stack, not a pile” says @michalmigurski.
    • @neha: I would be very happy to never again see a distributed systems paper eval on a workload that would fit on one machine.
    • @etherealmind: OH: "oh yeah, the extra 4 PB of storage is being installed today. Its about 4 racks of gear".
    • @lintool: Andrew Moore: Google's ecommerce platform ingests 100K-200K events per second continuously. 

  • Programming as myth building. Myths to Live By: The true symbol does not merely point to something else. It contains in itself a structure which awakens our consciousness to a new awareness of the inner meaning of life and of reality itself. A true symbol takes us to the center of the circle, not to another point on the circumference.

  • Not shocking at all: "We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code...A majority (77%) of the failures require more than one input event to manifest, but most of the failures(90%) require no more than 3." Really, who has the time? More on human nature in Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems.

  • Let simplicity fail before climbing the complexity ladder. Scalability! But at what COST?: "Big data systems may scale well, but this can often be just because they introduce a lot of overhead. Rather than making your computation go faster, the systems introduce substantial overheads which can require large compute clusters just to bring under control. In many cases, you’d be better off running the same computation on your laptop." But notice the kicker: "it took some work for parallel union-find." Replacing smart work with brute force is often the greater win. What are a few machine cycles between friends?

  • Programming is the ultimate team sport, so Why are Some Teams Smarter Than Others? The smartest teams were distinguished by three characteristics. First, their members contributed more equally to the team’s discussions. Second, their members were better at reading complex emotional states. Third, teams with more women outperformed teams with more men.

  • WhatsApp doesn't understand the web. Interesting design and discussions. Using proprietary Chrome APIs is a tough call, but this is more perplexing: "Your phone needs to stay connected to the internet for our web client to work." Is this for consistency reasons? To make sure the phone and the web stay in sync? Is it for monetization reasons? It does create a closed proxy that effectively prevents monetization leaks. It's tough to judge a solution without understanding the requirements, but there must be something compelling to impose so many limitations.

  • Roman Leventov's analysis of Redis data structures. In which Salvatore 'antirez' Sanfilippo addresses point by point criticisms of Redis' implementation. People love Redis, and part of that love has to come from what a good guy antirez is. Here he doesn't go all black diamond alpha nerd in the face of a challenge. He admits where things can be improved. He explains design decisions in detail. He advances the discussion with grace, humility, and smarts. A worthy model to emulate.
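The "Scalability! But at what COST?" item above mentions union-find. As a sketch of the kind of single-machine computation that item argues for, here is a plain sequential union-find in Go that counts connected components from an edge list. It is not the code from the cited article; `UnionFind` and `CountComponents` are illustrative names, and the graph in `main` is made up.

```go
package main

import "fmt"

// UnionFind tracks disjoint sets with path halving and union by size.
type UnionFind struct {
	parent []int
	size   []int
}

// NewUnionFind creates n singleton sets, one per vertex 0..n-1.
func NewUnionFind(n int) *UnionFind {
	uf := &UnionFind{parent: make([]int, n), size: make([]int, n)}
	for i := range uf.parent {
		uf.parent[i] = i
		uf.size[i] = 1
	}
	return uf
}

// Find returns the representative of x's set, halving the path as it walks.
func (uf *UnionFind) Find(x int) int {
	for uf.parent[x] != x {
		uf.parent[x] = uf.parent[uf.parent[x]]
		x = uf.parent[x]
	}
	return x
}

// Union merges the sets containing a and b and reports whether they were separate.
func (uf *UnionFind) Union(a, b int) bool {
	ra, rb := uf.Find(a), uf.Find(b)
	if ra == rb {
		return false
	}
	if uf.size[ra] < uf.size[rb] {
		ra, rb = rb, ra
	}
	uf.parent[rb] = ra
	uf.size[ra] += uf.size[rb]
	return true
}

// CountComponents returns the number of connected components in an
// undirected graph with n vertices and the given edge list.
func CountComponents(n int, edges [][2]int) int {
	uf := NewUnionFind(n)
	count := n
	for _, e := range edges {
		if uf.Union(e[0], e[1]) {
			count--
		}
	}
	return count
}

func main() {
	edges := [][2]int{{0, 1}, {1, 2}, {3, 4}}
	fmt.Println(CountComponents(6, edges)) // prints 3
}
```

The whole computation is one pass over the edges with near-constant work per edge, which is why a single thread can go a long way before a cluster is needed.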

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

