Tuesday
Oct132009

Why are Facebook, Digg, and Twitter so hard to scale?

Real-time social graphs (connectivity between people, places, and things). That's why scaling Facebook is hard, says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder to scale than traditional websites. Why is that? Let's find out.

Traditional websites are easier to scale than social networking sites for two reasons:

Click to read more ...

Monday
Oct122009

High Performance at Massive Scale – Lessons learned at Facebook

Jeff Rothschild, Vice President of Technology at Facebook, gave a great presentation at UC San Diego on our favorite subject: "High Performance at Massive Scale – Lessons learned at Facebook". The abstract for the talk is:

Facebook has grown into one of the largest sites on the Internet today serving over 200 billion pages per month. The nature of social data makes engineering a site for this level of scale a particularly challenging proposition. In this presentation, I will discuss the aspects of social data that present challenges for scalability and will describe the core architectural components and design principles that Facebook has used to address these challenges. In addition, I will discuss emerging technologies that offer new opportunities for building cost-effective high performance web architectures.

There's a lot that's interesting about this talk that we'll get into later, but I thought you might want a head start on learning how Facebook handles 30K+ machines, 300 million active users, 20 billion photos, and 25TB per day of logging data.

Click to read more ...

Friday
Oct092009

Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so

I'm not sure how many people who follow this have even tried collectl, but I wanted to let you all know that I just released a set of utilities called, strangely enough, collectl-utils, which you can get at http://collectl-utils.sourceforge.net. One web-based utility called colplot lets you plot data from multiple systems in a way that makes correlating them over time very easy.

Click to read more ...

Thursday
Oct082009

Riak - web-shaped data storage system

Update: Short presentation in NYC by Bryan Fink demonstrating the Riak web-shaped data storage engine

Riak is another new and interesting key-value store entrant. Some of the features it offers are:

  • Document-oriented
  • Scalable, decentralized key-value store
  • Standard get, put, and delete operations.
  • Distributed, fault-tolerant storage solution.
  • Configurable levels of consistency, availability, and partition tolerance
  • Support for Erlang, Ruby, PHP, Javascript, Java, Python, HTTP
  • Open source and NoSQL
  • Pluggable backends
  • Eventing system
  • Monitoring
  • Inter-cluster replication
  • Links between records that can be traversed.
  • Map/Reduce. Functions are executed on the data node. One interesting difference is that a list of keys is required to specify which values are operated on, as opposed to running calculations on all values.
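
To make the Map/Reduce point concrete, here is a minimal sketch of a map/reduce that runs only over an explicit list of keys. It uses a plain Python dict as a stand-in for a bucket; the `store` contents and the `map_reduce` helper are made up for illustration and are not Riak's client API.

```python
from functools import reduce

# Toy in-memory stand-in for a key-value bucket (illustrative data only).
store = {
    "user:1": {"name": "ann", "visits": 3},
    "user:2": {"name": "bob", "visits": 5},
    "user:3": {"name": "cat", "visits": 7},
}

def map_reduce(store, keys, map_fn, reduce_fn, initial):
    """Apply map_fn only to the values named by `keys`, then fold the results."""
    mapped = (map_fn(store[k]) for k in keys if k in store)
    return reduce(reduce_fn, mapped, initial)

# Only user:1 and user:3 are visited; user:2 is never touched.
total = map_reduce(
    store,
    ["user:1", "user:3"],
    lambda value: value["visits"],
    lambda a, b: a + b,
    0,
)
print(total)
```

Because the caller names the keys up front, the work is bounded by the size of the key list rather than by the size of the whole data set.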

Related Articles

  • Hacker News Thread. More juicy details on how Riak compares to Cassandra, mongodb, couchdb, etc. 

 

Wednesday
Oct072009

How to Avoid the Top 5 Scale-Out Pitfalls

Scale-Out is incrementally adding servers as needed to scale rather than buying larger servers. Here's the MySQL idea of what a scale-out architecture looks like:


This MySQL article lists 5 problems to avoid when scaling out:
  1. Don't Think Synchronously. Introduce asynchronous communication, parallelization, and strategies to deal with approximate or slightly outdated data.
  2. Don't Think Vertically. Scaling by buying bigger machines won't work. Plan on horizontal scaling and asynchronous architectures from the start, which makes it easy to add capacity on demand.
  3. Don't Mix Transactions with Business Intelligence. Transactions and analytics are inherently different. Separate out different types of data onto different databases.
  4. Avoid Mixing Hot and Cold Data. Static and fast changing data are inherently different. Separate out different types of data onto different databases.
  5. Don't Forget the Power of Memory.  Make data accessible in RAM by smartly partitioning data across servers.
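
As a rough illustration of point 5, the sketch below spreads keys across several servers by hashing, so that each server only has to hold its own slice of the data in RAM. The server names and keys are hypothetical, and simple modulo placement is just one possible partitioning scheme, not something the MySQL article prescribes.

```python
import hashlib

def server_for(key, servers):
    """Pick a server by hashing the key, so a key always maps to the same server."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

servers = ["cache-a", "cache-b", "cache-c"]  # hypothetical host names

# Group a few example keys by the server that would hold them.
placement = {}
for key in ("user:1", "user:2", "user:3", "user:4"):
    placement.setdefault(server_for(key, servers), []).append(key)

for server, keys in sorted(placement.items()):
    print(server, keys)
```

One caveat with modulo placement is that changing the number of servers remaps most keys; consistent hashing is the usual way to reduce that churn.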

More information at Scale-Out & Replication Best Practices for High-Growth Businesses.

Tuesday
Oct062009

Building a Unique Data Warehouse

There are many reasons to roll your own data storage solution on top of existing technologies. We've seen stories on HighScalability about custom databases for very large sets of individual data (like Twitter) and large amounts of binary data (like Facebook pictures). However, I recently ran into a unique type of problem. I was tasked with recording and storing bandwidth information for more than 20,000 servers and their associated networking equipment. This data needed to be accessed in real-time, with less than a 5-minute delay between the data being recorded and the data showing up on customer bandwidth graphs on our customer portal.

After numerous false starts with off the shelf components and existing database clustering technology, we decided we must roll our own system. The real key to our problem (literally) was the ratio of the size of the key to the size of the actual data. Because the tracked metric was so small (a 64-bit counter) compared to the unique identifier (32-bit network component ID, 32-bit timestamp, 16-bit data type identifier) existing database technologies would choke on the key sizes.
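
The key-to-data ratio is easy to see by packing one record with the field widths given above. This is only a sketch: the sample values are invented, and the big-endian packed layout is an assumption for illustration, not the format the system actually used.

```python
import struct

KEY_FORMAT = ">IIH"   # 32-bit component ID, 32-bit timestamp, 16-bit data type
VALUE_FORMAT = ">Q"   # 64-bit counter

# Invented sample values: component 20001, a Unix timestamp, data type 1.
key = struct.pack(KEY_FORMAT, 20001, 1254787200, 1)
value = struct.pack(VALUE_FORMAT, 123456789)

key_bytes = len(key)
value_bytes = len(value)
print(key_bytes, value_bytes)
```

With these widths the identifier takes 10 bytes and the counter takes 8, so every stored sample carries more key than data before any index overhead is counted.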

Eventually it was decided that the best solution was to write our own wrapper for standard MySQL databases. No fancy features, no clustering, no merge tables or partitioning, no extra indexes, just hundreds of thousands of flat tables on as many physical machines as was necessary. I chronicled the whole decision making process in the full article, located here, on our developers' blog.
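
One way a thin wrapper over many flat tables could route a reading is sketched below. The naming scheme, the `table_location` helper, and the host list are all hypothetical; the post does not say how tables were actually named or assigned to machines.

```python
def table_location(component_id, data_type, hosts):
    """Return (host, table) for one component/metric pair (hypothetical scheme)."""
    host = hosts[component_id % len(hosts)]       # spread components across machines
    table = f"bw_{component_id}_{data_type}"      # one flat table per component and metric
    return host, table

hosts = ["db01", "db02", "db03"]  # hypothetical database hosts
host, table = table_location(20001, 1, hosts)
print(host, table)
```

Under a scheme like this, the component ID and data type live in the table name, so each row would only need a timestamp and the counter.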

Tuesday
Oct062009

10 Ways to Take your Site from One to One Million Users by Kevin Rose  

At the Future of Web Apps conference Kevin Rose (Digg, Pownce, Wefollow) gave a cool presentation on the top 10 down and dirty ways you can grow your web app. He took the questions he's most often asked and turned them into a very informative talk.

This isn't the typical kind of scalability we cover on this site. There aren't any infrastructure and operations tips. But the reason we care about scalability is to support users and Kevin has a lot of good techniques to help your user base bloom.

Here's a summary of the 10 ways to grow your consumer web application:

1. Ego. Ask: does this feature increase the user's self-worth or stroke their ego? What emotional and visible rewards will a user receive for contributing to your site? Are they gaining reputation or badges, or showcasing what they've done in the community? Sites that have done it well:

Twitter.com followers. Followers turn every single celebrity into a spokesperson for your service. Celebrities continually pimp your service in the hopes of getting more followers. It's an amazing self-reinforcing traffic generator. Why do followers work? Twitter communication is one way. It's simple. Followers don't have to be approved and there aren't complicated permission schemes about who can see what. It means something for people to increase their follower count. It becomes a contest to see who can have more. So even spam followers are valuable to users as it helps them win the game.

Digg.com leader boards. Leader boards show the score for a user activity. On Digg it was based on the number of articles submitted. They encourage people to compete and do work inside the Digg ecosystem. Everyone wants to see their name in lights.

Digg.com highlight users. Users who submitted stories were rewarded by having their name in a larger font and a friending icon put beside their story submission. Users liked this.

2. Simplicity. Simplicity is the key. A lot of people overbuild features. Don't. Release something and see what users are going to do. Pick 2-3 things on your site and do them extremely well. Focus on those 2-3 things. Always ask if there's anything you can take out of a feature. Make it lighter, cleaner, and easy to understand and use.

3. Build and Release. Stop thinking you understand your users. You think users will love this or that and you'll probably be wrong. So don't spend 6 months building features users may not love or will only use 20% of. Learn from what users actually do on your site. Avoid analysis paralysis, especially as you get larger. Decide, build, release, get feedback, iterate.

4. Hack the Press. There are techniques you can use that will get you more publicity.

Invite only system. Get press by creating an invite only system. Have a limited number of invites and seed them with bloggers. Get the buzz going. Give each user a limited number of invites (4 or 5). It gets bloggers talking about your service. The mainstream press calls and you say you are not ready. This amps the hype cycle. Make new features login-only, accessible only if you log in, but make them visible and marked beta on the site. This increases the number of registered users.

Talk to junior bloggers. On TechCrunch, for example, find the most junior blogger and pitch them. It's more likely you'll get covered.

Attend parties for events you can't afford. You can go to the after parties for events you can't afford. Figure out who you want to talk to. Follow their Twitter accounts and see where they are going.

Have a demo in-hand. People won't understand your great vision without a demo. Bring an iPhone or laptop to showcase the demo. Keep the demo short, 30-60 seconds. Say: Hey, I just need 30 seconds of your time, it's really cool, and here's why I think you'll like it. Slant it towards what they do or what they cover.

5. Connect with your community.

Start a podcast. A big driver in the early days of Digg. Influencers will listen and they are the heart of your ecosystem. 

Throw a launch party and yearly and quarterly events. Personally invite influencers and their friends. Just have a party at a bar. Throw them around conferences as people are already there. 

Engage and interact with your community.

Don't visually punish users. Often users don't understand that their behaviour is bad, as they think they are just playing the game your system sets up. Walk through the positive behaviours you want to reinforce on the site.

6. Advisors. Have a strong group of advisors. Think about which technical, marketing and other problems you'll have and seek out people to help you. Give them stock compensation. A strong advisory team helps with VCs.

7. Leverage your user base to spread the word.

FarmVille. Tells users when other players have helped them and asks the player to repay the favor. This gets players back into the system by using a social obligation hack. They also require having a certain number of friends before you expand your farm. They give away rare prizes.

Wefollow. Tweets hashtags when people follow someone else. This further publicizes the system. They also ask when a new user hits the system if they want to be added to the directory, telling the user that X hundred thousand of your closest friends have already added themselves. This is the number one way they get new users.

8. Provide value for third party sites. The Wall Street Journal, for example, puts FriendFeed, Twitter, etc. links on every page because they think it adds value to their site. Is there some way you can provide value like that?

9. Analyze your traffic. Install Google Analytics. See where people are entering from, where they are going, where they are exiting from, and how you can improve those pages.

10. The entire picture. Step back and look at the entire picture. Look at users who are creating quality content. Quality content drives more traffic to your site. Traffic going out of your site encourages other sites to add buttons to your site which encourages more users and more traffic into your site. It's a circle of life. Look at how your whole ecosystem is doing.
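
For item 9, the entry and exit questions boil down to counting the first and last page of each visit. The sketch below does that over a few made-up sessions; the paths and session data are invented, and an analytics tool would report the same thing without any code.

```python
from collections import Counter

# Each session is the ordered list of pages one visitor viewed (invented data).
sessions = [
    ["/", "/popular", "/story/42"],
    ["/story/42", "/"],
    ["/", "/signup"],
]

entries = Counter(pages[0] for pages in sessions if pages)   # where visits start
exits = Counter(pages[-1] for pages in sessions if pages)    # where visits end

print(entries.most_common())
print(exits.most_common())
```

Pages that show up often as exits but rarely as entries are the natural candidates to improve first.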


Friday
Oct022009

HighScalability has Moved to Squarespace.com! 

You may have noticed something is a little different when visiting HighScalability today: We've Moved! HighScalability.com has switched hosting services to Squarespace.com. House warming gifts are completely unnecessary. Thanks for the thought though.

It's been a long long long process. Importing a largish Drupal site to Wordpress and then into Squarespace is a bit like dental work without the happy juice, but the results are worth it. While the site is missing a few features I think it looks nicer, feels faster, and I'm betting it will be more scalable and more reliable. All good things.
I'll explain more about the move later in this post, but there's some administrivia that needs to be handled to make the move complete:

  • If you have a user account and have posted on HighScalability before, your account still exists, but since I don't know your password I had to make up a new one for you. Please contact me and I'll give you your password so you can log in and change it. Then you can post again. Sorry for the hassle, but for posts to be assigned to authors on import the user accounts had to exist, so I had to create them. Another issue is that login names in Squarespace are less flexible than under Drupal. The only allowable special character is the '-'. So if your login name contains a space, a '_', or a '.', I changed those characters to a '-'.
  • If you have a user account and have never posted on HighScalability before you'll have to register in order to recreate your user account. Sorry, but with so many users I couldn't recreate all the user accounts by hand.
  • If you could switch RSS over to http://feeds.feedburner.com/HighScalability that would help a lot. The old RSS will still work.
  • A lot of links were broken during the move due to the imperfection of the export/import process. Some of the formatting looks a little strange now too. It's going to take me a while to fix all these problems. If there's anything you see that needs fixing please shoot me an email.
  • There's no tag cloud anymore, but there's an All Posts page that lists every post by category, by week, and by month.
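
If you want to work out what your login name became, the renaming rule from the first bullet can be written as a one-line substitution. This is just a sketch of that rule; `squarespace_login` is a made-up helper name, not part of any Squarespace tooling.

```python
import re

def squarespace_login(drupal_login):
    """Replace each space, '_' or '.' with '-', as described in the first bullet."""
    return re.sub(r"[ _.]", "-", drupal_login)

print(squarespace_login("todd_hoff"))
print(squarespace_login("a.b c"))
```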

This isn't pleasant, but there was no way I could make the process transparent. I appreciate your help and understanding.

Why was the move made?

I've played with and considered virtually every CMS available. I went with Squarespace based on weighing a few of my own personal goals and pain points:

Eating my own dog food. I've been a big advocate of cloud based memory grids. Since Squarespace uses a memory grid architecture I felt it would be a good experience to make use of their service (if I could make it work).

End-to-end management. I don't want to have to worry about my site. Ever. I want it to be managed end-to-end by the hosting service. In the industry, when hosts say they offer a managed service they usually mean the hardware/network/software stack is managed; you are still responsible for site uptime. The problem is a Drupal + LAMP + VPS stack isn't a hands-off affair. Things go wrong and you have to be always on call. That's fine if you have a few people working a site; they can take turns handling the load. But if you are alone or on vacation, it doesn't work. In the back of your mind you are always worrying that something might be going wrong. Leaving the management of the entire stack to the hosting service makes this worry largely go away, assuming the host is good at their job.

Performance, scalability, reliability. I want the system to feel fast, to handle a lot of users, and to be reliable. For my purposes I don't really expect to have more users than I do now so I'm not looking for infinite headroom. But for the traffic I do have there should be no problems.

Price. A managed VPS with any sort of capability is expensive for a site that doesn't generate a lot of revenue yet gets too many users for shared hosting. A price point between shared hosting and a managed VPS would be very attractive. Some of the end-to-end managed services are enterprise plays and are way too expensive for the little guy.

Support. You are always at the mercy of your host, even with a cloud or colo. Good support you can count on makes all the difference when you are trying to get a site up and running and when disaster hits. Some service providers promise to get back to you within 8 hours. This is the Internet, 8 hours might as well be forever. No thanks. 

So far my experience with Squarespace has been very positive across all my criteria.

They manage the site completely so my end-to-end management requirement is satisfied. A site is managed through a truly innovative browser based GUI that makes template customization and other operations quite straightforward. It will also tell you cool things like how many RSS readers you have and which posts are getting the most traffic.

I am impressed with how robust the system feels and how fast it is even doing large operations. I never feel like I'm going to break it or corrupt it and I'm almost never waiting on it to finish operations. Things just work. There's a lot of quality thought and work that's been put into the system and it shows.

Will it scale? Obviously I haven't tested that out yet, but it seems to handle larger sites so I'm fairly confident.

The price is quite reasonable, but I feel it's enough that they can make money without having to cut corners. It's a good value.

Support is excellent. Questions are answered within a short period of time and they are generally helpful. And I've asked some really stupid questions. When I couldn't set the date using a calendar widget they hardly even laughed. What they did do is make a screencast showing me what I needed to do and I was back in business. 

Of course nothing is perfect. The imperfections show up as a few missing features and in some of what needs to happen to make the transition to the new system complete.

It's clear they've put a lot of work into their back-end and front-end. What is missing is the wide range of modules you'll find for products like Drupal, Joomla, and Wordpress. Squarespace offers a small set of widgets, which are good, but they aren't as configurable as those for other products. Part of the problem is that Squarespace doesn't offer an API for their system so third parties can't make widgets. So simple widgets like avatars, tag clouds, today's popular posts, the most popular posts of all time, recent forum posts, read counts, and logged in users are not available.

Other problems are in the process of moving an existing site into Squarespace.

Drupal is not one of Squarespace's supported import platforms. Drats! So I had to write scripts to export Drupal to Wordpress in such a way that as much of the metadata as possible was available in Squarespace. This was not easy to do. Squarespace does not have its own defined import format, which would have made life a lot easier.

Another problem is that there are no bulk operations for importing users or mapping URLs. Each user has to be created by hand. If you don't have the users already created when posts are imported then the posts won't be assigned to the correct author.

One of the most important things to do when moving a site is preserve your old URLs. Every service sucks at this. Squarespace does have a way to map URLs, but again there's no way to bulk import the mappings. You have to do them one by one through their GUI. It's an enormous pain. But it was doable, so that's something at least.

These issues weren't serious enough for me not to go with Squarespace, but a site looking to build a real community may have to look a little closer.

So that's the story. There's a lot of work yet to fix broken links and formatting, but I hope that won't take too long.

Please let me know what you think.

thanks

Todd Hoff

Thursday
Oct012009

Private Data Cloud: 'Do It Yourself' with Eucalyptus 

Private Clouds provide many of the benefits of the Public Cloud, namely elastic scalability, faster time-to-market, and reduced OpEx, all within the enterprise's own perimeter and in compliance with its governance. Leading commercial Private Cloud products include VMware, Univa UD, and Unisys. Open source solutions include products like Globus Nimbus, Enomaly Elastic Computing Platform, RESERVOIR, and Eucalyptus.

Read more at: http://bigdatamatters.com/bigdatamatters/2009/09/private-cloud-eucalyptus.html

Thursday
Oct012009

Moving Beyond End-to-End Path Information to Optimize CDN Performance

You go through the expense of installing CDNs all over the globe to make sure users always have a node close by and you notice something curious and furious: clients still experience poor latencies. What's up with that? What do you do to find the problem? If you are Google you build a tool (WhyHigh) to figure out what's up. This paper is about the tool and the unexpected problem of high latencies on CDNs. The main problems they found: inefficient routing to nearby nodes and packet queuing. But more useful is the architecture of WhyHigh and how it goes about identifying bottlenecks. And even more useful is the general belief in creating sophisticated tools to understand and improve your service. That's what professionals do. From the abstract:
Replicating content across a geographically distributed set of servers and redirecting clients to the closest server in terms of latency has emerged as a common paradigm for improving client performance. In this paper, we analyze latencies measured from servers in Google’s content distribution network (CDN) to clients all across the Internet to study the effectiveness of latency-based server selection. Our main result is that redirecting every client to the server with least latency does not suffice to optimize client latencies. First, even though most clients are served by a geographically nearby CDN node, a sizeable fraction of clients experience latencies several tens of milliseconds higher than other clients in the same region. Second, we find that queueing delays often override the benefits of a client interacting with a nearby server.
To help the administrators of Google’s CDN cope with these problems, we have built a system called WhyHigh. First, WhyHigh measures client latencies across all nodes in the CDN and correlates measurements to identify the prefixes affected by inflated latencies. Second, since clients in several thousand prefixes have poor latencies, WhyHigh prioritizes problems based on the impact that solving them would have, e.g., by identifying either an AS path common to several inflated prefixes or a CDN node where path inflation is widespread. Finally, WhyHigh diagnoses the causes for inflated latencies using active measurements such as traceroutes and pings, in combination with datasets such as BGP paths and flow records. Typical causes discovered include lack of peering, routing misconfigurations, and side-effects of traffic engineering. We have used WhyHigh to diagnose several instances of inflated latencies, and our efforts over the course of a year have significantly helped improve the performance offered to clients by Google’s CDN.
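
The first two steps the abstract describes, spotting prefixes with inflated latency and then prioritizing by shared AS path, can be sketched in a few lines. This is a toy illustration and not WhyHigh's actual algorithm: the measurements are invented, and both the 50 ms threshold and the choice of the region's best RTT as the baseline are assumptions.

```python
from collections import defaultdict

# (prefix, region, AS path, RTT in ms): invented sample measurements
samples = [
    ("10.0.0.0/24", "region-1", ("AS1", "AS2"), 20),
    ("10.0.1.0/24", "region-1", ("AS1", "AS3"), 95),
    ("10.0.2.0/24", "region-1", ("AS1", "AS3"), 110),
    ("10.1.0.0/24", "region-2", ("AS4",), 30),
    ("10.1.1.0/24", "region-2", ("AS4", "AS5"), 35),
]

def inflated_prefixes(samples, threshold_ms=50):
    """Flag prefixes whose RTT exceeds the best RTT in their region by threshold_ms."""
    best = {}
    for _, region, _, rtt in samples:
        best[region] = min(rtt, best.get(region, rtt))
    return [s for s in samples if s[3] - best[s[1]] >= threshold_ms]

def rank_as_paths(inflated):
    """Group inflated prefixes by AS path; paths shared by more prefixes come first."""
    by_path = defaultdict(list)
    for prefix, _, path, _ in inflated:
        by_path[path].append(prefix)
    return sorted(by_path.items(), key=lambda item: len(item[1]), reverse=True)

flagged = inflated_prefixes(samples)
ranking = rank_as_paths(flagged)
for path, prefixes in ranking:
    print(" -> ".join(path), prefixes)
```

Ranking by how many inflated prefixes share a path mirrors the abstract's idea of fixing the problems with the widest impact first; the diagnosis step with traceroutes, pings, and BGP data is not modeled here.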

Related Articles

  • Product: Akamai