Justin.tv is looking to hire a Scaling Engineer to help scale their video cluster, IRC server, web app, monitoring and search services. I've never seen this job title before. A quick search that showed only a few previous instances of it being used. Has anyone else seen Scaling Engineer as a job title before? It's a great idea. Scaling is certainly a worthy specialty of it's own. Why there's a difficult lingo, obscure tools, endlessly subtle concepts, a massive body of knowledge to master, and many competing religious factions. All a good start. Next I see a chain of Scalability Universities. Maybe use all those Starbucks that are closing down. Contact me for franchise opportunities :-)
Now you can buy more cores on EC2 without adding more machines:
Hi, I want to implement a search engine with lucene. To be scalable, I would like to execute search jobs asynchronously (with a job queuing system). But i don't know if it is a good design... Why ? Search results can be large ! (eg: 100+ pages with 25 documents per page) With asynchronous sytem, I need to store results for each search job. I can set a short expiration time (~5 min) for each search result, but it's still large. What do you think about it ? Which design would you use for that ? Thanks Mat
The following technical Webinar could be of interest to the community. WHO:
- Farhan "Frank" Mashraqi, Director of Business Operations and Technical Strategy, Fotolog Inc
- Monty Taylor, Senior Consultant, Sun Microsystems
- Jimmy Guerrero, Sr Product Marketing Manager, Sun Microsystems - Database Group
- Designing and Implementing Scalable Applications with Memcached and MySQL web presentation.
- Thursday, May 29, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT
- The presentation will be approximately 45 minutes long followed by Q&A.
Hi, We're looking for a highly scalable way of scanning documents being uploaded and downloaded from our web application. I believe services like gmail and hotmail are using bespoke solutions from companies like Trend, but are there some quality "off the shelf" products out there that can easily be scaled out and have a "loose" API (HTTP based) for application integration? Once again, thanks for any input.
Customer: - Name - Country Product: - Code - Name - Description Purchases: - Reference to Product Entity - Reference to Customer Entity - Date of orderAnyone from a relational background would look at this schema and give it a big thumbs up. With a little effort we can imagine the original physical purchase order that has now been normalized into three different tables. To recreate the original purchase order a join on purchases, produce and customer is needed. Read speed is not optimized, safety is optimized. Here’s what the same schema looks like optimized for reading:
Purchase: - Customer Name - Customer Country - Product Code - Product Name - Purchase Order Number - Date Of OrderThe three original tables have been folded into one entity. Now a purchase order can be read in one get operation. No join necessary. Notice how the entity looks more like an original purchase order. It is also what would probably be cached and is what our model would probably look like. But what if you want to update a product name or a customer name? Those attributes are duplicated in all entities. Here’s where the protection offered by the relational model comes in. Only one entity needs updating in a normalized model. In BigTable you have to remember everywhere a customer name and product name and change every instance to new values. It’s not a simple, safe, or reliable approach. But it does optimize for read speed and scalability. For an application with a high proportion of updates to reads this approach wouldn’t make sense. But on the web reads usually dominate. How often do you really change a customer name or a product name? Seldom. How often do you read them? All the time. Designing to scale for reads and taking the pain on writes takes some getting used to. It’s a massive change to standard relational tactics. But this is what it takes to scale web applications, even if it feels a little strange at first.
Update 2: EBay's Randy Shoup spills the secrets of how to service hundreds of millions of users and over two billion page views a day in Scalability Best Practices: Lessons from eBay on InfoQ. The practices: Partition by Function, Split Horizontally, Avoid Distributed Transactions, Decouple Functions Asynchronously, Move Processing To Asynchronous Flows, Virtualize At All Levels, Cache Appropriately. Update: eBay Serves 5 Billion API Calls Each Month. Aren't we seeing more and more traffic driven by mashups composed on top of open APIs? APIs are no longer a bolt on, they are your application. Architecturally that argues for implementing your own application around the same APIs developers and users employ. Who hasn't wondered how eBay does their business? As one of the largest most loaded websites in the world, it can't be easy. And the subtitle of the presentation hints at how creating such a monster system requires true engineering: Striking a balance between site stability, feature velocity, performance, and cost. You may not be able to emulate how eBay scales their system, but the issues and possible solutions are worth learning from. Site: http://ebay.com
What's Inside?This information was adapted from Johannes Ernst's Blog
This website has been a great resource for helping me to understand the successful (and failed) scalable network designs from organizations that have actually done it, but I haven't seen any explicite explanations about secure remote administration of these systems. I understand that the *nix people love to SSH, and the windows gang has their RDP, but how does one go about creating a network architecture that both allows one to manage their systems and does its best to avoid hacker interest? As I imagine, no big website will have the SSH/RDP/FTP ports open on the web server, so how is it that they go about remotely administering their geographically diverse groups of servers securely?
Om proposes one solution to the Twitter Problem is to limit followers to three square meals a day. The reasonable idea being that lower limits should mean fewer scaling problems. And as a kicker raising those limits is a good way to raise much needed revenue. Scoble thinks users should consume without limit and will drive to another buffet if all-you-can-eat privileges are revoked. The reasonable idea being that if an internet service can't solve internet scale problems then there's not much use for it. Dave says comp power users a top floor suite and shower them with free passes to the buffet. Let the good times roll! The reasonable idea being that power users help create popular restaurants, er, services in the first place and limiting them starves users and starved users won't come back. So, should web services like Twitter be a buffet, a fixed eight course fine dining experience, a small plate restaurant, a family style joint, or a vending machine? Or something else entirely? In a distant barely remembered past I actually worked at an all-you-can-eat buffet. The food was very good and most customers didn't over over indulge. If they did the place wouldn't stay in business long. But some customers did. They were called stackers. Stackers were so named because a large stack of plates would pile up on their table throughout the meal. Stackers followed a power law distribution. Few customers at any one time were stackers, but their effect could be devastating. How devastating depended on their favorite foods... A stacker who loved potato salad was manageable. We had plenty of potato salad and it was cheap and quick to make. No problem. Stacking itself was not frowned upon and never discouraged. It's an all-you-can-eat buffet after all! But if a stacker's favorite food was roast beef, that was trouble. Not only is roast beef expensive, it comes in a limited supply because it has to be prepared ahead of time. Once you ran out there was no more roast beef for the rest of the night. Good roast beef takes hours to prepare, it must be planned for. Management's job was to carefully balance projected demand against waste. The goal was to prepare enough meat to meet demand, yet not have a lot of left-overs. Stackers blow apart the finely balanced calculation of how much roast beef to make and the carving station is left trying to push the ham while apologizing for an embarrassing lack of roast beef. An ugly ugly scene. As a carver you are armed with a long scary looking knife and you are shielded by Medieval chain-mail looking glove, but hungry customers are mean and fast. You never see it coming. Unfortunately the distribution of stackers on any given night is unpredictable. You can't always cook a maximum amount of meat or you'll go broke. And if you make too little everyone is unhappy. It needs to be just right. As a person with serious stacker tendencies I try to remember the cost of things and keep a reasonable balance. The only way to make Goldilocks happy and have just the right balance is to place limits. Eventually the restaurant had to limit the number of trips to the roast beef station to three a meal. Enough that you get value for your dollar, but not so much that the restaurant goes under. Everyone happy? Of course not. The world doesn't work like that. It's all-you-can-eat some would say so I should be able to eat all I can eat ! But there are always limits. Would it be fair to back a truck up to the restaurant and start loading up because that's part of your meal? No. Is it fair to stuff your backpack with food on the way out? No. So there are always limits. The question is what are fair limits? It has been said FriendFeed has no problems handling 10,000 friends so neither should Twitter. Now, let's imagine if I spun up 1000 EC2 servers whose only task was to add more friends to feed. Would FriendFeed limit me then? Of course. It's basic web site self-defense, a right guaranteed under the constitution and long recognized by the courts in certain situations. But still, what are fair limits? How much roast beef should you be able to eat? Limit setting is a strategy we've talked about many times as a way of protecting sites from complete devastation. My favorite example is Mailinator whose prime directive is surviving attacks and they've deployed many clever practices in their own defense. And most every large web site on earth is busy watching your every move so they can bounce you at the first sign of DDOS Armageddon. Limits aren't inherently bad. But limits don't make you scale, they simply stop you from unscaling. An adequate scalable infrastructure must still be put in place. In the end I agree with Scoble in that the power of the internet is having interesting conversations with interesting people about interesting topics. For interesting conversations to happen you must be able to freely create relationships. If you or they have to pay for relationships they simply won't form. Would Google's Page Rank algorithm work so well if it could only analyze paid relationships? A web formed under a paid relationship model would look totally different and be decidedly less valuable. Similarly, a social network that can't grow naturally through preferential attachment would have much less value. Scaling relationships is a core social network competency. Relationships should be subject to DDOS type limits, but not limits artificially out of proportion with a user's internet audience. I doubt Twitter would disagree, but they are going through a tough time right now. I also agree with Om. The Freemium model is a great idea and linking that to site protecting prophylactics is even better. But limiting a core competency may not be the right target. Fotolog is an example of a service that puts Freemium ideas to good use. They charge extra for adding more photos a day, more comments a day, custom profile abilities, and social status add ons. What is the equivalent in Twitter? I don't know, but I would try to treat relationships more like potato salad than roast beef. And I also agree with Dave. It's hard to get noticed on the web. Those who help you storm the attention barrier shouldn't be punished. They should be rewarded with a tasty appropriately sized meal.
From their website: Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion. While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as a key press detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate a job to a different machine which would otherwise be idle. Condor does not require a shared file system across machines - if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.