Sunday
Oct 21, 2007

Paper: Standardizing Storage Clusters (with pNFS)

pNFS (parallel NFS) is the next generation of NFS, and its main claim to fame is that it's clustered, which "enables clients to directly access file data spread over multiple storage servers in parallel. As a result, each client can leverage the full aggregate bandwidth of a clustered storage service at the granularity of an individual file." About pNFS, StorageMojo says: "pNFS is going to commoditize parallel data access. In 5 years we won't know how we got along without it." Something to watch.
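
To make that concrete, here is a minimal sketch of the access pattern pNFS enables (not the pNFS protocol itself): a client that knows a file's stripe layout reads each stripe from a different data server concurrently, so a single file read can use the aggregate bandwidth of the cluster. The server names, stripe size, and read_stripe stub are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: stripe i of the file lives on DATA_SERVERS[i % len(DATA_SERVERS)].
# In pNFS the metadata server hands the client a layout map along these lines.
DATA_SERVERS = ["ds1.example.com", "ds2.example.com", "ds3.example.com", "ds4.example.com"]
STRIPE_SIZE = 1 << 20  # 1 MB stripes

def read_stripe(server, path, offset, length):
    # Stand-in for a transport-specific read (NFS, iSCSI, or an HTTP range request)
    # issued directly against one data server.
    return b"\0" * length

def parallel_read(path, file_size):
    """Fetch every stripe of `path` concurrently, then reassemble in offset order."""
    offsets = list(range(0, file_size, STRIPE_SIZE))
    with ThreadPoolExecutor(max_workers=len(DATA_SERVERS)) as pool:
        futures = [
            pool.submit(read_stripe,
                        DATA_SERVERS[i % len(DATA_SERVERS)],
                        path, off, min(STRIPE_SIZE, file_size - off))
            for i, off in enumerate(offsets)
        ]
        return b"".join(f.result() for f in futures)

if __name__ == "__main__":
    print(len(parallel_read("/export/big.file", 10 * (1 << 20))))  # 10 MB spread over 4 servers
```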


Saturday
Oct 20, 2007

Should you build your next website using 3tera's grid OS?

Update 2: 3tera has added Dynamic Appliances, which are "packaged data center operations like backup, migration or SLAs that users can add to their applications to provide functionality." Update: in an effort to help people cross the chasm of how to start building a website using their grid OS, 3tera is offering their Assured Success Plan. The idea is to provide training, consulting, and support so you can get started with some confidence you'll end up succeeding.

If you are starting or extending a website you have a problem: what technologies should you use? Now there are more answers to that question than ever. One new and refreshingly innovative answer is 3tera's grid OS. In this podcast interview with Bert Armijo from 3tera, we'll learn how 3tera wants to change how you build websites. How? By transforming the physical into the virtual and then allowing the virtual to be manipulated as if it were real. Could I possibly be more abstract? Not really. But when I think of what they are doing, that's the mental model I see whirling around in my mind. Don't worry, I promise we'll drill down to how it can help you in the real world. Let's see how.

I think of 3tera's product as like staying at a nice hotel. At home you are in charge. If something needs doing you must do it. If something breaks you must fix it. But at a nice hotel everything just happens for you. Your room is cleaned, beds are made, outrageously expensive candy bars are replaced in the mini-bar, food arrives when you order it and plates disappear when you are done, and the courtesy mint is placed just so on your pillow. You are free to simply enjoy your stay. All the other details of living just happen. That's the same sort of experience 3tera is trying to provide for your website. You can concentrate on your application and 3tera, through their GUI on the front-end and their AppLogic grid operating system on the back-end, worries about all the housekeeping. I think Bert summed up their goal wonderfully when he said their aim is to:

Get people's hands off physical boxes and give them a way to define complex infrastructures in a reusable way that they can then instantiate, trade, sell, replicate, back up, and manage as individual units. This is what AppLogic does incredibly well.
What they are doing is taking hard physical resources like CPU and storage and decoupling them from their physical sources so you can just order and use them on demand without worrying how it's done under the covers. This is a trend that has been happening for a while, but their grid OS takes that process to the next level. Your physical co-lo cage is now a private virtual data center. Physical boxes, once lovingly spec'ed, bought, and installed, are now allocated on demand from a phalanx of preconfigured and separately maintained servers. Physical storage, once lovingly pieced together from disks, controllers, and networks, is now allocated from a vast unending sea of virtual storage. Physical load balancers are now programs you can create. What this means for you is you can take a website architecture you've drawn up on your whiteboard and simply and quickly create it in a data center. It's all configurable from a GUI. You can bring on 10 new web servers with a simple drag and drop operation. It's basically your whiteboard diagram come to life, only you get to skip all the nasty implementation bits. In the virtual world the nasty non-application-related implementation bits are someone else's problem. 3tera's value proposition is pretty easy to understand:
  • Simplify the data center. You no longer need to locate, outfit, staff, maintain, and support a co-lo space.
  • Simplify operations. A few people can manage a lot of machines.
  • Simplify disaster recovery. Failover is complicated and often doesn't work as planned. With AppLogic your redundant data center is always the same because the virtual data center is copied as a unit. You can pick it up and move it anywhere you want.
  • Simplify the cost model for growth. If you grow how are you going to fund your hardware? Growing on a grid is more agile, incremental, and requires less upfront investment.
  • Simplify your architecture. The grid OS provides a powerful implementation model for how you should structure, grow, and maintain your system. You don't need to code it from scratch or think it up yourself.

In short: customers don't care about your servers. Hardware and the data center do not add value. Your core competency is in your application and running your business, not playing with servers. Well, that's it for the overview. Please listen to this podcast for all the nitty-gritty details. Download audio file (1:16, mp3).

    Podcast Notes

    I know what you are probably saying. You are saying: "But Todd, the podcast is over an hour long, couldn't you please have made it longer? I have nothing else to do today and I need to waste more time!" What can I say, Bert was very knowledgeable and helpful, and this is a new model for building scalable websites, so I was trying to figure out how I could physically make a website using their product. That takes a lot of questions. I am happy with the result though. I think I have a good picture of how their system works, and I think it's well worth investigating if you are in the market for creating or expanding a website. Here are some notes taken from the podcast.
  • They started 3 years ago. At that time nobody could understand what they were trying to build. They have just now been able to build the higher level features, like Smart Appliances, that they wanted to build originally. They've been concentrating on making all the plumbing work.
  • The AppLogic grid operating system takes hard infrastructure (servers, load balancers, firewalls, VPNs, all the boxes you need to make a website) and allows you to deploy it in a virtual data center.
  • A virtual data center (VDC) is like a cage you would buy from a co-location service except you operate and manage it through a browser. You can be anywhere in the world and you can use hosting services anywhere in the world.
  • An entry level package ranges from $500 to a few thousand a month. The starting point is 4 - 32 CPUs, some amount of storage, and some amount of bandwidth. You add resources as you need to. Overage charges are passed through to you from the data center provider. They don't mark it up.
  • They don't own any servers. They contract with hosting providers' data centers, like Softlayer and Layeredtech, for a uniform set of resources.
  • They offer templates for a scalable virtualized LAMP infrastructure as a starting point for building your own applications.
  • Their GUI shows you the architecture. You don't have to think of physical boxes.
  • There's a controller for the VDC through which you can provision your system.
  • You can still login to any physical or logical service. You have root access. You can install anything and manage the system, but you don't have to worry about where it physically resides.
  • To create an application: - You use the controller to provision a LAMP cluster. - Then you log into the Apache server and configure it how you wish. - Then restart it and it begins to serve. - Say you want 10 front-end web servers. - The load balancer is a virtual load balancer you program. - You use a virtual NAS. - Upload code to the NAS. - Then have all the Apache servers run off the NAS, so you don't have to log into each one and upload code.
  • Shared storage is part of the virtual data center by definition. You can create as many volumes as you wish. All are mirrored for high availability. If a virtual server goes down AppLogic will simply restart it on another available resource in the data center.
  • Partners build the grid backbone to which nodes and other resources are attached. AppLogic runs that grid backbone. When you sign up, nodes on the backbone are assigned to your VDC. A controller allows you to provision your VDC. Anything you can do in a co-lo cage you can do here, but there's nothing physical. AppLogic carries out your commands on the grid.
  • They provide standards for the hosting service. A variety of machine classifications are available. They have customers with 50TB of storage. The largest number of CPUs in a single VDC is over 450.
  • To see if the VDC meets your requirements you run a test on the VDC. Once you have resources in your VDC they are not shared with anyone else so you can be confident the performance will be as tested. It's not a VPS. Their customers run production systems. They are all running a business of some sort.
  • Pricing is designed to be attractive for startups, but not artificially low to over-subscribe.
  • Currently there's no data center API. It's scriptable from the CLI. Smart Appliances can package up data center operations into a drag and drop package. You can drag them into any application. Their first Smart Appliance is "follow me," which can move your application to a data center that is close to you. If you are in Asia you can move your data center to Asia. So your data center can follow you around. No coding is needed on your part. Just drag it into your VDC.
  • With AppLogic instead of managing a bunch of different things you manage your application. You do it once. AppLogic maintains the infrastructure for you.
  • In an upgrade of 10 Apache servers you don't upgrade standing infrastructure. You take a copy of your application and upgrade the copy.
  • Let's say you have an Apache server you want to patch. You create one prototype, which they call a class volume. Then when the application restarts all the new changes will be picked up everywhere.
  • The power of what it means to be virtual can be seen in their rollback model. You don't upgrade in-place. You upgrade a copy. Because everything is virtual it's easy to make copies of your entire data center. So you can copy your data center, keep the original running, and switch to the upgraded version. If the upgraded version doesn't work you can roll back to the original version of your VDC. This would be almost impossible using traditional methods. An application is the full state of the application with all its data, so you are operating a complete copy of the application with all of its data. You can roll back to a complete running instance of the application. You just restart the old version. (A sketch of this copy-upgrade-switch flow follows these notes.)
  • For upgrades that require transformations, like database upgrades, you can write a script to run a database transformation.
  • They don't over-automate. They don't want to force only their way of doing things on you.
  • They model an application as having two parts: the appliance and the content. For a web server this means the web server itself and the content it's serving.
  • You first create a prototype of what you want your system to look like. This becomes a class from which you can later create instances. There are templates, like the Linux appliance, to build from. Through their on-line system you configure your system, install packages, etc. When it works the way you want you can drag it into your catalog as a template for building new instances. You can create hundreds of copies if you choose.
  • Content would be served off a mount location from inside the VDC.
  • You can upgrade the catalog element and restart the appliance and it will automatically upgrade for you. It's not transactional; appliances are upgraded on an individual basis.
  • You can pin machines and make machine-specific configurations. You can put appliances into standby so you can quickly add additional resources on demand.
  • Their load balancer is Pound. It does no spam detection, but it is session aware. You can use other load balancers if you want.
  • They specialize in the code that runs the grid. They aren't specialists in load balancers and routers, etc.
  • In the VDC you can share infrastructure. You can email each other a clustered database, for example. You can save and package up an integration effort as an assembly. Save it. Sell it. Share it.
  • You can create an active-active redundancy scheme and pay for only resources you need because you can bring on resources like the front-end when you need them.
  • Many companies periodically make a local copy of their VDC and move it to their disaster recovery center. - Remember, with a VDC it's easy to pick up your whole data center and move it somewhere else. The catalog doesn't have to be copied each time; just the data for applications can be copied over. Not so bad with a fast backbone. - Disaster recovery can be triggered by a 3rd party or by scripts. - This model is sufficient for companies that can accept some downtime. - If no data loss can be tolerated you need to replicate in an active-active architecture.
  • Some companies maintain fungible data centers. They constantly copy their data center over to backup locations. If an app goes down they can fire up a replacement.
  • With AppLogic you can create a stub that can start an application on demand if it's not already running. This allows you to share resources. You can shut it down at night and save those resources for other applications.
  • Here's how they would handle a TechCrunch traffic spike: - Let's say you have an 8-node grid data center and your normal load takes 20-30% of that. - The first thing you'll do is use more resources from within the grid. - Then reconfigure appliances with more resources and restart them. - Then call your provider to add more resources. - Softlayer, for example, has a 500-1000 server inventory, so you can add servers to your grid within an hour or two. Currently this process requires human intervention.
  • Finding good ops people is difficult. With a VDC you can automate most of the work, so you don't need a big ops team.
  • In your VDC the data center configuration is in the metadata, so it's not kept as tacit knowledge. One or two people can run a thousand servers because you aren't really running servers, you are running applications.
  • Monitoring - AppLogic is in control of all resources, so you can build dashboards right off the bat. - You can plug your monitored variables into their monitoring system. - The data are available over the web. - Widgets are available for displaying live stats.
  • Different Way of Thinking about Your System - Typically you put the database on the fastest server. Instead, they recommend allocating high-end machines to everything so your database can run anywhere. That's a different way of thinking about your system. - Same with a SAN. You don't need a SAN with the storage in the VDC. You are locking yourself into certain ways of thinking that don't apply in the VDC; the concept of using a SAN is just another form of lock-in.
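
The copy-upgrade-switch-rollback flow from the notes above is easy to picture as a script. This is only a conceptual sketch against a hypothetical `vdc` client object (AppLogic's real interface is its GUI and CLI, and none of these method names come from 3tera); the point is the ordering: copy the whole application with its data, upgrade and test the copy, switch traffic, and keep the untouched original as the rollback target.

```python
def upgrade_with_rollback(vdc, app_name, new_version, healthy):
    """Copy-based upgrade: never touch the running application in place.

    `vdc` is a hypothetical client for the virtual data center;
    `healthy` is a callable that smoke-tests an application instance.
    """
    # 1. Copy the full application, data and all, while the original keeps serving.
    candidate = vdc.copy_application(app_name, name=f"{app_name}-{new_version}")

    # 2. Upgrade only the copy (e.g. swap in a new class volume, run data migrations).
    vdc.apply_upgrade(candidate, version=new_version)
    vdc.start(candidate)

    # 3. Switch traffic to the copy only if it passes the smoke test.
    if healthy(candidate):
        vdc.switch_traffic(to=candidate)
        return candidate

    # 4. Rollback is trivial: the original was never modified, just point back at it.
    vdc.stop(candidate)
    vdc.switch_traffic(to=app_name)
    return app_name
```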

    Some Observations and Conclusions

  • I think the grid/virtualization approach, in one form or another, is the wave of the future. It simply makes it easier for companies to scale applications. And as applications themselves are structured to run natively on a grid, it will become even easier.
  • Reaching the full potential of the virtual data center depends on having a more granular billing strategy and more fine-grained control over resource management. For example, say I have a 6 CPU grid and I want to upgrade. I don't want to pay for a 12 CPU data center just so I can upgrade a copy. I don't need 12 normally; I just transiently need 12. So during my upgrade I want my script to trigger allocating a copy of my VDC, do the upgrade, switch to it, and then decommission the old VDC. I want the 6 extra servers only for the time it takes to upgrade; then the old VDC should go away, and I should only be billed for the resources I am using while I am using them. This would also give a more satisfactory solution to the TechCrunch scenario.
  • You need to architect your system to take advantage of the grid. To me this means a shared nothing architecture that can be grown horizontally by adding more machines on demand. Applications should read their configuration off shared storage so the configuration doesn't need to be set up on each machine and you can bring up new machines from a template. If you need to scale, a new machine should come up and automatically start handling load. Queuing architectures, for example, have this attribute (a minimal sketch of such a worker follows these observations).
  • They need a data center API so you can treat the data center like an object. This would allow you to orchestrate various data centers around the world as a single cooperating unit.
  • Operations within a grid would benefit from standardization. I know this enters the application realm, but operations like upgrade and failover are common and hard, so it would be useful if common processes could be developed and easily deployed.
  • They need turnkey options for those new to the game. As it stands, the path from signing up for their service to deploying a web service is a little scary. They are very honest in saying they do only one part of the overall picture. But many people need a painting, not a brush and paint. It would be helpful to have out-of-the-box plans for solving the most common problems people face.

    I would like to thank Bert again for taking the time for the interview! May the grid be with you, always.
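
On the "architect for the grid" point above, the shared nothing worker pattern might look something like this minimal sketch: every instance is identical, reads the one configuration file off shared storage, and pulls work from a shared queue, so scaling is just booting more copies from the same template. The /shared mount path, config keys, and Redis-backed queue are assumptions for illustration; any shared filesystem and any queue (Spread, SQS, a database table) fits the same shape.

```python
import json
import socket
import redis  # assumed queue backend; any shared queue works the same way

# Every worker is stateless and identical: it reads the one shared config and pulls
# jobs from the shared queue, so adding capacity is just "boot another copy".
CONFIG_PATH = "/shared/config/app.json"   # hypothetical shared-storage mount

def load_config(path=CONFIG_PATH):
    with open(path) as f:
        return json.load(f)

def handle(job):
    ...  # application logic; no local state survives the job

def run_worker():
    cfg = load_config()
    q = redis.Redis(host=cfg["queue_host"], port=cfg.get("queue_port", 6379))
    print(f"{socket.gethostname()} online, draining {cfg['queue_name']}")
    while True:
        item = q.blpop(cfg["queue_name"], timeout=30)   # block until work arrives
        if item is None:
            continue                                    # idle; a scaler could retire us here
        _, payload = item
        handle(json.loads(payload))

if __name__ == "__main__":
    run_worker()
```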

    Related Sites and Articles

  • http://www.3tera.com/
  • On-Demand Infinitely Scalable Database Seed the Amazon EC2 Cloud


    Saturday
    Oct 20, 2007

    Strategy: Send XHR Request on Lost Focus Instead of For Every Character

    Robert Stewart shared this useful Ajax-related scalability strategy: "We avoided XMLHttpRequests for individual keystrokes, choosing to go back to the server only when a field lost focus. Google can afford all the servers to handle the load for that, but we didn't want to." Do you have a scalability strategy to share? Then share it!


    Thursday
    Oct 18, 2007

    Another Approach to Replication

    File replication based on erasure codes can reduce total replica size by a factor of 2 or more.


    Tuesday
    Oct 16, 2007

    How Scalable are Single Page Ajax Apps?

    I've been using GWT for an application and I get the same feeling using it that I first got using HTML. I've always sucked at building UIs. Starting with programming HP terminals, moving on to the Apple Lisa, then X Windows, and Microsoft Windows, I just never had IT, whatever IT is. On the Beauty and the Geek scale my interfaces are definitely horn-rimmed and pocket-protector friendly. HTML helped free me from all that to just build stuff that worked, but didn't have to look all that great. Expectations were pretty low and I eagerly fulfilled them. With Ajax expectations have risen again and I find myself once more easily identifiable as a styleless geek. Using GWT I have some hopes I can suck a little less. In working with GWT I was so focused on its tasty, easily digestible Ajaxy goodness, I didn't stop to think about the topic of this site: scalability. When I finally brought my distracted mind around to consider the scalability of the single page web site I was building, I became a bit concerned. Many of the strategies that are typically used to achieve scalability don't seem to apply in single page land. Here are the issues I see. Maybe you can tell me where I am off in my analysis?

  • Plus: a lot of state is maintained in the client. You don't need to keep session state on the server side. This is a win because you aren't slamming the database to reconstitute state; it's cached on the client. After more consideration it seems this is not always the case. Take your typical shopping cart scenario. You have the old problem that you can't store prices in the client, or some evil Mallory can attack your system by changing them. And my shopping cart must outlast my browser session so it's still there when I return. I would be heartbroken if my carefully crafted Amazon cart disappeared every time Firefox went away. So server side state is often still necessary. Yet a lot of state is kept on the client side, and that's a better thing.
  • Plus: a lot of business logic is on the client. The client can do a lot of the work, which saves making calls to the server. An interesting comparison of the effects of Ajax on business logic partitioning is Google Calendar: Not As Fat as Other Ajax Apps by Dietrich Kappe.
  • Minus: Can't offload searching. The lack of a proper link structure means your site can't be spidered, which means it can't be searched. One useful scalability strategy is to offload search to something like Google's Custom Search Engine, not for the ad revenue (because there's little), but because it means I don't have to devote any resources to searching. That's a huge win.
  • Minus: SEO problems suck up developer time. The common response to the previously mentioned search engine optimization (SEO) problems is to make a shadow text site or insert hidden divs. But that's a lot of pretty useless effort. I would like to spend my time elsewhere.
  • Minus: Can't load balance static content from the client. RPC is used to slurp up data from the server and these requests must go back to the originating domain. This counters one common strategy of using a CDN and/or multiple host names for serving content so you can trick your browser into starting multiple simultaneous connections to different hosts when loading page content. This speeds up your site and spreads the load across different servers. Using RPC to serve content seems to lose this advantage.
  • Minus: Ajax calls add server load. You buy into that with Ajax, but it's still a concern, especially if you have to poll frequently for updates. Dietrich found that the Ajax requests may not be that much smaller than before, so you can't depend on smaller work loads to make up for the increased number of calls. See Yahoo Mail, Ajax and Your Server.
  • Minus: Lack of monetary scalability with AdSense. Without a page to parse AdSense can't figure out which ads to display on your site. So one common monetization strategy isn't open to you.
  • Unsure: When using a caching proxy like Squid, a major scalability strategy, is my cacheable content effectively cached when using RPC? I couldn't find a resolution to this issue (a minimal sketch of a cacheable JSON/JSONP endpoint follows this list).

    One solution around many of these problems is to use a combination of REST and JSONP. This converts your client into a big mashup, even if all the parts you are mashing are your own. And this approach makes a lot of sense to me, but then I don't really see the purpose of having an RPC mechanism. There are surely issues I've missed and misunderstood, but it seems single page apps present some distinct scalability challenges. Your thoughts would be appreciated.
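
On the Squid question, the part I am fairly sure of is that a proxy can only help if your Ajax responses are plain GETs with explicit cache headers. Here is a minimal sketch of the kind of endpoint I mean, using only the Python standard library; the resource and payload are made up.

```python
import json
from wsgiref.simple_server import make_server

def app(environ, start_response):
    """A read-only resource served as JSON, or JSONP when a callback is supplied.

    Because it's a plain GET with an explicit Cache-Control header, an intermediary
    such as Squid or a CDN can serve repeat requests without hitting the app servers.
    """
    query = dict(p.split("=", 1) for p in environ.get("QUERY_STRING", "").split("&") if "=" in p)
    body = json.dumps({"products": ["widget", "gadget"]})   # stand-in payload
    callback = query.get("callback")
    if callback:                                            # JSONP: wrap in the caller's function
        body = f"{callback}({body})"
        content_type = "application/javascript"
    else:
        content_type = "application/json"
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Cache-Control", "public, max-age=300"),           # cacheable for 5 minutes
    ])
    return [body.encode("utf-8")]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```

One wrinkle: a JSONP callback parameter makes each caller's URL unique, which hurts the proxy's hit rate, so plain JSON with CORS-free same-origin requests caches better where you can use it.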


    Monday
    Oct 15, 2007

    Olympic Site Architecture

    Hello everybody, I plan to build a new website like 2008.sina.com.cn or 2008.sohu.com. The site's contents will include picture news, text news, video news, and user blogs. Now I have a question for everybody, and I hope I can get useful information here. Status: 100,000,000 people per day, 50,000 people at peak time, more than 200 servers, OpenBSD/openSUSE, Apache with FastCGI modules, lighttpd for pictures, MySQL, Varnish, LVS, and Lucene for search. Do you have any good ideas for it? Thanks, everybody!


    Sunday
    Oct 14, 2007

    Newbie in scalability design issues

    I have 3 years of experience building websites using Java. I work on only a single server, and the website is not very scalable. I always wonder how eBay, YouTube, and Monster handle their traffic, giving responses within seconds. From Google I found this site, and I hope I can also learn to build a very scalable website. I need guidelines on where to start and what things are needed. I know that scalability comes through the use of distributed applications, but I don't know how to implement it. I see many websites built in languages other than Java, so is Java a good choice for building a highly scalable website? Thanks


    Sunday
    Oct 14, 2007

    Product: The Spread Toolkit

    Complex applications coordinating work across a lot of machines often need a high performance, fault tolerant messaging layer. Though it would be a blast to write your own, it's probably a better use of your time to use an off-the-shelf solution. And that's where Spread comes in. Flickr, for example, uses Spread to create real-time event feeds from their web server logs. What exactly is Spread? From the Spread website:

    Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees. Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. The toolkit is designed to encapsulate the challenging aspects of asynchronous networks and enable the construction of reliable and scalable distributed applications. Some of the services and benefits provided by Spread:
  • Reliable and scalable messaging and group communication.
  • A very powerful but simple API simplifies the construction of distributed architectures.
  • Easy to use, deploy and maintain.
  • Highly scalable from one local area network to complex wide area networks.
  • Supports thousands of groups with different sets of members.
  • Enables message reliability in the presence of machine failures, process crashes and recoveries, and network partitions and merges.
  • Provides a range of reliability, ordering and stability guarantees for messages.
  • Emphasis on robustness and high performance.
  • Completely distributed algorithms with no central point of failure.
    In Building Scalable Web Sites, Cal Henderson describes how Flickr uses Spread to create a log of real-time events, like photos uploaded and discussions started, as they happen. Spread is connected to their web servers. As photos are uploaded, these web server events are messaged in real-time to agents consuming the feed. The advantage of this architecture is it sheds load away from the database, which would otherwise have to be continuously polled for new events by each agent.
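
To give a feel for what an agent consuming such a feed looks like, here is a rough sketch using the python-spread binding. Treat it as an assumption-laden illustration: the group name and daemon address are invented, and the connect/join/receive calls are recalled from memory, so check them against the real module before relying on any of this.

```python
import spread  # python-spread binding for the Spread toolkit (call names recalled from memory)

DAEMON = "4803@localhost"   # hypothetical local Spread daemon
GROUP = "web_events"        # hypothetical group the web servers multicast events into

def handle_event(raw):
    print("event:", raw)                      # e.g. "photo_uploaded ..." emitted by a web server

def consume_events():
    # Join the group; every event a web server multicasts to GROUP is then
    # delivered to this mailbox, so no agent ever has to poll the database.
    mbox = spread.connect(DAEMON)
    mbox.join(GROUP)
    try:
        while True:
            msg = mbox.receive()              # blocks; membership changes arrive here too
            if hasattr(msg, "message"):       # regular data message (not a membership notice)
                handle_event(msg.message)
    finally:
        mbox.leave(GROUP)
        mbox.disconnect()

if __name__ == "__main__":
    consume_events()
```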

    Related Articles

  • LAMP and the Spread Toolkit
  • The Spread Toolkit: Architecture and Performance


    Thursday
    Oct 11, 2007

    How Flickr Handles Moving You to Another Shard

    Colin Charles has a cool picture showing Flickr's message telling him they'll need about 15 minutes to move his 11,500 images to another shard. One, that's a lot of pictures! Two, it just goes to show you don't have to make this stuff complicated. Sure, it might be nice if their infrastructure could auto-balance shards with no downtime and no loss of performance, but do you really need to go to all that extra complexity? The manual system works, and though Colin would probably like his service to have been up, I am sure his day will still be a pleasant one.


    Wednesday
    Oct 10, 2007

    WAN Accelerate Your Way to Lightning Fast Transfers Between Data Centers

    How do you keep in sync a crescendo of data between data centers over a slow WAN? That's the question Alberto posted a few weeks ago. Normally I'm not into all boy bands, but I was frustrated there wasn't a really good answer for his problem. It occurred to me later a WAN accelerator might help turn his slow WAN link into more of a LAN, so the overhead of copying files across the WAN wouldn't be so limiting. Many might not consider a WAN accelerator in this situation, but since my friend Damon Ennis works at the WAN accelerator vendor Silver Peak, I thought I would ask him if their product would help. Not surprisingly his answer is yes! Potentially a lot, depending on the nature of your data. Here's a no BS overview of their product:

  • What is it? - Scalable WAN Accelerator from Silver Peak (http://www.silver-peak.com)
  • What does it do? - You can send 5x-100x more data across your expensive, low-bandwidth WAN link.
  • Why should you care? - Your data centers become more like co-located real-time peers. - You can sync a lot more media and other large files across data centers: a 50x improvement in data replication performance over a WAN. - You may be able to operate on a remote database more like a local database: a 5x-20x improvement in SQL data manipulation and unique query performance. - A 2 hour database backup would take 4 minutes: a 10x-30x improvement in transferring large data sets over SQL. A good disaster planning feature.
  • How does it work? - You buy an accelerator appliance for both sides of your link. All your WAN traffic flows through these boxes. - The appliances then use various techniques to effectively decrease latency and increase bandwidth across the link: -- Traffic reduction. Accelerators look for patterns in data across a link, caching the data on either side of the link, and then not sending the data when similar patterns are seen again. This can lead to a 90% reduction in traffic. -- Compression. Data are compressed across the link. Compression ratios from nothing up to 2-5x are seen, depending on the content type. -- TCP manipulation. The TCP/IP protocol is gamed to yield better performance. For example, a proxy on both sides is used to get a bigger window size. -- Application manipulation. Various application protocols, like CIFS, NFS, and Outlook, can be gamed to improve performance.
  • How much does it cost? - $10k to $130k per box. $10k for the 2Mbps appliance and $130k for the 500Mbps. - They are the scale leaders and are specifically good at "high-end" (> 50Mbps) replication.
  • Who uses it? - Fidelity Bank, Ernst & Young, Panasonic.
  • Is it for real? - Yes. It works and is installed and running in many data centers.
  • How do you get it? - Contact sales at http://www.silver-peak.com/Contact/contact.asp.
  • Where do you go for more information? - White paper Directory - http://www.silver-peak.com/InfoCenter/index.htm#whitepapers - Understanding WAN Acceleration Techniques - http://www.silver-peak.com/assets/download/pdf/technologydescriptions.pdf
  • Is there anything else interesting you should know? - The appliance performs encryption and compression so you don't need to perform those functions on your own CPUs. - The appliances fail-to-wire, so if a box fails traffic passes through unaccelerated. If you can't live with that you need to buy 2 boxes per end of the link (4 boxes total).
  • How much will you benefit? - The more duplication in your data the better job they can do. There's tons of duplicated data in a database feed, for example, so they can really help supercharge database performance. - Latency/time improvements depend on the link. The higher the link's latency, the less bandwidth a single flow can use. For example, a 100ms link is limited to about 5Mbps of throughput per flow due to the TCP window size (64KB/100ms ~ 5Mbps); see the quick calculation below. They can take this to several hundred Mbps per flow. - Image files are often pre-compressed. Since compression removes duplicate information, they can't be as efficient at de-duplication as in other scenarios, though they can still improve throughput. An interesting side-effect of speeding up the WAN link is that it often reveals bottlenecks in other parts of the system. A slow WAN might be hiding:
  • Underpowered servers. Servers that could process a trickle of data may be overwhelmed by a flood of data.
  • Slow applications. Apps that could pump data at slow WAN speeds may not be able to drive a faster WAN. You may need to take a look at your software architecture or storage network.
  • Underpowered server links. Accelerate a 2Mbps link to a 20Mbps link and the network infrastructure on the data center side may not be able to handle the truth.

    Obviously the cost of the solution means it's targeted more at moderate-sized companies or a service provider offering their customers a quality upsell. But if you are stuck wondering how the heck you are going to squeeze more bits between your data centers, it may be just the magic bullet you need.
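
The window-size arithmetic in the "how much will you benefit" bullet is worth seeing explicitly. A quick back-of-the-envelope calculation:

```python
def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Per-flow TCP throughput ceiling: only one window can be 'in flight' per round trip."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000

# Standard 64 KB window on a 100 ms link: ~5 Mbps per flow, no matter how fat the pipe is.
print(max_tcp_throughput_mbps(64 * 1024, 100))        # ≈ 5.2 Mbps

# An accelerator that proxies the connection and negotiates a much larger effective
# window (say 4 MB) raises that same flow's ceiling to a few hundred Mbps.
print(max_tcp_throughput_mbps(4 * 1024 * 1024, 100))  # ≈ 335 Mbps
```

That per-flow ceiling is why simply buying more bandwidth doesn't speed up a single long-haul transfer; the accelerator's window games attack the latency term instead.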
