Update 10: The Value of CDNs by Mike Axelrod of Google. Google implements a distributed content cache from within large ISPs. This allows them to serve content from the edge of the network and save bandwidth on the ISPs backbone. Update 9: Just Jump: Start using Clouds and CDNs. Bob Buffone gives a really nice and practical tutorial of how to use CloudFront as your CDN. Update 8: Akamai’s Services Become Affordable for Anyone! Blazing Web Site Performance by Distribution Cloud. Distribution Cloud starts at $150 per month for access to the best content distribution network in the world and the leader of Content Distribution Networks. Update 7: Where Amazon’s Data Centers Are Located, Expanding the Cloud: Amazon CloudFront. Why Amazon's CDN Offering Is No Threat To Akamai, Limelight or CDN Pricing. Amazon has launched their CDN with "“low latency, high data transfer speeds, and no commitments.” The perfect relationship for many. The majority of the locations are in North America, but some are in Europe and Asia. Update 6: Amazon Launching New Content Delivery Network: No Threat To Major CDNs, Yet. All the Amazon will kill all other CDNs is a bit overblown. As usual Dan Rayburn sets us straight: The offering won't support streaming, live broadcasting, or provide many of the other products and services that video content owners need...the real story here is that Amazon is going to offer a high performance method of distributing content with low latency and high data transfer rates. Update 5: When It Comes To Content Delivery Networks, What Is The "Edge"?. Dan Rayburn is on edge about the misuse of the term edge: closest location to the user does not guarantee quality, often content is not delivered from the closest location, all content is not replicated at every "edge" location. Lots of other essential information. Update 4: David Cancel runs a great test to see if you should be Using Amazon S3 as a CDN?. Conclusion: "CacheFly performed the best but only slightly better than EdgeCast. The S3 option was the worst with the Nginx/DIY option performing just over 100 ms faster." Also take look at Part 2 - Cacheability? Update 3: Mr. Rayburn takes A Detailed Look At Akamai's Application Delivery Product . They create a "bi-nodal overlay network" where users and servers are always within 5 to 10 milliseconds of each other. Your data center hosted app can't compete. The problem is that people (that is, me) can understand the data center model. I don't yet understand how applications as a CDN will work. Update 2: Dan Rayburn starts an interesting series of articles on Highlights Of My Day In Cambridge With Akamai. Akamai is moving strong into the application distribution business. That would make an interesting cloud alternative.. Update: Streamingmedia links to new CDN DF Splash that specializes in instant-on TV-quality video streaming. A question was raised on the forum asking for a CDN recommendation. As usual there are no definitive answers, but here are three useful articles that may help your deliberations.
Edge Acceleration Strategies: Akamai by Tony ChangThe edge network is the "network physically closest to the end user and the 'origin' is where the application(s) is hosted." Tony talks about how you use CDNs to manage the user experience through meeting millisecond+ level SLAs using edge acceleration services. He does this in an interesting way. He follows a request through its life cycle and shows how to turn your caterpillar into a butterfly at each stage:
Video on Content Delivery Network Pricing, Costs for Outsourced Video Delivery by Dan RayburnAlso CDN Pricing Data: Average Cost Per GB Declines In Q4 Due To Startups. It's evident Dan really knows his stuff. His articles and presentations are highly educational for anyone interested in the complex and confusing CDN world. Dan sees hundreds of real-life customer-CDN vendor contracts a year and he reports on real prices averaged over all the contracts he has seen. One of the hardest things as a consumer is knowing what a good price is for your basket of goods and Dan gives you the edge, so to speak. What I learned:
Memcached is generally treated as a black box. But what if you really need to know what's in there? Not for runtime purposes, but for optimization and capacity planning?
Read more on Evan Weaver, a software developer working for Twitter (a contributor for Rails core and Mongrel).
Update 3: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. Update 2: H-Store: A Next Generation OLTP DBMS is the project implementing the ideas in this paper: The goal of the H-Store project is to investigate how these architectural and application shifts affect the performance of OLTP databases, and to study what performance benefits would be possible with a complete redesign of OLTP systems in light of these trends. Our early results show that a simple prototype built from scratch using modern assumptions can outperform current commercial DBMS offerings by around a factor of 80 on OLTP workloads. Update: interesting related thread on Lamda the Ultimate. A really fascinating paper bolstering many of the anti-RDBMS threads the have popped up on the intertube lately. The spirit of the paper is found in the following excerpt: In summary, the current RDBMSs were architected for the business data processing market in a time of different user interfaces and different hardware characteristics. Hence, they all include the following System R architectural features: * Disk oriented storage and indexing structures * Multithreading to hide latency * Locking-based concurrency control mechanisms * Log-based recovery Of course, there have been some extensions over the years, including support for compression, shared-disk architectures, bitmap indexes, support for user-defined data types and operators, etc. However, no system has had a complete redesign since its inception. This paper argues that the time has come for a complete rewrite. Of particular interest the discussion of H-store, which seems like a nice database for the data center. H-Store runs on a grid of computers. All objects are partitioned over the nodes of the grid. Like C-Store [SAB+05], the user can specify the level of K-safety that he wishes to have. At each site in the grid, rows of tables are placed contiguously in main memory, with conventional B-tree indexing. B-tree block size is tuned to the width of an L2 cache line on the machine being used. Although conventional B-trees can be beaten by cache conscious variations [RR99, RR00], we feel that this is an optimization to be performed only if indexing code ends up being a significant performance bottleneck. Every H-Store site is single threaded, and performs incoming SQL commands to completion, without interruption. Each site is decomposed into a number of logical sites, one for each available core. Each logical site is considered an independent physical site, with its own indexes and tuple storage. Main memory on the physical site is partitioned among the logical sites. In this way, every logical site has a dedicated CPU and is single threaded. The paper goes through how databases should be written with modern CPU, memory, and network resources. It's a fun an interesting read. Well worth your time.
Does anyone know of any articles or papers that discuss the nuts and bolts of how web analytics is implemented at organizations with large volumes of web traffic and a critcal business need to analyze that data - e.g. places like Amazon.com, eBay, and Google? Just as a fun project I'm planning to build my own web log analysis app that can effectively index and query large volumes of web log data (i.e. TB range). But first I'd like to learn more about how it's done in the organizations whose lifeblood depends on this stuff. Even just a high level architectural overview of their approaches would be nice to have.
Hi, Some time ago , martin fowler bloged about how HTTP cache headers can be very effectively used in web site design. http://www.martinfowler.com/bliki/SegmentationByFreshness.html How actively HTTP cache headers are considered in web site design? I think it is a great tool to reduce lot of load on server and should be considered before designing any complex caching strategy. Thoughts? Thanks, Unmesh
I found this resources: High Scalable Architecture: - YouTube Architecture - Facebook Chat Architecture - Amazon Architecture Blogs: - Scalability Guidelines for building scalable software system (part 1) - Scalability Guidelines for building scalable software system (part 2) - Scalability Guidelines for building scalable software system (part 3) - Scalability Worst Practices - how to minimize load time for fast user experiences - Scalability principles - Challanges for Developing Enterprise Application on the Cloud - high-performance web page real-world examples netflix case study - Intro to Caching,Caching algorithms and caching frameworks part 1 - Amdahl’s low - How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale - Top 25 Most Dangerous Programming Mistakes
There were many talks recently about twitter scalability and their specific choice of language such as Scala to address their existing Ruby based scalability. In this post i tried to provide a more methodical approach for handling twitter scalability challenges that is centered around the right choice of architecture patterns rather then the language itself. The architecture pattern are given in a generic fashion that is not specific to twitter itself and can serve anyone who is looking to build a scalable real time web application in the near future.
This post I provided a summary of recent discussions outlining the main challenges that developers face today when deploying their existing JEE application to the cloud such as complexity, database integration, security, standard JEE support etc. In this post i also provided summary of how we managed to handle those challenges with our new Cloud Computing Framework by pointing to an existing production reference of a leading Telco provider.