Google Talk is Google's instant communications service. Interestingly, the IM messages aren't the major architectural challenge; handling user presence indications dominates the design. They also have the challenge of handling small, low-latency messages and integrating with many other systems. How do they do it? Site: http://www.google.com/talk
The Architecture
* Data Center. * Storage. * Development Environment. * OS. * Web Server. * Database. * Database abstraction layer. * Load balancing. * Web Framework. * Real-time messaging. * Identity management. * Distributed job management. * Ad serving. * Standard API to website. * AJAX library. * PHP Cache. * Object and Content Cache. * Client Side Cache. * Monitoring. * Log Analysis. * Testing. * Performance Analysis. * Backup and Restore. * Fault Tolerance. * Scalability Plan. * Business Continuity Plan. * Future Directions.
Lessons Learned
As users come to depend on MySQL, they find that they have to deal with issues of reliability, scalability, and performance--issues that are not well documented but are critical to a smoothly functioning site. This book is an insider's guide to these little understood topics. Author Jeremy Zawodny has managed large numbers of MySQL servers for mission-critical work at Yahoo!, maintained years of contacts with the MySQL AB team, and presents regularly at conferences. Jeremy and Derek have spent months experimenting, interviewing major users of MySQL, talking to MySQL AB, benchmarking, and writing some of their own tools in order to produce the information in this book. In High Performance MySQL you will learn about MySQL indexing and optimization in depth so you can make better use of these key features. You will learn practical replication, backup, and load-balancing strategies with information that goes beyond available tools to discuss their effects in real-life environments. And you'll learn the supporting techniques you need to carry out these tasks, including advanced configuration, benchmarking, and investigating logs. Topics include: * A review of configuration and setup options * Storage engines and table types * Benchmarking * Indexes * Query Optimization * Application Design * Server Performance * Replication * Load-balancing * Backup and Recovery * Security
This paper is behind a registration wall (you can't do anything on the MySQL site without filling out a form of some kind), but it's a short, decent introduction to using MySQL for a good-sized website.
A Quick Hit of What's Inside
Scale-out vs. Scale Up, Customers using MySQL, Scale-Out Reference Architecture
Eventually every database system hits its limits. Especially on the Internet, where you have millions of users who can theoretically access your database simultaneously, eventually your IO system will be a bottleneck. [A] promising but more complex solution with nearly no scale-out limits is application partitioning. If and when you get into the top-1000 rank on Alexa, you have to think about such solutions.
A Quick Hit of What's Inside
Horizontal application partitioning, Vertical application partitioning, Disk IO calculations, How to partition an entity
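To make the two partitioning styles concrete, here is a minimal sketch in Python. It is not taken from the paper: the shard names, the entity-to-database map, and both function names are hypothetical, and hashing the user ID modulo the shard count is just one simple way to partition an entity.

```python
import hashlib

# Hypothetical shard names; the paper does not specify any.
SHARDS = ["db0", "db1", "db2", "db3"]

# Hypothetical mapping of entity types to dedicated databases.
ENTITY_TO_DB = {"users": "db_users", "messages": "db_messages"}


def shard_for_user(user_id, shards=SHARDS):
    """Horizontal partitioning: all rows of one user live on one shard,
    chosen by hashing the partition key (here, the user ID)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def db_for_entity(entity):
    """Vertical partitioning: each entity type gets its own database."""
    return ENTITY_TO_DB[entity]


if __name__ == "__main__":
    for uid in (1, 2, 3, 42):
        print(uid, "->", shard_for_user(uid))
    print("messages ->", db_for_entity("messages"))
```

One design caveat: with plain modulo hashing, changing the number of shards moves most keys, so real deployments often use a lookup directory or a hashing scheme that limits data movement.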
If the clustered file system, clustered storage system, and storage virtualization movement is new to you, then this is a good intro paper. It's both a vendor puff piece and informative, so it might be worth your time.
A Quick Hit of What's Inside
Clustered storage architectures have the ability to pull together two or more storage devices to behave as a single entity. Clustered storage can be broken down into three types: * 2-way simple failover clustering * Namespace aggregation * Clustered storage with a distributed file system (DFS)
Follow this blog and you'll learn a lot about MySQL and how to make it sing.
A Quick Hit of What's Inside
Working with large data sets in MySQL, PHP Large result sets and summary tables, Implementing efficient counters with MySQL.
Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution
Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that map replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.
Web Analytics: An Hour A Day is the first book by an in-the-trenches practitioner of web analytics. It provides a unique insider’s perspective of the challenges and opportunities that web analytics presents to each person who touches the Web in your organization. Rather than spamming you with metrics and definitions, Web Analytics: An Hour A Day will enhance your mindset and teach you how to fish for yourself. Avinash Kaushik is an expert in web analytics and author of the top-rated blog Occam’s Razor (http://www.kaushik.net/avinash). In this book, he goes beyond web analytics concepts and definitions to provide a step-by-step guide to implementing a successful web analytics strategy. His revolutionary approach to web analytics challenges prevalent thinking about the field and guides readers to a solution that will provide truly informed and actionable insights.