advertise
Monday
Jul162007

Blog: MySQL Performance Blog - Everything about MySQL Performance. 

Follow this blog and you'll learn a lot about MySQL and how to make it sing.

A Quick Hit of What's Inside

Working with large data sets in MySQL, PHP Large result sets and summary tables, Implementing efficient counters with MySQL.

Click to read more ...

Monday
Jul162007

Paper: Replication Under Scalable Hashing

Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution Typical algorithms for decentralized data distribution work best in a system that is fully built before it first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.

Click to read more ...

Sunday
Jul152007

Web Analytics: An Hour a Day

Web Analytics: An Hour A Day is the first book by an in-the-trenches practitioner of web analytics. It provides a unique insider’s perspective of the challenges and opportunities that web analytics presents to each person who touches the Web in your organization. Rather than spamming you with metrics and definitions, Web Analytics: An Hour A Day will enhance your mindset and teach you how to fish for yourself. Avinash Kaushik is a expert in web analytics and author of the top-rated blog Occam’s Razor (http://www.kaushik.net/avinash). In this book, he goes beyond web analytics concepts and definitions to provide a step-by-step guide to implementing a successful web analytics strategy. His revolutionary approach to web analytics challenges prevalent thinking about the field and guides readers to a solution that will provide truly informed and actionable insights.

Click to read more ...

Sunday
Jul152007

Book: Building Scalable Web Sites

Building, scaling, and optimizing the next generation of web applications. Learn the tricks of the trade so you can build and architect applications that scale quickly--without all the high-priced headaches and service-level agreements associated with enterprise app servers and proprietary programming and database products. Culled from the experience of the Flickr.com lead developer, Building Scalable Web Sites offers techniques for creating fast sites that your visitors will find a pleasure to use. Creating popular sites requires much more than fast hardware with lots of memory and hard drive space. It requires thinking about how to grow over time, how to make the same resources accessible to audiences with different expectations, and how to have a team of developers work on a site without creating new problems for visitors and for each other. Presenting information to visitors from all over the world * Integrating email with your web applications * Planning hardware purchases and hosting options to have as much as you need without breaking your wallet * Partitioning and distributing databases to support large datasets and simultaneous transactions * Monitoring your applications to find and clear bottlenecks * Providing services APIs and using services from other providers to increase your site's reach and capabilities Whether you're starting a small web site with hopes of growing big or you already have a large system that needs maintenance, you'll find Building

Click to read more ...

Sunday
Jul152007

Blog: Occam’s Razor by Avinash Kaushik

Author of Web Analytics An Hour of Day. Has a fresh and practical take on unlocking the power of web research and web analytics to create truly data driven organizations for gaining a strategic competitive advantage.

A Quick Hit of What's Inside

Find You Web Analytics Soul Mate (How To Run An Effective Tool Pilot), AK’s Web Analytics Tool Evaluation “Tips From A Tough Life”, Web Analytics Data Sampling 411, Six Data Visualizations That Rock!, Why “looking beyond the click” to optimize the experience is so necessary.

Click to read more ...

Sunday
Jul152007

Isilon Clustred Storage System

The Isilon IQ family of clustered storage systems was designed from the ground up to meet the needs of data-intensive enterprises and high-performance computing environments. By combining Isilon's OneFS® operating system software with the latest advances in industry-standard hardware, Isilon delivers modular, pay-as-you-grow, enterprise-class clustered storage systems. OneFS, with TrueScale™ technology, powers the industry's first and only storage system that enables linear or independent scaling of performance and capacity. This new flexible and tunable system, featuring a robust suite of clustered storage software applications, provides customers with an "out of the box" solution that is fully optimized for the widest range of applications and workflow needs. * Scales from 4 TB ti 1 PB * Throughput of up to 10 GB per seond * Linear scaling * Easy to manage

Related Articles

  • Inside Skinny On Isilon by StorageMojo

    Click to read more ...

  • Sunday
    Jul152007

    Lustre cluster file system

    Lustre® is a scalable, secure, robust, highly-available cluster file system. It is designed, developed and maintained by Cluster File Systems, Inc. The central goal is the development of a next-generation cluster file system which can serve clusters with 10,000's of nodes, provide petabytes of storage, and move 100's of GB/sec with state-of-the-art security and management infrastructure. Lustre runs on many of the largest Linux clusters in the world, and is included by CFS's partners as a core component of their cluster offering (examples include HP StorageWorks SFS, and the Cray XT3 and XD1 supercomputers). Today's users have also demonstrated that Lustre scales down as well as it scales up, and runs in production on clusters as small as 4 and as large as 25,000 nodes. The latest version of Lustre is always available from Cluster File Systems, Inc. Public Open Source releases of Lustre are available under the GNU General Public License. These releases are found here, and are used in production supercomputing environments worldwide.

    Other Links

    * http://www.clusterfs.com/

    Click to read more ...

    Sunday
    Jul152007

    Coyote Point Load Balancing Systems

    Appliances that: * Ensures Non-Stop application availability * Improves network and server maintainability * Delivers Enterprise-grade gigabit content switching * Offers true Application Acceleration * Provides maximum throughput at minimal cost

    Click to read more ...

    Thursday
    Jul122007

    FeedBurner Architecture

    FeedBurner is a news feed management provider launched in 2004. FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. Services provided to publishers include traffic analysis and an optional advertising system. Site: http://www.feedburner.com

    Information Sources

  • FeedBurner - Scalable Web Applications using MySQL and Java
  • What the Web’s most popular sites are running on

    Platform

  • Java
  • MySQL
  • Hibernate
  • Spring
  • Tomcat
  • Cacti
  • Load balancing: NetScaler Application Switches
  • Routers, switches: HP, Cisco
  • DNS: bind

    The Stats

  • FeedBurner is growing faster than MySpace and Digg with 385% traffic growth. Total feeds: 808,707, Number of publishers: 471,686.
  • 11 million subscribers in 190 countries
  • Scaling History - July 2004: 300Kbps, 5,600 feeds, 3 app servers, 3 web servers 2 DB servers, Round Robin DNS - April 2005: 5Mbps, 47,700 feeds, 6 app servers, 6 web servers (same machines) - September 2005: 20Mbps, 109,200 feeds - Currently: 250 Mbps bandwidth usage, 310 million feed views per day, 100 Million hits per day

    The Architecture

  • Scalability Problem 1: Plain old reliability - Single-server failure, seen by 1/3 of all users - Health Check all the way back to the database that is monitored by load balancers to route requests in to live machines on failure. - Use Cacti and Nagios for monitoring. Using these tools you can look at uptime and performance to identify performance problems.
  • Scalability Problem 2: Stats recording/mgmt - Every hit is recorded which slows everything down because of table level locks. - Used Doug Lea’s concurrency library to do updates in multiple threads. - Only stats for today are calculated in real-time. Other stats are calculate lazily.
  • Scalability Problem 3: Primary DB overload - Use master DB for everything. - Balance read and read/write load - Found where we could break up read vs. read/write - Balanced master vs. slave load
  • Scalability Problem 4: Total DB overload - Everything slowed down, was using the database has cache, used MyISAM - Add caching layers. RAM on the machines, memcached, and in the database
  • Scalability Problem 5: Lazy initialization - When stats get rolled up on demand popular feeds slowed down the whol system - Turned to batch processing, doing the rollups once a night.
  • Scalability Problem 6: Stats writes, again - Wrote to the master too much. More data with each feed. Added more stats tracking for ads, items, and circulation. - Use merge tables. Truncate the data from 2 days ago. - Went to horizontal partitioning: ad serving, flare serving, circulation. - Move hottest tables/queries to own clusters.
  • Scalability Problem 7: Master DB Failure - Using a primary and slave there's a single point of failure because it's hard to promote a slave to a master. Went to a multi master solution.
  • Scalability Problem 8: Power Failure - Needed a disaster recovery/secondary site. - Active/active not possible. Too much hardware, didn't like having half the hardware going to waste, and needed a really fast connection between data centers. - Create custom solution to download feeds to remote servers.
  • They have two sites in primary and secondary roles (active-passive) as their geographical redundancy plan. They plan on moving to active-active model in the future.

    Lessons Learned

  • Know your DB workload, Cacti really helps with this.
  • ‘EXPLAIN’ all of your queries. Helps keep crushing queries out of the system.
  • Cache everything that you can.
  • Profile your code, usually only needed on hard-to-find leaks.
  • The greatest challenge was finding the most efficient ways to locate hotspots and bottlenecks in the application. With a loose methodology for locating problems, the analysis became very easy. Detailed monitoring was crucial in this, keeping track of disk, CPU and memory usage, slow database queries, handler details in MySQL, etc.

    Click to read more ...

  • Thursday
    Jul122007

    Should I use LAMP or Windows?

    Hi, I stumbled on your site and I am thinking about starting a website. I haven't received a good answer about what I should use to build it, so I thought I would give it a shot. I am a windows guy. I know .Net and ASP and how to build web sites using that stack. But I notice most sites use LAMP and that's what most people talk about using. What's wrong with using Windows? .Net Programmer

    Click to read more ...