Sunday
Jul 15, 2007

Isilon Clustered Storage System

The Isilon IQ family of clustered storage systems was designed from the ground up to meet the needs of data-intensive enterprises and high-performance computing environments. By combining Isilon's OneFS® operating system software with the latest advances in industry-standard hardware, Isilon delivers modular, pay-as-you-grow, enterprise-class clustered storage systems. OneFS, with TrueScale™ technology, powers the industry's first and only storage system that enables linear or independent scaling of performance and capacity. This new, flexible, and tunable system, featuring a robust suite of clustered storage software applications, provides customers with an "out of the box" solution that is fully optimized for the widest range of applications and workflow needs.

  • Scales from 4 TB to 1 PB
  • Throughput of up to 10 GB per second
  • Linear scaling
  • Easy to manage

Related Articles

  • Inside Skinny On Isilon by StorageMojo


    Sunday
    Jul 15, 2007

    Lustre cluster file system

    Lustre® is a scalable, secure, robust, highly available cluster file system designed, developed, and maintained by Cluster File Systems, Inc. The central goal is the development of a next-generation cluster file system that can serve clusters with tens of thousands of nodes, provide petabytes of storage, and move hundreds of GB/sec with a state-of-the-art security and management infrastructure. Lustre runs on many of the largest Linux clusters in the world and is included by CFS's partners as a core component of their cluster offerings (examples include HP StorageWorks SFS and the Cray XT3 and XD1 supercomputers). Users have also demonstrated that Lustre scales down as well as it scales up, running in production on clusters as small as 4 and as large as 25,000 nodes. The latest version of Lustre is always available from Cluster File Systems, Inc. Public open source releases of Lustre are available under the GNU General Public License; these releases are found here and are used in production supercomputing environments worldwide.

    Other Links

    • http://www.clusterfs.com/


    Sunday
    Jul 15, 2007

    Coyote Point Load Balancing Systems

    Appliances that:

    • Ensure non-stop application availability
    • Improve network and server maintainability
    • Deliver enterprise-grade gigabit content switching
    • Offer true application acceleration
    • Provide maximum throughput at minimal cost


    Thursday
    Jul 12, 2007

    FeedBurner Architecture

    FeedBurner is a news feed management provider launched in 2004. FeedBurner provides custom RSS feeds and management tools to bloggers, podcasters, and other web-based content publishers. Services provided to publishers include traffic analysis and an optional advertising system. Site: http://www.feedburner.com

    Information Sources

  • FeedBurner - Scalable Web Applications using MySQL and Java
  • What the Web’s most popular sites are running on

    Platform

  • Java
  • MySQL
  • Hibernate
  • Spring
  • Tomcat
  • Cacti
  • Load balancing: NetScaler Application Switches
  • Routers, switches: HP, Cisco
  • DNS: bind

    The Stats

  • FeedBurner is growing faster than MySpace and Digg, with 385% traffic growth. Total feeds: 808,707; number of publishers: 471,686.
  • 11 million subscribers in 190 countries
  • Scaling history:
    - July 2004: 300 Kbps, 5,600 feeds, 3 app servers, 3 web servers, 2 DB servers, round-robin DNS
    - April 2005: 5 Mbps, 47,700 feeds, 6 app servers, 6 web servers (same machines)
    - September 2005: 20 Mbps, 109,200 feeds
    - Currently: 250 Mbps bandwidth usage, 310 million feed views per day, 100 million hits per day

    The Architecture

  • Scalability Problem 1: Plain old reliability. A single-server failure was seen by 1/3 of all users. The load balancers run a health check that reaches all the way back to the database, so requests are routed to live machines on failure. Cacti and Nagios are used for monitoring; with these tools you can track uptime and performance to identify problems.
  • Scalability Problem 2: Stats recording/management. Every hit was recorded, which slowed everything down because of table-level locks. They used Doug Lea's concurrency library to do updates in multiple threads. Only stats for today are calculated in real time; other stats are calculated lazily (see the sketch after this list).
  • Scalability Problem 3: Primary DB overload. The master DB was used for everything. They found where reads could be split from read/writes and balanced the load between master and slaves.
  • Scalability Problem 4: Total DB overload. Everything slowed down; the database was being used as a cache, on MyISAM. They added caching layers: RAM on the machines, memcached, and in the database.
  • Scalability Problem 5: Lazy initialization. When stats were rolled up on demand, popular feeds slowed down the whole system. They turned to batch processing, doing the rollups once a night.
  • Scalability Problem 6: Stats writes, again. They wrote to the master too much: more data with each feed, plus added stats tracking for ads, items, and circulation. They used merge tables, truncating the data from two days ago, then went to horizontal partitioning (ad serving, flare serving, circulation) and moved the hottest tables/queries to their own clusters.
  • Scalability Problem 7: Master DB failure. With a single master and slaves there's a single point of failure, because it's hard to promote a slave to master. They went to a multi-master solution.
  • Scalability Problem 8: Power failure. They needed a disaster recovery/secondary site. Active/active wasn't possible: it required too much hardware, they didn't like having half the hardware going to waste, and it needed a really fast connection between data centers. They created a custom solution to download feeds to remote servers.
  • They have two sites in primary and secondary roles (active-passive) as their geographical redundancy plan. They plan on moving to an active-active model in the future.
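
    The fix in Problem 2 maps naturally onto java.util.concurrent, which grew out of Doug Lea's library. Below is a minimal sketch of the batching idea, assuming a hypothetical FeedStats class and a per-feed hit counter table; it illustrates the technique, not FeedBurner's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: accumulate per-feed hit counts in memory and flush
// them to the database in one pass, instead of one row-locking UPDATE per hit.
public class FeedStats {
    private final ConcurrentHashMap<Long, LongAdder> hits = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher =
        Executors.newSingleThreadScheduledExecutor();

    public FeedStats() {
        // Flush accumulated counts once a minute.
        flusher.scheduleAtFixedRate(this::flush, 60, 60, TimeUnit.SECONDS);
    }

    // Called on every feed hit; lock-free under contention.
    public void recordHit(long feedId) {
        hits.computeIfAbsent(feedId, id -> new LongAdder()).increment();
    }

    private void flush() {
        for (Map.Entry<Long, LongAdder> e : hits.entrySet()) {
            long count = e.getValue().sumThenReset();
            if (count > 0) {
                // One batched write per feed per interval, e.g.
                // UPDATE feed_stats SET hits = hits + ? WHERE feed_id = ?
                writeToDatabase(e.getKey(), count);
            }
        }
    }

    private void writeToDatabase(long feedId, long count) {
        // A JDBC batch update would go here.
    }
}
```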

    Lessons Learned

  • Know your DB workload; Cacti really helps with this.
  • ‘EXPLAIN’ all of your queries. Helps keep crushing queries out of the system (a JDBC example follows this list).
  • Cache everything that you can.
  • Profile your code; it's usually only needed for hard-to-find leaks.
  • The greatest challenge was finding the most efficient ways to locate hotspots and bottlenecks in the application. With a loose methodology for locating problems, the analysis became very easy. Detailed monitoring was crucial in this, keeping track of disk, CPU and memory usage, slow database queries, handler details in MySQL, etc.
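
    The EXPLAIN advice is easy to script from the same Java/MySQL stack. A minimal sketch using plain JDBC; the connection URL, credentials, and the feed_stats table are assumptions for illustration, not FeedBurner's schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

// Run EXPLAIN on a suspect query and print MySQL's execution plan.
public class ExplainCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/feeds", "user", "pass");  // hypothetical DB
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "EXPLAIN SELECT * FROM feed_stats WHERE feed_id = 42")) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                // Watch for type=ALL (full table scan) and empty key columns.
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    System.out.print(md.getColumnName(i) + "=" + rs.getString(i) + "  ");
                }
                System.out.println();
            }
        }
    }
}
```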


    Thursday
    Jul 12, 2007

    Should I use LAMP or Windows?

    Hi, I stumbled on your site and I am thinking about starting a website. I haven't received a good answer about what I should use to build it, so I thought I would give it a shot here. I am a Windows guy. I know .Net and ASP and how to build web sites using that stack. But I notice most sites use LAMP and that's what most people talk about using. What's wrong with using Windows? .Net Programmer


    Wednesday
    Jul 11, 2007

    Friendster Architecture

    Friendster is one of the largest social network sites on the web. It emphasizes genuine friendships and the discovery of new people through friends. Site: http://www.friendster.com/

    Information Sources

  • Friendster - Scaling for 1 Billion Queries per day

    Platform

  • MySQL
  • Perl
  • PHP
  • Linux
  • Apache

    What's Inside?

  • Dual x86-64 AMD Opterons with 8 GB of RAM
  • Faster disk (SAN)
  • Optimized indexes
  • Traditional 3-tier architecture with hardware load balancer in front of the databases
  • Clusters based on types: ad, app, photo, monitoring, DNS, gallery search DB, profile DB, user info DB, IM status cache, message DB, testimonial DB, friend DB, graph servers, gallery search, object cache.

    Lessons Learned

  • No persistent database connections.
  • Removed all sorts.
  • Optimized indexes
  • Don’t go after the biggest problems first
  • Optimize without downtime
  • Split load
  • Moved sorting query types into the application and added LIMITs (see the sketch after this list).
  • Reduced ranges
  • Range on primary key
  • Benchmark -> Make Change -> Benchmark -> Make Change (Cycle of Improvement)
  • Stabilize: always have a plan to rollback
  • Work with a team
  • Assess: Define the issues
  • A key design goal for the new system was to move away from maintaining session state toward a stateless architecture that would clean up after each request
  • Rather than buy big, centralized boxes, [our philosophy] was about buying a lot of thin, cheap boxes. If one fails, you roll over to another box.
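
    The sorting lesson above can be made concrete. A hedged sketch of the pattern, with an invented friends table and Friend record: the query bounds the rows it pulls with a LIMIT, and the application server does the ordering instead of the database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: bounded, unsorted fetch from MySQL, sort in the app.
public class FriendQueries {
    record Friend(long id, String name, Timestamp since) {}

    static List<Friend> recentFriends(Connection conn, long userId) throws SQLException {
        List<Friend> friends = new ArrayList<>();
        // No ORDER BY: the LIMIT bounds the scan; ordering happens in the app.
        String sql = "SELECT friend_id, name, created_at FROM friends "
                   + "WHERE user_id = ? LIMIT 100";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    friends.add(new Friend(
                        rs.getLong(1), rs.getString(2), rs.getTimestamp(3)));
                }
            }
        }
        // The sort the database used to do now runs on the app server.
        friends.sort(Comparator.comparing(Friend::since).reversed());
        return friends;
    }
}
```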


    Tuesday
    Jul 10, 2007

    mixi.jp Architecture

    Mixi is a fast-growing social networking site in Japan. They provide services like diary, community, message, review, and photo album. Having a lot in common with LiveJournal, they developed many of the same approaches. Their write-up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp

    Information Sources

  • mixi.jp - scaling out with open source

    Platform

  • Linux
  • Apache
  • MySQL
  • Perl
  • Memcached
  • Squid
  • Shard

    What's Inside?

  • They grew to approximately 4 million users in two years and add over 15,000 new users/day.
  • Ranks 35th on Alexa and 3rd in Japan.
  • More than 100 MySQL servers
  • Add more than 10 servers/month
  • Use non-persistent connections.
  • Diary traffic is 85% read and 15% write.
  • Message traffic is 75% read and 25% write.
  • Ran into replication performance problems so they had to split the database.
  • Considered splitting vertically by table type or splitting horizontally by user.
  • They ended up partitioning by table type and user, so all the messages for a group of users are assigned to a particular database. A partitioning key decides in which database data should be stored (see the routing sketch after this list).
  • For caching they use memcached with 39 machines x 2 GB memory.
  • Stores more than 8 TB of images with about 23 GB added per day.
  • MySQL is only used to store metadata about the images, not the images themselves.
  • Images are either frequently accessed or rarely accessed.
  • Frequently accessed images are cached using Squid on multiple machines.
  • Rarely accessed images are served from the file system. There's no profit in caching them.
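
    A hedged sketch of the routing decision described above, assuming a simple modulo scheme over a list of shard URLs; mixi's actual key-to-database algorithm isn't spelled out.

```java
import java.util.List;

// Hypothetical sketch: all rows of one table type for a user group live on
// one database, chosen by a partitioning key derived from the user id.
public class ShardRouter {
    private final List<String> messageShards;  // JDBC URLs, one per partition

    public ShardRouter(List<String> messageShards) {
        this.messageShards = messageShards;
    }

    // The partitioning key decides which database holds this user's messages.
    public String shardForMessages(long userId) {
        int index = (int) (userId % messageShards.size());
        return messageShards.get(index);
    }
}
```

    Note that plain modulo makes adding hosts painful, which is exactly the repartitioning problem called out in the lessons below.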

    Lessons Learned

  • When using dynamic partitioning it's difficult to pick keys and algorithms for where data should be stored.
  • Once you partition data you can no longer do joins; you have to open connections to multiple databases and merge the data back together in the application (see the scatter-gather sketch after this list).
  • It's hard to add new hosts and rearrange data when you partition. For example, let's say your partitioning algorithm stores all the messages for users 1-N on host 1. Now let's say host 1 becomes overburdened and you want to repartition users across more hosts. This is very difficult to do.
  • By using distributed memory caching they rarely hit the DB, and their average page load time is about 0.02 seconds. This reduces the problems associated with partitioning.
  • You will often have to develop strategies based on the type of content. For example, images will be treated differently than short text posts.
  • Social networking sites are very time oriented, so it might be useful to partition data by time as well as user and type.
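
    In practice the no-joins lesson means scatter-gather at the application layer. A minimal, hypothetical sketch; the Message type and the fetchFromShard stub are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: query every partition in parallel, then do the
// merge/sort the database can no longer do in the application.
public class ScatterGather {
    record Message(long userId, String body, long timestamp) {}

    static List<Message> latestMessages(List<String> shardUrls, ExecutorService pool)
            throws Exception {
        List<Future<List<Message>>> futures = new ArrayList<>();
        for (String url : shardUrls) {
            futures.add(pool.submit(() -> fetchFromShard(url)));  // one query per shard
        }
        List<Message> merged = new ArrayList<>();
        for (Future<List<Message>> f : futures) {
            merged.addAll(f.get());
        }
        merged.sort(Comparator.comparingLong(Message::timestamp).reversed());
        return merged;
    }

    static List<Message> fetchFromShard(String jdbcUrl) {
        return List.of();  // a real version would run a JDBC query here
    }
}
```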


    Tuesday
    Jul 10, 2007

    Webcast: Advanced Database High Availability and Scalability Solutions

    If MySQL, PostgreSQL, or EnterpriseDB high-availability and scalability issues are on your plate, you'll find this webcast very informative. Highly recommended! The webcast starts on Thursday, July 12, 2007 at 10:00 AM PDT (1:00 PM EDT, 18:00 GMT). Duration: 50 minutes, plus Q&A.

    Program Agenda:

    • Disk-based replication: overview and major features; benefits and use cases; limitations and challenges
    • Master/slave asynchronous replication: overview and major features; benefits and use cases; limitations and challenges
    • Synchronous multi-master cluster (Continuent uni/cluster): overview and major features; benefits and use cases; limitations and challenges
    • Product positioning, the HA continuum: comparisons; key differentiators; how to pick the right solution
    • Continuent Professional Services: HA Quick Assessment Service; HA JumpStart Implementation Services
    • Q&A

    Presented by:

    • Robert Hodges, CTO - Continuent
    • Robert Noyes, Director of Sales, Americas - Continuent

    Click Here to Register! Continuent, the High Availability and Scalability Experts! If you are concerned about application availability, read scalability, write scalability, a zero data loss requirement, disaster recovery, or geographically distributed operations, you'll want to talk to us!


    Monday
    Jul 9, 2007

    LiveJournal Architecture

    A fascinating and detailed story of how LiveJournal evolved their system to scale. LiveJournal was an early player in the free blog service race and faced issues from quickly adding a large number of users. Blog posts come fast and furious, which causes a lot of writes, and writes are particularly hard to scale. Understanding how LiveJournal faced their scaling problems will help any aspiring website builder. Site: http://www.livejournal.com/

    Information Sources

  • LiveJournal - Behind The Scenes Scaling Storytime
  • Google Video
  • Tokyo Video
  • 2005 version

    Platform

  • Linux
  • MySQL
  • Perl
  • Memcached
  • MogileFS
  • Apache

    What's Inside?

  • Scaling from 1, 2, and 4 hosts to clusters of servers.
  • Avoid single points of failure.
  • Using MySQL replication only takes you so far.
  • Becoming IO bound kills scaling.
  • Spread out writes and reads for more parallelism.
  • You can't keep adding read slaves and scale.
  • Shard storage approach, using DRBD, for maximal throughput. Allocate shards based on roles.
  • Caching with memcached to improve performance. Two-level hashing to distribute RAM across the cluster (see the sketch after this list).
  • Perlbal for web load balancing.
  • MogileFS, a distributed file system, for parallelism.
  • TheSchwartz and Gearman for distributed job queuing to do more work in parallel.
  • Solving persistent connection problems.
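
    The two-level hashing item above can be sketched simply: keys hash to a fixed set of virtual buckets, and a separate bucket-to-server map assigns buckets to memcached hosts, so capacity changes move buckets instead of rehashing every key. A hypothetical illustration, not LiveJournal's actual code.

```java
import java.util.List;

// Hypothetical sketch of two-level hashing over a memcached pool.
public class TwoLevelHash {
    private static final int BUCKETS = 1024;  // fixed virtual bucket count
    private final int[] bucketToServer;       // level 2: bucket -> server index
    private final List<String> servers;       // memcached host:port strings

    public TwoLevelHash(List<String> servers) {
        this.servers = servers;
        this.bucketToServer = new int[BUCKETS];
        for (int b = 0; b < BUCKETS; b++) {
            bucketToServer[b] = b % servers.size();  // even initial assignment
        }
    }

    // Level 1: hash the cache key into a bucket, then look up its server.
    public String serverFor(String key) {
        int bucket = Math.floorMod(key.hashCode(), BUCKETS);
        return servers.get(bucketToServer[bucket]);
    }

    // Rebalance by reassigning buckets, not by rehashing every key.
    public void moveBucket(int bucket, int serverIndex) {
        bucketToServer[bucket] = serverIndex;
    }
}
```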

    Lessons Learned

  • Don't be afraid to write your own software to solve your own problems. LiveJournal has provided incredible value to the community through their efforts.
  • Sites can evolve from small 1, 2 machine setups to larger systems as they learn about their users and what their system really needs to do.
  • Parallelization is key to scaling. Remove choke points by caching, load balancing, sharding, clustering file systems, and making use of more disk spindles.
  • Replication has a cost. You can't just keep adding more and more read slaves and expect to scale.
  • Low-level issues like which OS event notification mechanism to use, file system and disk interactions, threading and event models, and connection types matter at scale.
  • Large sites eventually turn to a distributed queuing and scheduling mechanism to distribute large work loads across a grid.


    Sunday
    Jul 8, 2007

    Welcome to High Scalability

    We started High Scalability to help you build successful scalable websites. This site tries to bring together all the lore, art, science, practice, and experience of building scalable websites into one place so you can learn how to build your system with confidence. Hopefully this site will move you further and faster along the learning curve of success. Please Start Here.
