The following technical Webinar could be of interest to the community.
WHO:
WHAT:
WHEN:
Check out the details here!
The recent Data-Intensive Computing Symposium brought together experts in system design, programming, parallel algorithms, data management, scientific applications, and information-based applications to better understand existing capabilities in the development and application of large-scale computing systems, and to explore future opportunities.
Google Fellow Jeff Dean had a very interesting presentation on Handling Large Datasets at Google: Current Systems and Future Directions. He discussed:
• Hardware infrastructure
• Distributed systems infrastructure:
–Scheduling system
–GFS
–BigTable
–MapReduce
• Challenges and Future Directions
–Infrastructure that spans all datacenters
–More automation
It is really like a "How does Google work" presentation in ~60 slides?
Scalr is a fully redundant, self-curing and self-scaling hosting environment utilizing Amazon's EC2. It has been recently open sourced on Google Code.
Scalr allows you to create server farms through a web-based interface using prebuilt AMI's for load balancers (pound or nginx), app servers (apache, others), databases (mysql master-slave, others), and a generic AMI to build on top of.
Scalr promises automatic high-availability and scaling for developers by health and load monitoring.
The health of the farm is continuously monitored and maintained. When the Load Average on a type of node goes above a configurable threshold a new node is inserted into the farm to spread the load and the cluster is reconfigured. When a node crashes a new machine of that type is inserted into the farm to replace it.
The Exceptional Performance group at Yahoo! has identified 14 best practices for making web pages faster. These best practices have proven to reduce response times of Yahoo! properties by 25-50%. They focus on the front-end, for example, why it's bad to use "@import" for including stylesheets and why ETags disable browser caching.
This google tech talk features these best practices and demonstrate YSlow.
Relevant links:
14 Rules for Exceptional Web Performance: http://developer.yahoo.com/performance/rules.html
YSlow: http://developer.yahoo.com/yslow/
Check out the book for details: High Performance Web Sites: Essential Knowledge for Front-End Engineers
The Google Operating System blog has an interesting post on Google's scale based on an updated version of Google's paper about MapReduce.
The input data for some of the MapReduce jobs run in September 2007 was 403,152 TB (terabytes), the average number of machines allocated for a MapReduce job was 394, while the average completion time was 6 minutes and a half. The paper mentions that Google's indexing system processes more than 20 TB of raw data.
Recent comments
2 days 5 hours ago
2 days 5 hours ago
2 days 5 hours ago
2 days 5 hours ago
1 week 2 days ago
1 week 3 days ago
1 week 4 days ago
1 week 4 days ago
1 week 4 days ago
1 week 4 days ago