Distributed systems are not typically a place domain driven design is applied. Distributed processing projects often start with an overall architecture vision and the idea about a processing model which basically drives the whole thing, including object design if it exists at all. Elaborate object designs are thought of as something that just gets in the way of distribution and performance, so the idea of spending time to apply DDD principles gets rejected in favour of raw throughput and processing power. However, from my experience, some more advanced DDD concepts can significantly improve the performance, scalability and throughput of distributed systems when applied correctly.
This article a summary of the presentation titled "DDD in a distributed world" from the DDD Exchange 09 in London.
A software-based distributed caching system such as memcached is an important piece of today's largest Internet sites that support millions of concurrent users and deliver user-friendly response times. The distributed nature of memcached design transforms 1000s of servers into one large caching pool with gigabytes of memory per node. This blog entry explores single-instance memcached scalability for a few usage patterns.
Table below shows out-of-the-box (no custom OS rewrites or networking tuning required) performance with 10G networking hardware and one single-socket UltraSPARC T2-based server with 8 cores and 8 threads per core (64 threads on a chip)...
Object Size / Ops/Sec / Bandwidth
100 bytes / 530,000 / 1.2 Gb/s
2048 bytes / 370,000 / 6.9 Gb/s
4096 bytes / 255,000 / 9.2 Gb/s
Check out the link for more details!
Today's Wolfram|Alpha is the first step in an ambitious, long-term project to make all systematic knowledge immediately computable by anyone. You enter your question or calculation, and Wolfram|Alpha uses its built-in algorithms and growing collection of data to compute the answer.
When Wolfram|Alpha launches later today, it will be one of the most computationally intensive websites on the internet. The Wolfram|Alpha computational knowledge engine is an "answer engine" that is able to produce answers to various questions such as
Wolfram|Alpha excels at different areas like mathematics, statistics, physics, engineering, astronomy, chemistry, life sciences, geology, business and finance as demonstrated by Steven Wolfram in his Introduction screencast.
There is no way to know exactly how much traffic to expect, especially during the initial period immediately following the launch, but the Wolfram|Alpha team is working hard to put reasonable capacity in place.
As Stephen writes in the Wolfram|Alpha blog Alpha will run in 5 distributed colocation facilities. What computing power have they gathered in these facilities for launch day? Two supercomputers, just about 10,000 processor cores, hundreds of terabytes of disks, a heck of a lot of bandwidth, and what seems like enough air conditioning for the Sahara to host a ski resort.
One of their launch partners, R Systems, created the world’s 44th largest supercomputer (per the June 2008 TOP500 list - it is listed as 66th per the latest Top500 list). They call it the R Smarr. It will be running Wolfram|Alpha on launch day! R Smarr has a Sum Rmax of 39580 GFlops using Dell DCS CS23-SH, QC HT 2.8 GHz computers, 4608 cores, 65536 GB of RAM and Infiniband interconnect.
Dell is another of the launch partners with a data center full of quad-board, dual-processor, quad-core Harpertown servers. What does it all add up to? The ability to handle 175 million queries (yielding maybe a billion) per day—over 5 billion queries (encompassing around 30 billion calculations) per month.
An interesting post on DataCenterKnowledge!
GemFire Enterprise is in-memory distributed data management platform that pools memory (and CPU, network and optionally local disk) across multiple processes to manage application objects and behavior. With the 6.0 release, GemFire has reached a stage of maturity in its evolution. GemStone touts this version as the true 'best of breed' distributed caching technology, solving scalability issues in all industries.
I found this resources:
High Scalable Architecture:
- YouTube Architecture
- Facebook Chat Architecture
- Amazon Architecture
Blogs:
- Scalability Guidelines for building scalable software system (part 1)
- Scalability Guidelines for building scalable software system (part 2)
- Scalability Guidelines for building scalable software system (part 3)
- Scalability Worst Practices
- how to minimize load time for fast user experiences
- Scalability principles
- Challanges for Developing Enterprise Application on the Cloud
- high-performance web page real-world examples netflix case study
- Intro to Caching,Caching algorithms and caching frameworks part 1
- Amdahl’s low
- How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
- Top 25 Most Dangerous Programming Mistakes
This post I provided a summary of recent discussions outlining the main challenges that developers face today when deploying their existing JEE application to the cloud such as complexity, database integration, security, standard JEE support etc. In this post i also provided summary of how we managed to handle those challenges with our new Cloud Computing Framework by pointing to an existing production reference of a leading Telco provider.
For those interested in building scalable systems, today I will speak about the Facebook Char architecture. Starting keynote:
Eugene Lutuchy, lead engineer on Facebook Chat
Hibernate and iBATIS and other similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem being that if you wanted to retrieve a set of widgets from a table, one query would be used to to retrieve all the ids of the matching widgets (select widget_id from widget where ...) and then for each id, another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it requires 101 queries to get the details of them all.
I can see why this is bad, but what if you're doing entity caching? i.e. If you run the first query to get your list of ids, and then for each widget you retrive it from the cache. Surely in that case, N+1(+caching) is good? Assuming of course that there is a high probability of all of the matching entities being in the cache.
I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data that are in use these days.
Recent comments
14 hours 35 min ago
21 hours 2 min ago
21 hours 13 min ago
1 day 7 hours ago
1 day 7 hours ago
1 day 10 hours ago
2 days 19 hours ago
2 days 19 hours ago
2 days 21 hours ago
2 days 23 hours ago