Distributed systems are not typically a place where domain-driven design is applied. Distributed processing projects often start with an overall architecture vision and an idea of a processing model that drives everything else, including the object design, if one exists at all. Elaborate object designs are seen as something that just gets in the way of distribution and performance, so the idea of spending time on DDD principles gets rejected in favour of raw throughput and processing power. However, in my experience, some of the more advanced DDD concepts can significantly improve the performance, scalability and throughput of distributed systems when applied correctly.
This article is a summary of the presentation titled "DDD in a distributed world" from the DDD Exchange 09 in London.
Performance is critical to the success of any web site, and yet today's web applications push browsers to their limits with increasing amounts of rich content and heavy use of Ajax. In his new book Even Faster Web Sites: Performance Best Practices for Web Developers, Steve Souders, web performance evangelist at Google and former Chief Performance Yahoo!, provides valuable techniques to help you optimize your site's performance.
Souders' previous book, the bestselling High Performance Web Sites, shocked the web development world by revealing that 80% of the time it takes for a web page to load is on the client side. In Even Faster Web Sites, Souders and eight expert contributors provide best practices and pragmatic advice for improving your site's performance in three critical categories:
Speed is essential for today's rich media web sites and Web 2.0 applications. With this book, you'll learn how to shave precious seconds off your sites' load times and make them respond even faster.
Steve Souders works at Google on web performance and open source initiatives. His book High Performance Web Sites explains his best practices for performance along with the research and real-world results behind them. Steve is the creator of YSlow, the performance analysis extension to Firebug. He is also co-chair of Velocity 2008, the first web performance conference sponsored by O'Reilly. He frequently speaks at such conferences as OSCON, Rich Web Experience, Web 2.0 Expo, and The Ajax Experience.
Steve previously worked at Yahoo! as the Chief Performance Yahoo!, where he blogged about web performance on Yahoo! Developer Network. He was named a Yahoo! Superstar. Steve worked on many of the platforms and products within the company, including running the development team for My Yahoo!.
(Please bear with me, I'm a new, passionate, confident and terrified programmer :D )
Background:
I'm pre-launch and 1 year into the development of my application. My target is to be able to eventually handle millions of registered users with 5-10% of them concurrent. Up to this point I've used auto-increment to assign unique identifiers to rows. I am now considering switching to a non-sequential strategy. Oh, I'm using the LAMP configuration.
My reasons for avoiding auto-increment:
1. Complicates replication when scaling horizontally. Risk of collision is significant (when running multiple masters). Note: I've read the other entries in this forum that relate to ID generation and there have been some great suggestions -- including a strategy that uses auto-increment in a way that avoids this pitfall... That said, I'm still nervous about it.
2. Potential bottleneck when retrieving/assigning IDs -- IDs assigned at the database.
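One commonly used arrangement for multi-master auto-increment, which may or may not be the one referred to above, gives every master the same step size and a distinct starting offset, so the sequences interleave and never overlap. MySQL exposes this through the `auto_increment_increment` and `auto_increment_offset` system variables. The sketch below only simulates the idea in Python; the three-master setup and the function name are assumptions made for illustration.

```python
from itertools import count, islice

def master_ids(offset, increment):
    """Yield the IDs one master would hand out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)

# Hypothetical setup: 3 masters sharing one step size, each with its own offset.
NUM_MASTERS = 3
masters = {
    n: master_ids(offset=n, increment=NUM_MASTERS)
    for n in range(1, NUM_MASTERS + 1)
}

# Take the first four IDs from each master to show that they never overlap.
first_ids = {n: list(islice(gen, 4)) for n, gen in masters.items()}
for n, ids in first_ids.items():
    print(f"master {n}: {ids}")
```

This removes collisions between masters, but IDs are still assigned at the database, so it does not address the second concern.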
My reasons for being nervous about non-sequential IDs:
1. To guarantee uniqueness, the IDs are going to be much larger -- potentially affecting performance significantly
My New Strategy:
(I haven't started to implement this... I'm waiting for someone smarter than me to steer me in the right direction)
1. Generate a guaranteed-unique ID by concatenating the user id (1-9 digits) and the UNIX timestamp (10 digits).
2. Convert the resulting 11-19 digit number to base_36. The resulting string will be alphanumeric and 6-10 characters long. This is, of course, much shorter (at least with regard to characters) than the standard GUID hash.
3. Pass the new identifier to a column in the database that is type CHAR() set to binary.
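The three steps above can be sketched roughly as follows. This is a minimal Python illustration rather than LAMP code, and the names `to_base36` and `make_id` are hypothetical. It assumes a user id of 1 to 9 digits and a 10-digit timestamp in whole seconds, as described in the steps.

```python
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n):
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def make_id(user_id, timestamp=None):
    """Concatenate user id and 10-digit UNIX timestamp, then base-36 encode."""
    if timestamp is None:
        timestamp = int(time.time())
    if not 1 <= user_id <= 999_999_999:
        raise ValueError("user_id must have 1-9 digits")
    if not 1_000_000_000 <= timestamp <= 9_999_999_999:
        raise ValueError("timestamp must have exactly 10 digits")
    combined = int(f"{user_id}{timestamp}")  # 11-19 decimal digits
    return to_base36(combined)

# Extremes of the assumed input ranges.
shortest = make_id(1, 1_000_000_000)
longest = make_id(999_999_999, 9_999_999_999)
print(len(shortest), len(longest))

# Bytes needed to hold the largest combined value as a plain binary integer.
largest_binary_bytes = (int(longest, 36).bit_length() + 7) // 8
print(largest_binary_bytes)
```

Under these assumptions the encoded strings come out at 7 to 13 characters rather than 6 to 10, while the largest combined value still fits in 8 bytes as an unsigned 64-bit integer. Note also that two IDs generated for the same user within the same second would be identical, so uniqueness depends on each user creating at most one row per second.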
My Questions:
1. Is this a valid strategy? Is my logic sound or flawed? Should I go back to being a graphic designer?
2. What is the potential hit to performance?
3. Is an 11-19 digit number (base 10) actually any larger (in terms of bytes) than its base-36 equivalent?
I appreciate your insights... and High Scalability for supplying this resource!
As traffic to cnbc.com continued to grow, we found ourselves in an all-too-familiar situation where a BIG change in how things are done was in order; the status quo was a road to nowhere. The spending on hardware, the amount of space and power required to host additional servers, less-than-stellar response times, having to resort to frequent "micro"-caching and similar tricks to try to improve code performance: all of these were surfacing in plain sight and were hard to ignore.
While the code base could clearly be improved, limited dev resources and the need to innovate to stay competitive always limit the ability to go about refactoring. So how can one address performance and other needs without a full-blown effort across the entire team? For us, the answer was aiCache, a Web caching and application acceleration product (aicache.com).
This read will provide you with information about how Netflix deals with high load on their movie rental website.
It was written by Bill Scott in the fall of 2008.
Hibernate, iBATIS and other similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem is that if you want to retrieve a set of widgets from a table, one query is used to retrieve all the ids of the matching widgets (select widget_id from widget where ...) and then, for each id, another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it requires 101 queries to get the details of them all.
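To make the 101-versus-1 count concrete, here is a small sketch using Python's built-in sqlite3 module in place of Hibernate or iBATIS. The table layout, the `colour` column and the helper names are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widget (widget_id INTEGER PRIMARY KEY, name TEXT, colour TEXT)"
)
conn.executemany(
    "INSERT INTO widget (name, colour) VALUES (?, ?)",
    [(f"widget-{i}", "red") for i in range(100)],
)

stats = {"queries": 0}

def run(sql, params=()):
    """Execute a query and count it."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def load_n_plus_one(colour):
    """One query for the ids, then one more query per id."""
    ids = [row[0] for row in run(
        "SELECT widget_id FROM widget WHERE colour = ? ORDER BY widget_id", (colour,)
    )]
    return [run("SELECT * FROM widget WHERE widget_id = ?", (wid,))[0] for wid in ids]

def load_single_query(colour):
    """Fetch ids and details together in a single query."""
    return run("SELECT * FROM widget WHERE colour = ? ORDER BY widget_id", (colour,))

stats["queries"] = 0
rows_a = load_n_plus_one("red")
n_plus_one_queries = stats["queries"]

stats["queries"] = 0
rows_b = load_single_query("red")
single_queries = stats["queries"]

print(n_plus_one_queries, single_queries)
```

Both functions return the same rows; only the number of round trips to the database differs.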
I can see why this is bad, but what if you're doing entity caching? i.e. if you run the first query to get your list of ids, and then for each widget you retrieve it from the cache. Surely in that case, N+1 (+caching) is good? Assuming, of course, that there is a high probability of all of the matching entities being in the cache.
I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data that are in use these days.
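The caching variant asked about here can be sketched as follows, with a plain dictionary standing in for the entity cache. The dictionary cache, the table and the helper names are all assumptions for illustration; a real entity cache (for example a remote one) would add its own lookup cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widget (widget_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO widget (name) VALUES (?)",
    [(f"widget-{i}",) for i in range(100)],
)

cache = {}                    # stand-in for an entity cache, keyed by widget_id
stats = {"db_queries": 0}

def query(sql, params=()):
    """Execute a query against the database and count it."""
    stats["db_queries"] += 1
    return conn.execute(sql, params).fetchall()

def get_widget(widget_id):
    """Return the widget from the cache, hitting the database only on a miss."""
    if widget_id not in cache:
        cache[widget_id] = query(
            "SELECT * FROM widget WHERE widget_id = ?", (widget_id,)
        )[0]
    return cache[widget_id]

def load_widgets():
    """N+1 pattern: one id query, then one cache lookup per id."""
    ids = [row[0] for row in query("SELECT widget_id FROM widget ORDER BY widget_id")]
    return [get_widget(wid) for wid in ids]

def count_queries(fn):
    """Return how many database queries a call to fn triggers."""
    before = stats["db_queries"]
    fn()
    return stats["db_queries"] - before

cold = count_queries(load_widgets)   # empty cache: every lookup misses
warm = count_queries(load_widgets)   # full cache: only the id query runs
print(cold, warm)
```

With a fully warm cache only the id query reaches the database, so the pattern can pay off. With a cold or partly filled cache it falls back towards one query per miss, and if the cache is remote, each lookup is still a round trip unless lookups are batched.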
Lessons learned from the largest players (Flickr, YouTube, Google, etc.)
I would like to write today about some lessons learned from the biggest players in highly scalable web applications. I will divide the lessons into 4 points:
* Start slow and small, and measure the right thing.
* Vertical Scalability vs. Horizontal Scalability.
* Every problem has its own solution.
* General lessons learned
Want your apps to run faster? Here’s what not to do. By: Bart Smaalders, Sun Microsystems.
Performance Anti-Patterns:
- Fixing Performance at the End of the Project
- Measuring and Comparing the Wrong Things
- Algorithmic Antipathy
- Reusing Software
- Iterating Because That’s What Computers Do Well
- Premature Optimization
- Focusing on What You Can See Rather Than on the Problem
- Software Layering
- Excessive Numbers of Threads
- Asymmetric Hardware Utilization
- Not Optimizing for the Common Case
- Needless Swapping of Cache Lines Between CPUs
For more detail, see the original article.
A common problem for application designers is predicting when they need to start worrying about architectural/system improvements to their application. Do I need to add more resources? If yes, then how long before I am compelled to do so? The question is not only when but also what. Should I plan to implement a true caching layer on top of my application, or do I need to shard my database? Do I need to move to a distributed search infrastructure and, if yes, when? Essentially, we try to find out which functionalities of the application will become critical over time.
Evan Weaver from Twitter presented a talk on Twitter software upgrades, titled "Improving running components", as part of the "Systems that never stop" track at the QCon London 2009 conference last Friday. The talk focused on several upgrades performed since last May, while Twitter was experiencing serious performance problems.