Photograpic picture to me is window, an address to that specific moment what do your think about that?
This is the intriguing quote by Bill Venners in an interview with Twitter's Alex Payne on Twitter's heretical switch from a pure Ruby stack to a Ruby on Rails stack on the front-end and JVM/Scala on the back-end:
So performance was also one of the problems with JRuby, which I [Bill Venners] think helps explain better why they'd [Twitter] prefer Scala over Ruby or JRuby for some things. I have often heard Rubyists say that although Ruby is slower than Java, for many things it is plenty fast enough, and they are right. The logic goes further, saying that servers are cheap, and programmers expensive, so it makes sense to tradeoff some runtime performance for programmer productivity. And I think that's very often true too, but not always. If you have enough traffic, at some point the cost of servers outweighs the cost of programmers. I'm not sure whether Twitter is past that point, but they get a lot of traffic. And frankly this isn't an intrinsic tradeoff. Other dynamic languages are faster than Ruby, and Scala is too. And people can be quite productive in these other languages too, including Scala.I feel Alex's Max Payne. You might wonder why the geekosphere cares so passionately which technology stack Twitter uses? Well, it's Twitter and it's Ruby on Rails. That's like the Lindsay Lohan and Samantha Ronson of tech buzz. It creates it's own self-sustaining posting reaction. Boom! It took some giant cajones to switch from a well defended platform like Ruby on Rails to an obscure language like Scala. Few people would have been brave enough to pull the trigger on that decision. Twitter didn't take this large leap out of ignorance or incompetence. Twitter's Steve Jenson said they spent several weeks going over our options, running extensive load tests, and presented our findings to the team at each stage. We did our due diligence. They did the work and came to conclusions valid for their situation. They have to follow their own bliss. They aren't telling you to use Scala. They aren't telling you not to use Ruby. Have at it. But they have chosen the path less traveled and seem happy with the direction they are heading. If you aren't happy with their decision then that's a you problem, not a them problem. This points out for me the evolving nature of the two tier web: a client tier and a back-end tier accessed only via an API. That back-end tier can be implemented in anything at all. Twitter likes the JVM for it's undeniable performance, reliability, and scalability. They may like Scala because it has a lot of cool features, is concise, has static typing, performs well, and is a pleasure to develop in. When people jumped with a happy little grin on their face to Ruby, they loved a lot of things about Ruby that helped them make that decision to switch. Maybe Scala is cool, effective, and fun to use too? So instead of worrying how the homeboys have ever so indirectly dissed Ruby, maybe it's worth taking a look at Scala to see if it's actually any good? As a start try Bill Venners on the rise of Scala. It's a great overview of Scala and why it's a worthy next evolution. Then maybe take a peek at First Steps to Scala. And if you want to take a look at some real-life code try Kestrel - tiny queue system based on starling. Knowing only the vaguest bits about Scala before my own little exploration, I come out impressed and interested. I like a lot of things Bill had to say: Scala means scalable language as in it scales to different sized tasks and domains; Scala's static typing is valuable, especially in a team project; Scala has the feel of a scripting language, but can also work as a systems language; Scala is an artful blend of a fully Object and fully Functional language; Scala supports imperative programming, but with a functional bent; Scala can express more in types than Java so you get more rigorous type checking; Scala is concise so can say more with less; Scala treats libraries like creating internal DSLs; Scala helps programmer productivity like Ruby and Python. Sounds like Scala is worth a deeper look to me. Can't we all just get along? Well, that's not how this post started at least. It started with Bill's refreshingly anti-establishment statement: If you have enough traffic, at some point the cost of servers outweighs the cost of programmers. This is an idea we don't hear much of anymore in this era of abundance. Servers cost real cash though, especially for bootstrappers and the truly humongous. So if you can cut your cash requirements by putting in a little programmer elbow grease, that makes a lot of sense to me. Performance still matters.
Want your apps to run faster? Here’s what not to do. By: Bart Smaalders, Sun Microsystems. Performance Anti-Patterns: - Fixing Performance at the End of the Project - Measuring and Comparing the Wrong Things - Algorithmic Antipathy - Reusing Software - Iterating Because That’s What Computers Do Well - Premature Optimization - Focusing on What You Can See Rather Than on the Problem - Software Layering - Excessive Numbers of Threads - Asymmetric Hardware Utilization - Not Optimizing for the Common Case - Needless Swapping of Cache Lines Between CPUs For more detail go there
Update 4:: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). Update 3:: Scaling Digg and Other Web Applications. Update 2:: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information. Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades. Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month? Site: http://digg.com
Related Articles* LinkedIn Architecture * Live Journal Architecture * Flickr Architecture * An Unorthodox Approach to Database Design : The Coming of the Shard
It's been awhile since I've said anything about collectl and I wanted to let this group know I'm currently working on an interface to ganglia since I've seen a variety of posts ranging from how much data to log and where to log it as well as which tools/mechanism to use for logging. From my perspective there are essentially 2 camps on the monitoring front - one says to have distributed agents all sending their data to a central point, but don't send too much or too often. The other camp (which is the one I'm in) says do it all locally with a highly efficient data collector, because you need a lot of data (I also read a post in here about logging everything) and you can't possibly monitors 100s or 1Ks of nodes remotely at the granularity necessary to get anything meaningful. Enter collectl and its evolving interface for ganglia. This will allow you to log lots of detailed data on local nodes at the usual 10 sec interval (or more frequent if you prefer) at about 0.1% system overhead while sending a subset at a lower rate to the ganglia gmonds. This would give you the best of both worlds but I don't know if people are too married to the centralized concept to try something different. I don't know how many people who follow this forum have actually tried it, I know at least a few of you have, but to learn more just go to http://collectl.sourceforge.net/ and look at some of documentation or just download the rpm and type 'collectl'. -mark
Art of scalability - series in my blog about Scalability principles, and guild lines. To read the whole series: Art of scalability (1) - Scalability principles Art of scalability (2) - Scalability guidelines part 1 Art of scalability (3) - Scalability guidelines part 2 Art of scalability (4) - Scalability guidelines part 3
Ladar Levison of Lavabit has written an incredible article on how they took a centralized off-the-shelf email server that could handle only few thousand users and built their own custom distributed infrastructure for handling hundreds of thousands of email users. Lavabit processes 70 gigabytes of data per day, is made up of 26 servers, hosts 260,000 email addresses, and processes 600,000 emails a day. That's a lot of email.
Lavabit's mission has a little edge to it too:
Lavabit was founded as a direct reaction to the larger free e-mail services available. We felt it was possible to create an e-mail service that was fast, reliable, feature rich and didn't achieve profitability by prostituting its user base to marketers.
What I really like about this article is that Lavabit has some challenging elements in dealing with different email protocols while being able to scale to a lot of users. There's more going on than just trying to scale out a database. Many products contain complicated bits like this, so it's interesting to see how Ladar handled them. There are lots of useful details that will help anyone build their own system. Putting in this extra work in is what Ladar thinks makes Lavabit different:
One of the ways to gain an advantage over your competition is to invest the time and money needed to build systems that are better than what is easily available to your competition. It is the custom platform we developed that has allowed us to thrive while many other free email companies either stopped offering their service for free, or shut down altogether.
Since Ladar was so thorough I saved article as a separate html file. Please select the visit link to read the entire article. I'd like to thank Ladar again for taking the time and making the effort to document their architecture for the benefit of the community at large to learn from.
A common problem of the application designers is to predict when they need to start worrying about the Architectural/System improvements on their application. Do I need to add more resources? If yes, then how long before I am compelled to do so? The question is not only when but also what. Should I plan to implement a true caching layer on top of my application or do I need to shard my database. Do I need to move to a distributed search infrastructure and if yes when ! Essentially we try to find out the functionalities of the application that will become critical over time.
Advertising Opportunities on High Scalability