Tuesday
Apr142009

Challenges for Developing Enterprise Applications on the Cloud

In this post I summarize recent discussions outlining the main challenges that developers face today when deploying their existing JEE applications to the cloud, such as complexity, database integration, security, and standard JEE support. I also summarize how we handled those challenges with our new Cloud Computing Framework, pointing to an existing production reference at a leading telco provider.

Click to read more ...

Monday
Apr132009

Benchmark for keeping data in browser in AJAX projects

Hi, we are using AJAX and see a lot of opportunity to keep session state in the client browser with JavaScript objects. Is there any benchmark for how much data you can generally keep in JavaScript objects in the browser? Thanks, Unmesh

Click to read more ...

Monday
Apr132009

High Performance Web Pages – Real World Examples: Netflix Case Study

This piece explains how Netflix deals with high load on its movie rental website.
It was written by Bill Scott in the fall of 2008.

Read or download the PDF file here

Click to read more ...

Friday
Apr102009

Facebook Chat Architecture

For those interested in building scalable systems, today I will talk about the Facebook Chat architecture. Starting keynote:

"When your feature’s userbase will go from 0 to 70 million practically overnight, scalability has to be baked in from the start."

Eugene Letuchy, lead engineer on Facebook Chat

Click to read more ...

Friday
Apr102009

Facebook's Aditya giving presentation on Facebook Architecture

Facebook's engineering director Aditya talks about Facebook's architecture: how they use MySQL, PHP and memcache, and how they have modified each of these to suit their requirements.

Click to read more ...

Friday
Apr102009

counting # of views, calculating most/least viewed

I'm looking for a design pattern, advice or directions. I need to count views/downloads of a set of resources, identified by their respective URLs. This is not a big problem. I also need to keep a list of the resources viewed/downloaded in the last X days. This list needs to be updated every now and then to reflect the real last X days of usage, so resources that were requested more than X days ago get evicted from it. So it's sort of a black box: you feed messages (download requests) in and it gives you that list of URLs with counters on the other end. How would you go about designing it?
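One possible shape for such a black box, offered only as a rough sketch: keep one bucket of counters per day plus a running total, and drop whole buckets as they fall out of the window. The class and method names below are made up for illustration, and the sketch assumes requests arrive in chronological order and that day-level granularity is good enough.

```python
from collections import Counter, deque
from datetime import date, timedelta


class RecentViewCounter:
    """Counts views per URL over the last `window_days` days (hypothetical helper)."""

    def __init__(self, window_days):
        self.window_days = window_days
        self.buckets = deque()   # (day, Counter of url -> views), oldest first
        self.totals = Counter()  # url -> views inside the current window

    def _evict(self, today):
        # Drop every daily bucket that is window_days or more older than today.
        cutoff = today - timedelta(days=self.window_days)
        while self.buckets and self.buckets[0][0] <= cutoff:
            _, expired = self.buckets.popleft()
            for url, views in expired.items():
                self.totals[url] -= views
                if self.totals[url] == 0:
                    del self.totals[url]  # URL no longer seen inside the window

    def record(self, url, day):
        # Assumes `day` never goes backwards between calls.
        self._evict(day)
        if not self.buckets or self.buckets[-1][0] != day:
            self.buckets.append((day, Counter()))
        self.buckets[-1][1][url] += 1
        self.totals[url] += 1

    def snapshot(self, today):
        # The list of URLs with counters for the last window_days days.
        self._evict(today)
        return dict(self.totals)

    def most_viewed(self, today, n=10):
        self._evict(today)
        return self.totals.most_common(n)
```

Memory grows with the number of distinct URLs per day rather than with the number of requests, and eviction touches only the buckets that expire. Least viewed can be read off the same totals by sorting in ascending order.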

Click to read more ...

Wednesday
Apr082009

N+1+caching is ok?

Hibernate and iBATIS and other similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem is that if you want to retrieve a set of widgets from a table, one query is used to retrieve all the ids of the matching widgets (select widget_id from widget where ...) and then, for each id, another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it takes 101 queries to get the details of them all. I can see why this is bad, but what if you're doing entity caching? That is, you run the first query to get your list of ids, and then retrieve each widget from the cache. Surely in that case N+1(+caching) is good? Assuming, of course, that there is a high probability of all the matching entities being in the cache. I may be asking a daft question here, one whose answer is obviously implied by the large scalable mechanisms for storing data that are in use these days.

Click to read more ...

Wednesday
Apr082009

Lessons learned from the largest players (Flickr, YouTube, Google, etc.)

Today I would like to write about some lessons learned from the biggest players in highly scalable web applications. I will divide the lessons into four points:

  • Start slow and small, and measure the right thing.
  • Vertical scalability vs. horizontal scalability.
  • Every problem has its own solution.
  • General lessons learned.

Click to read more ...

Tuesday
Apr072009

Six Lessons Learned Deploying a Large-scale Infrastructure in Amazon EC2 

Lessons learned from OpenX's large-scale deployment to Amazon EC2:

  • Expect failures; what's more, embrace them
  • Fully automate your infrastructure deployments
  • Design your infrastructure so that it scales horizontally
  • Establish clear measurable goals
  • Be prepared to quickly identify and eliminate bottlenecks
  • Play whack-a-mole for a while, until things get stable

Click to read more ...

Monday
Apr062009

How do you monitor the performance of your cluster?

I posted a note the other day about collectl and its ganglia interface, but perhaps I wasn't provocative enough to get any responses, so let me ask it a different way: how do people monitor their clusters and, more importantly, how often? Do you monitor to get a general sense of what the system is doing, or do you monitor with the expectation that when something goes wrong you'll have enough data to diagnose the problem? Or both? I suspect both...

Many cluster-based monitoring tools have a data collection daemon running on each target node which periodically sends data to some central management station. That machine typically writes the data to a database from which it can then extract historical plots. Some even put up graphics in real time. In my experience working with large clusters, and I'm talking many hundreds or even thousands of nodes, most have to limit both the amount of data they manage centrally and the frequency at which they collect it. Otherwise they'll overwhelm the management station, because most DBs can't write hundreds of counters many times a minute from thousands of nodes.

As a related example, how many of you run sar at the default monitoring interval of 10 minutes? Do you really think you're getting useful information? What happens if you have a 2-minute burst of 100% network load and you're idle the other 8 minutes? Sar will happily tell you the network load was 20% and you'll never know your network is tanking.

The point of all this is that I do think there's a place for central monitoring, though I'm personally not a fan because of the inaccuracy of infrequent data samples. I also appreciate that some data is better than none, as long as you realize the inherent accuracy problems. And that's where collectl comes in, along with my previous comment about ganglia.
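The sar arithmetic above is easy to check with synthetic numbers. The sketch below averages made-up per-second utilisation samples over a 10-minute interval and over 10-second intervals. It is not sar or collectl output, and the function name is invented for the illustration.

```python
def interval_averages(samples, interval):
    """Average per-second samples over consecutive windows of `interval` seconds."""
    averages = []
    for start in range(0, len(samples), interval):
        window = samples[start:start + interval]
        averages.append(sum(window) / len(window))
    return averages


# Synthetic data: 2 minutes at 100% network load, then 8 minutes idle.
load = [100.0] * 120 + [0.0] * 480

ten_minute = interval_averages(load, 600)  # one coarse sample for the whole period
ten_second = interval_averages(load, 10)   # sixty fine-grained samples

print(ten_minute)        # the burst is averaged down to 20%
print(max(ten_second))   # the fine-grained view still shows 100%
```

The coarse view reports a single 20% figure, while twelve of the sixty 10-second samples sit at 100%, which is the burst the coarse average hides.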
When I wrote collectl my overarching design goal, from which I haven't wavered, was to provide highly accurate local data with minimal overhead, so you can take samples in the 1-10 second range without fear of impacting the rest of the system. You can literally sample just about everything going on every 10 seconds and use <0.1% of the CPU. If you're willing to give up a few more tenths of a percent you can even monitor processes and slab activity, though you should only sample them at a 60-second interval because it IS expensive to monitor them.

However, I also realize this doesn't do any good if you have thousands of machines you want to watch, and that's where the socket interface comes in. Over it, collectl can send data to a central manager at that same frequency or, if you prefer, send its remote data at a different rate, giving you the best of both worlds. Collectl can provide its data to a central management station while at the same time logging locally for accuracy, which lets you do a deep dive into the data if a problem arises for which there is not enough data stored centrally.

My point about the ganglia interface was a response to the fact that a lot of people running large (as well as smaller) clusters do use ganglia but, as with most central monitoring stations, have to give up the accuracy of finer-grained data. I was just wondering if anyone reading this forum uses ganglia and might be interested in trying out the collectl interface to it.

-mark

Click to read more ...