I had a false belief / I thought I came here to stay / We're all just visiting / All just breaking like waves / The oceans made me, but who came up with me? / Push me, pull me, push me, or pull me out. So true, Pearl Jam (Push Me, Pull Me lyrics), so true. I too have wondered how web clients should be notified of model changes. Should servers push events to clients, or should clients pull events from servers? A topic worthy of its own song if ever there was one. To pull events, the client simply starts a timer and makes a request to the server. This is polling. You can either pull a complete set of fresh data or get a list of changes. The server "knows" if anything you are interested in has changed and makes those changes available to you. Knowing what has changed can be relatively simple with a publish-subscribe style backend, or it can get very complex with fine-grained bitmaps of attributes and per-client state tracking what a client still needs to see. Polling is heavy, man. Imagine all your clients hitting your servers every 5 seconds even if there are no updates. And if every poll request ends up in a flurry of database requests, your database can be hammered. Of course, caching can smooth out this jagged trip, but if you keep per-client state you need more clever per-client cache views. The overhead of polling can be mitigated somewhat by piggybacking updates on replies to client requests. So if polling has a high overhead, then it makes sense to send data only when there's an update the client should see. That is, we push data to the client. The current push model favorite is Comet: a World Wide Web application architecture in which a web server sends data to a client program (normally a web browser) asynchronously without any need for the client to explicitly request it. It allows creation of event-driven web applications, enabling real-time interaction otherwise impossible in a browser. Nothing comes for free, however, and pushing has a surprising amount of overhead too. 
A connection has to be kept open between the client and server for the new data to be pushed over. Typically servers don't handle large tables of connections very well, so this approach hasn't worked well; you had to spread the connections over multiple servers. Fortunately, operating systems are getting better at handling large numbers of connections. For every connection you also have to store the data to push to the client, and you need a thread to send it. It's easy to see how this could go bad with naive architectures. Architecturally, I've always sided with polling for complete datasets rather than pushing or polling just for changes. This is the simplest and most self-healing architecture. Machines can go up and down at will and your client will always be correct and consistent. There's no chance for the stream of changes to get out of sync. The server side doesn't have to do anything too special. Clients already know how to do it. And you use client resources to do the polling and the update on the client side. All you have to do to scale polling is have enough machines, smart caching to handle the load, enough bandwidth to handle larger datasets, and a problem where low latency isn't required. That's all :-) The Comet Daily, not affiliated with Superman I hear, is making a strong case for push in their articles Comet is Always Better Than Polling and 20,000 Reasons Why Comet Scales. Special application server software is needed because your typical app server can't handle lots of persistent connections. They tend to run out of threads and memory. Greg Wilkins talks about these and other issues in Blocking Servlets, Asynchronous Transport. This is all pretty standard stuff when you build your own messaging system, but I guess it has taken a while to move into the web infrastructure. With Comet they found: The key result is that sub-second latency is achievable even for 20,000 users. There is an expected latency vs. 
throughput tradeoff: for example, for 5,000 users, 100ms latency is achievable up to 2,000 messages per second, but increases to over 250ms for rates over 3,000 messages per second. Interesting results, especially if your application requires low-latency updates. Most people haven't deployed or even considered push-based architectures. With Comet it's at least something to think about. I can't resist adding this cute animation of a llama push me pull me.
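To make the pull-versus-push trade-offs above concrete, here is a minimal single-process sketch in Python. The `ChangeFeed` class and its method names are hypothetical, not part of Comet or any server mentioned here. Every update gets a version number, and a client can poll the complete dataset, poll only for the changes since the version it last saw, or long-poll, one of the techniques Comet-style servers use to approximate push over HTTP by holding the request open until there is something to send:

```python
import threading
from typing import Any, Dict, List, Tuple

Change = Tuple[int, str, Any]  # (version, key, new value)


class ChangeFeed:
    """Hypothetical in-memory model in which every update gets a
    monotonically increasing version number."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._data: Dict[str, Any] = {}
        self._log: List[Change] = []  # unbounded here; a real server would trim it

    def update(self, key: str, value: Any) -> int:
        with self._cond:
            self._version += 1
            self._data[key] = value
            self._log.append((self._version, key, value))
            self._cond.notify_all()  # wake every client blocked in wait_for_changes
            return self._version

    def snapshot(self) -> Tuple[int, Dict[str, Any]]:
        """Pull the complete dataset: no per-client state, self-healing."""
        with self._cond:
            return self._version, dict(self._data)

    def changes_since(self, version: int) -> Tuple[int, List[Change]]:
        """Pull only the changes newer than the version the client last saw."""
        with self._cond:
            return self._version, [c for c in self._log if c[0] > version]

    def wait_for_changes(self, version: int, timeout: float) -> Tuple[int, List[Change]]:
        """Long poll: hold the request open until something newer than
        `version` exists or `timeout` seconds pass, then reply."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > version, timeout=timeout)
            return self._version, [c for c in self._log if c[0] > version]


if __name__ == "__main__":
    feed = ChangeFeed()
    feed.update("price", 10)
    seen, data = feed.snapshot()  # initial full pull
    # Simulate a server-side event arriving 0.1 s after the client starts waiting.
    threading.Timer(0.1, feed.update, args=("price", 11)).start()
    seen, changes = feed.wait_for_changes(seen, timeout=5.0)
    print(seen, changes)  # prints: 2 [(2, 'price', 11)]
```

The change log here grows without bound; a real server would trim it, and a client whose version fell off the end would have to fall back to a full snapshot, which is exactly the out-of-sync risk that makes complete-dataset polling attractive. Each blocked `wait_for_changes` call also ties up one thread for one held-open request, which is where the per-connection thread and memory costs come from.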
All, What is the best way to scan the content being uploaded by users? Is there any open source solution available to do that? How do YouTube, Flickr, and other sites with user-uploaded content handle this? Any insight would be greatly appreciated! Regards, Janakan Rajendran.
Shanti Braford details how his Ruby on Rails based website survived a 24-hour, 550,000+ pageview Digg attack. His post cleanly lays out all the juicy setup details, so there's not much I can add. Hosting costs $370 a month for 1 web server, 1 database server, and sufficient bandwidth. The site is built on RoR, nginx, MySQL, and 7 Mongrel servers. He thinks Rails 2.0 has improved performance, and he credits database avoidance and fragment caching for much of the performance boost. Keep in mind his system is relatively static, but it's a very interesting and useful experience report.
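Fragment caching is a big part of how a mostly static site absorbs this kind of traffic: 550,000 pageviews over 24 hours averages roughly 6.4 pageviews per second (550,000 / 86,400). The sketch below is a generic Python illustration of the idea, not Rails' own caching API, and `FragmentCache` and `render_popular_list` are hypothetical names. A rendered HTML fragment is kept for a time-to-live, so within that window every page view reuses it instead of querying the database:

```python
import time
from typing import Callable, Dict, Tuple


class FragmentCache:
    """Caches rendered HTML fragments by key for ttl_s seconds, so repeated
    page views skip both the database query and the template render."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, html)

    def fetch(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: no database work at all
        html = render()  # cache miss: query and render once
        self._store[key] = (now + self.ttl_s, html)
        return html

    def expire(self, key: str) -> None:
        """Drop a fragment explicitly, e.g. after the underlying data changes."""
        self._store.pop(key, None)


if __name__ == "__main__":
    db_queries = 0

    def render_popular_list() -> str:
        global db_queries
        db_queries += 1  # stands in for a real (hypothetical) database query
        return "<ul><li>...</li></ul>"

    cache = FragmentCache(ttl_s=60)
    for _ in range(1000):  # 1,000 page views inside one TTL window
        cache.fetch("popular_list", render_popular_list)
    print(db_queries)  # prints: 1
```

With a 60-second TTL, the query behind a fragment runs at most once per minute per process regardless of how many page views arrive. Since each of the 7 Mongrel processes would hold its own in-memory copy in this design, a shared cache store would avoid that duplication.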
I would like to know the email architecture used by large ISPs, or even by Google. Can someone point me to some sites? Thanks.
Kevin Burton calculates that Blekko, one of the barbarian horde storming Google's search fortress, would need to spend $5 million just to buy enough weapons, er, storage. Kevin estimates storing a deep crawl of the internet would take about 5 petabytes. At a projected $1 million per petabyte, that's a paltry $5 million. Less than expected. Imagine in days of old an ambitious noble itching to raise an army to conquer a land and become its new prince. For a fine land, and the search market is one of the richest, that would be a smart investment for a VC to make. In these situations I always ask: What would Machiavelli do? Machiavelli taught that some lands are hard to conquer and easy to keep, and some are easy to conquer and hard to keep. A land like France was easy to conquer because it was filled with nobles. You can turn nobles on each other because they always hate each other for some reason or another. But it's hard to keep a land of nobles because they all think they are as good as you are and will continually plot your downfall. The Ottoman Empire was hard to conquer because it was led by a single ruler. Everyone owed their wealth and prosperity to that ruler, so subjects, assuming the prince had not turned the people against him, would fight to the death for the existing structure because their future depended on it. Conquering it took all-out war. But once victorious, the new prince would find the Ottoman Empire easy to rule because there were no loyalties to drive resistance. It was always a marriage of convenience. Google is the Ottoman Empire. Allegiance is given to Google because people are getting paid. Defeating Google will take total war, assuming the prince has not turned the people against him, but once Google is defeated, ruling will be easy. How might Google keep strengthening the ties that bind to make conquest harder for a prospective prince? One way might be to prevent subjects from cavorting with potentially corrupting influences outside the land. 
What if Google were to give greater rewards to websites that changed their robots.txt to reject all other search engines? That would deny all routes into the principality and strengthen ties considerably. A new prince would find it very difficult to break in. Machiavelli might like that.
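The mechanism imagined here already exists: robots.txt addresses crawlers by user-agent, so a site can admit one crawler and turn away the rest. As a sketch (the Google-only policy is the hypothetical from the text, and `NewEngineBot` is a made-up crawler name), Python's standard `urllib.robotparser` can confirm that such a file lets Googlebot in and keeps everyone else out:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that admits only Google's crawler.
# The first record gives Googlebot an empty Disallow (allow everything);
# the catch-all "*" record disallows the whole site for every other bot.
GOOGLE_ONLY_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(GOOGLE_ONLY_ROBOTS_TXT)
    for agent in ("Googlebot", "Bingbot", "NewEngineBot"):
        allowed = rp.can_fetch(agent, "https://example.com/some/page.html")
        print(f"{agent:12} allowed: {allowed}")
```

robots.txt is advisory: it works only because well-behaved crawlers honor it, so the walls of this principality hold only against princes who choose to respect them.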
Hello, I am new to the back-end side of things. Love this website. I've read all the comments about Amazon hosting; I really like Amazon S3, but I'm concerned that it may not be sufficient for my computing needs. And EC2, I'm just not too sure about. What about hosting sites like HostMonster? Their prices seem amazing. Are they too good to be true? What are the cons, and what are the things I should be considering? I am concerned about costs, but I want the user experience to be world class. I am creating a media sharing site. Any help will be great. Thanks, Fahad
Hi all, Has anyone got any experience using Amazon S3 as an uploaded-photo store? I'm writing a website that I need to keep as low-budget as possible, and I'm investigating solutions for storing photos uploaded by users: not too many, probably in the low thousands. The site is commercial, so I'm steering away from the Flickrs of the world. S3 seems to offer a solution, but I'd like to hear from those who have used it before. Thanks, Andy
All, I'm new to this and have only a basic understanding of how a CDN works. My questions are: 1. How does a CDN sync data with web servers for video/images? If a user uploads a video to my site, will it be stored directly on the CDN, or does it come to my web server first and then get synced to the cache servers? 2. How do I have only the dynamic video/image content delivered through the CDN while the rest is served by a web server? 3. How does the sync happen, and who pays for the bandwidth for the sync? I'd appreciate it if someone could explain this. Regards, Janakan Rajendran
From http://directory.fsf.org/project/collectd/ : 'collectd' is a small daemon which collects system information every 10 seconds and writes the results in an RRD-file. The statistics gathered include: CPU and memory usage, system load, network latency (ping), network interface traffic, and system temperatures (using lm-sensors), and disk usage. 'collectd' is not a script; it is written in C for performance and portability. It stays in the memory so there is no need to start up a heavy interpreter every time new values should be logged. From the collectd website: Collectd gathers information about the system it is running on and stores this information. The information can then be used to find current performance bottlenecks (i.e. performance analysis) and predict future system load (i.e. capacity planning). Or if you just want pretty graphs of your private server and are fed up with some homegrown solution you're at the right place, too ;). While collectd can do a lot for you and your administrative needs, there are limits to what it does: * It does not generate graphs. It can write to RRD-files, but it cannot generate graphs from these files. There's a tiny sample script included in contrib/, though. Also you can have a look at drraw for a generic solution to generate graphs from RRD-files. * It does not do monitoring. The data is collected and stored, but not interpreted and acted upon. There's a plugin for Nagios, so it can use the values collected by collectd, though. It's reportedly a reliable product that doesn't put a lot of load on your system. This enables you to collect data at a faster rate so you can detect problems earlier.
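The RRD files collectd writes are round-robin: they have a fixed size, and once full, each new sample overwrites the oldest. The sketch below illustrates that idea together with a resident sampling loop in plain Python. It is not collectd's implementation (which is C), it skips RRDtool's consolidation of samples into averages over longer periods, and `RoundRobinSeries` and `collect` are hypothetical names:

```python
import collections
import os
import time
from typing import Callable, Deque, Tuple

Sample = Tuple[float, float]  # (unix timestamp, value)


class RoundRobinSeries:
    """Fixed-size time series in the spirit of an RRD file: once full,
    each new sample overwrites the oldest one, so storage never grows."""

    def __init__(self, capacity: int) -> None:
        self.samples: Deque[Sample] = collections.deque(maxlen=capacity)

    def add(self, timestamp: float, value: float) -> None:
        self.samples.append((timestamp, value))  # deque drops the oldest when full

    def average(self) -> float:
        if not self.samples:
            raise ValueError("no samples yet")
        return sum(v for _, v in self.samples) / len(self.samples)


def collect(read_value: Callable[[], float], series: RoundRobinSeries,
            interval_s: float, rounds: int) -> None:
    """Resident sampling loop: one long-lived process reads a value every
    interval_s seconds instead of starting an interpreter per measurement."""
    for i in range(rounds):
        series.add(time.time(), read_value())
        if i < rounds - 1:
            time.sleep(interval_s)


if __name__ == "__main__":
    # One hour of history at a 10-second interval needs 360 slots.
    series = RoundRobinSeries(capacity=3600 // 10)
    # Take 3 quick samples of the 1-minute load average (os.getloadavg is Unix-only).
    collect(lambda: os.getloadavg()[0], series, interval_s=0.1, rounds=3)
    print(len(series.samples), "samples, mean load", round(series.average(), 2))
```

Because memory use is fixed by `capacity`, shortening the interval only shortens how far back the history reaches; it never makes the store grow, which is part of why frequent collection stays cheap.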
Compare:
1. MySQL Clustering (ndb-cluster storage)
2. MySQL / GFS-GNBD / HA
3. MySQL / DRBD / HA
4. MySQL Write Master / Multiple MySQL Read Slaves
5. Standalone MySQL Servers (functionally separated)
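Of these, option 4 is the one that reaches directly into application code, because the application has to decide where each statement goes. A minimal sketch in Python, assuming statements are routed by their leading SQL verb; the host names and the `ReadWriteRouter` class are hypothetical, and no MySQL driver is involved:

```python
import itertools
from typing import List

# Statement types that must go to the write master (illustrative, not exhaustive).
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE",
               "CREATE", "ALTER", "DROP", "TRUNCATE"}


class ReadWriteRouter:
    """Writes go to the single master; reads are spread round-robin
    across the read slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        if not slaves:
            raise ValueError("need at least one read slave")
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in WRITE_VERBS:
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    router = ReadWriteRouter("db-master:3306", ["db-slave1:3306", "db-slave2:3306"])
    for q in ["SELECT * FROM photos", "INSERT INTO photos VALUES (1)", "SELECT 1"]:
        print(router.route(q), "<-", q)
```

Keyword routing is deliberately crude: `SELECT ... FOR UPDATE`, and any read that must see a write the same request just made, belong on the master, because MySQL replication to the slaves is asynchronous by default and can lag.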