GigaSpaces launched OpenSpaces.org, a community web site for developers who wish to utilize and contribute to the open source OpenSpaces development framework. OpenSpaces extends the Spring Framework for enterprise Java development, and leverages the GigaSpaces eXtreme Application Platform (XAP) for data caching, messaging and as the container for application business logic. It is designed for building highly-available, scale-out applications in distributed environments, such as SOA, cloud computing, grids and commodity servers. OpenSpaces is widely used in a variety of industries, including financial services, telecommunications, manufacturing and retail -- and across the web in e-commerce, Web 2.0 applications such as social networking sites, search and more. OpenSpaces.org already lists more than two dozen projects submitted by the developer community, including GigaSpaces customers, partners and employees. Innovative projects include an instant messaging platform, integration with PHP, configuration via JRuby, an implementation of Spring Batch and a scalable dynamic RSS feed delivery system. GigaSpaces recently announced the OpenSpaces Developer Challenge, a developer competition with $25,000 in total prizes and a $10,000 grand prize. The prizes will be awarded to the most innovative applications built using the OpenSpaces framework or plug-ins that extend it. The Challenge deadline is April 2, 2008 and ‘early bird’ prizes are available for those who submit their concepts by February 13, 2008. Additionally, in November of 2007 GigaSpaces launched its Start-Up Program, which provides free software licenses for qualifying individuals and companies.
The Google Operating System blog has an interesting post on Google's scale based on an updated version of Google's paper about MapReduce. The input data for some of the MapReduce jobs run in September 2007 was 403,152 TB (terabytes), the average number of machines allocated for a MapReduce job was 394, while the average completion time was 6 minutes and a half. The paper mentions that Google's indexing system processes more than 20 TB of raw data. Niall Kennedy calculates that the average MapReduce job runs across a $1 million hardware infrastructure, assuming that Google still uses the same cluster configurations from 2004: two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives and a gigabit Ethernet link. Greg Linden notices that Google's infrastructure is an important competitive advantage. "Anyone at Google can process terabytes of data. And they can get their results back in about 10 minutes, so they can iterate on it and try something else if they didn't get what they wanted the first time." It is interesting to compare this to Amazon EC2:
- $0.40 Large Instance price per hour x 400 instances x 10 minutes = $26.7
- 1 TB data transfer in at $0.10 per GB = $100
I fully and enthusiastically encourage anyone who wants to share a relevant topic to register and post. People have added a lot of good and useful content. Don't be shy. It's been asked how a teaser is created when posting so the full article doesn't display on the front page. A teaser is a paragraph interesting enough to convince readers to click on the "read more" link to get the full article. Creating a teaser in Drupal is accomplished by inserting < ! -- break -- > on a separate line directly after the text you want to be the teaser. Only DO NOT include the spaces. So your post looks like: Teaser Content < ! -- break -- > (no spaces in real life) Rest of Content It's a bit kludgey, but it works.
Gandi.net, a French domain registrar has launched a very flexible dynamic resource allocated VPS service.
Easy FTP redundancy and consolidation with the Open Source project Generic-FTP. Works with probably any Linux FTP Server (ProFTPD only one tested). Get rid of some single points of failure. A very easy to set up solution using scripts written in PHP. Tested thoroughly in a production environment.
In a recent project, I utilized RoR's cookie-based session storage to shard geographically distinct user groups. My technique for doing so was unique and, although it was a premature optimization, it is none-the-less an idea worth exploring.
I was wondering if it is already possible to scale a MONO's .NET website. I cannot see any real websites (with the term real I mean "a highly visited website") running mono. What do you think? Will MONO ASP.NET scale??? Is it worth planning a site to run with Mono asp.net? Or should we leave it to the future? What do you think?
I had a false belief I thought I came here to stay We're all just visiting All just breaking like waves The oceans made me, but who came up with me? Push me, pull me, push me, or pull me out . So true Perl Jam (Push me Pull me lyrics), so true. I too have wondered how web clients should be notified of model changes. Should servers push events to clients or should clients pull events from servers? A topic worthy of its own song if ever there was one. To pull events the client simply starts a timer and makes a request to the server. This is polling. You can either pull a complete set of fresh data or get a list of changes. The server "knows" if anything you are interested in has changed and makes those changes available to you. Knowing what has changed can be relatively simple with a publish-subscribe type backend or you can get very complex with fine grained bit maps of attributes and keeping per client state on what I client still needs to see. Polling is heavy man. Imagine all your clients hitting your servers every 5 seconds even if there are no updates. And if every poll request ends up in a flurry of database requests your database can be hammered. Of course, caching can smooth out this jagged trip, but if you keep per client state you need more clever per client cache views. The overhead of polling can be mitigated somewhat by piggy backing updates on replies to client requests. So if polling has a high overhead then it makes sense to only send data when there's an update the client should see. That is, we push data to the client. The current push model favorite is Comet: a World Wide Web application architecture in which a web server sends data to a client program (normally a web browser) asynchronously without any need for the client to explicitly request it. It allows creation of event-driven web applications, enabling real-time interaction otherwise impossible in a browser. Nothing comes for free however and pushing has a surprising amount of overhead too. A connection has to be kept open between the client and server for the new data to pushed over. Typically servers don't handle large tables of connections very well so this approach hasn't worked well. You had to spread the connections over multiple servers. Fortunately operating systems are getting better at handling large numbers of connections. For every connection you also have to store the data to push to the client and you need a thread to send it. It's easy to see how this could go bad with naive architectures. Architecturally I've always sided on polling for complete datasets rather than pushing or polling just for changes. This is the simplest and best self-healing architecture. Machines can go up and down at will and your client will always be correct and consistent. There's no chance for the stream of changes to get out of sync. Your client view will always be correct. The server side doesn't have to do anything too special. Clients already know how to do it. And you use client resources to do the polling and the update on the client side. All you have to do to scale polling is have enough machines, smart caching to handle the load, enough bandwidth to handle larger datasets, and a problem where low latency isn't required. That's all :-) The Comet Daily, not affiliated with Super Man I hear, is making a strong case for push in their articles Comet is Always Better Than Polling and 20,000 Reasons Why Comet Scales. Special application server software is needed because your typical app server can't handle lots of persistent connections. They tend to run out of threads and memory. Greg Wilkins talks about these and other issues in Blocking Servlets, Asynchronous Transport. This is all pretty standard stuff when you build your own messaging system, but I guess it has taken a while to move into the web infrastructure. With Comet they found: The key result is that sub-second latency is achievable even for 20,000 users. There is an expected latency vs. throughput tradeoff: for example, for 5,000 users, 100ms latency is achievable up to 2,000 messages per second, but increases to over 250ms for rates over 3,000 messages per second. Interesting results, especially if your application requires low latency updates. Most people haven't deployed or even considered push based architectures. With Comet it's at least something to think about. I can't resist adding this cute animation of a llama push me pull me.
All, What is the best way to scan the content being uploaded by the users? Is there any open source solution available to do that? How does YouTube, flickr and other user uploadable content sites handle this? Any insight would be greatly appreciated! Regards, Janakan Rajendran.
Shanti Braford details how his Ruby on Rails based website survived a 24 hour 550,000+ pageview digg attack. His post cleanly lays out all the juicy setup details, so there's not much I can add. Hosting costs $370 a month for 1 web server, 1 database server, and sufficient bandwidth. The site is built on RoR, nginx, MySQL, and 7 mongrel servers. He thinks Rails 2.0 has improved performance and credits database avoidance and fragment caching for much of the performance boost. Keep in mind his system is relatively static, but it's a very interesting and useful experience report.