Ivan Zuzak wrote a fascinating article, Real-time feed processing and filtering, on using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. He talks about using both polling and PubSubHubbub (real-time) to process FriendFeed feeds. Ivan is trying to devise a separate filtering service where:
- filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resources.
- processing services should be applied as close to the subscriber as possible so that the original update may be transported through the network as a single notification for as long as possible.
Besides writing a generally interesting article, Ivan makes an insightful observation about combining polling services with metered Infrastructure/Platform services:
Polling is bad because AppEngine applications have a fixed free daily quota for consumed resources, when the number of feeds the service processed increased - the daily quota was exhausted before the end of the day because FF polls the service for each feed every 45 minutes.
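To put rough numbers on that observation, here is a minimal back-of-the-envelope sketch. The 45-minute interval comes from the quote above; the quota value in the usage note is purely hypothetical and is not an actual App Engine limit.

```python
MINUTES_PER_DAY = 24 * 60


def polls_per_day(num_feeds, poll_interval_minutes=45):
    """Requests per day generated when every feed is polled on a fixed interval."""
    polls_per_feed = MINUTES_PER_DAY // poll_interval_minutes
    return num_feeds * polls_per_feed


def feeds_supported(daily_request_quota, poll_interval_minutes=45):
    """Number of feeds a fixed daily request budget can absorb under polling.

    daily_request_quota is a hypothetical budget, not a real platform figure.
    """
    polls_per_feed = MINUTES_PER_DAY // poll_interval_minutes
    return daily_request_quota // polls_per_feed
```

At a 45-minute interval each feed costs 32 requests a day whether or not it has changed, so 1,000 feeds cost 32,000 requests, and a hypothetical budget of 10,000 requests a day would run out at 312 feeds. The load grows with the number of feeds rather than the number of updates, which is why a fixed quota is exhausted before the day ends.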
This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.
Have we reached the end of scaling? That's what I asked myself one day after noticing a bunch of "The End of" headlines. We've reached The End of History because the Western liberal democracy is the "end point of humanity's sociocultural evolution and the final form of human government." We've reached The End of Science because of the "fact that there aren't going to be any obvious, cataclysmic revolutions." We've even reached The End of Theory because all answers can be found in the continuous stream of data we're collecting. And doesn't it always seem like we're at The End of the World?
Motivated by the prospect of everything ending, I began to wonder: have we really reached The End of Scaling?
This is a guest posting by Marty Abbott and Michael Fisher, authors of The Art of Scalability. I'm still reading their book and will have an interview with them a little later.
If 2010 is the year that you’ve decided to kick off your startup, or if you’ve already got something off the ground and are expecting double- or triple-digit growth, this list is for you. We all want the attention of users to achieve viral growth, but as many can attest, too much attention can bring a startup to its knees. If you’ve used Twitter for any amount of time you’re sure to have seen the “Fail Whale”, which is so often seen that it has its own fan club. Take a look at the graph below from Compete.com showing Twitter’s unique visitors. One can argue that limitations in the product offering have as much to do with the flattening of growth over the past six months as availability does, but it’s hard to believe that the inability of users to actually use the service has not hindered growth.
What should you do if you want your startup to scale with double- and triple-digit growth? We’ve put together a list of 11 strategies that will aid in your quest for scalability. In our recently released book “The Art of Scalability” you will find more details about these and other strategies.
Terrastore is a newborn document store that provides advanced scalability and elasticity features without sacrificing consistency.
Here are a few highlights:
- Ubiquitous: based on the universally supported HTTP protocol.
- Distributed: nodes can run and live everywhere on your network.
- Elastic: you can add and remove nodes dynamically to/from your running cluster with no downtime and no changes at all to your configuration.
- Scalable at the data layer: documents are partitioned and distributed among your nodes, with automatic and transparent re-balancing when nodes join and leave.
- Scalable at the computational layer: query and update operations are distributed to the nodes that actually hold the queried/updated data, minimizing network traffic and spreading computational load.
- Consistent: with per-document consistency, you're guaranteed to always get the latest value of a single document, with read-committed isolation for concurrent modifications.
- Schemaless: with a collection-based interface holding JSON documents and no predefined schema, you can just create your collections and put anything you want into them.
- Easy operations: install a fully working cluster in just a few commands and no XML to edit.
- Feature rich: support for push-down predicates, range queries and server-side update functions.
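To make the idea of partitioning documents across nodes with re-balancing on join and leave more concrete, here is a minimal sketch using consistent hashing. This is an assumption chosen for illustration only: the excerpt does not say how Terrastore actually partitions data, and the `HashRing` class and its method names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Illustrative consistent-hash ring; not Terrastore's actual algorithm."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas  # virtual points per node, to smooth the spread
        self._points = []         # sorted hash points on the ring
        self._owner = {}          # hash point -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:  # skip the (very unlikely) collision
                continue
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def node_for(self, doc_key):
        """Return the node owning doc_key: the first ring point after its hash."""
        if not self._points:
            raise LookupError("no nodes in the ring")
        idx = bisect.bisect_right(self._points, self._hash(doc_key))
        return self._owner[self._points[idx % len(self._points)]]
```

With this scheme, adding a node only moves the documents that now hash to the new node, and removing it sends them back to their previous owners, so the cluster can grow or shrink without reshuffling everything.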
Ashleigh Anderson from Zynga let me know that they have an opening for a Systems Engineer working on some new games they are developing. Given the state of the job market I thought it worth posting. Here are more details...
Current disk-based RDBMSs can run out of steam when processing large data sets. Can these problems be solved by migrating from a disk-based RDBMS to an in-memory database (IMDB)? Are there any limitations? To find out, I tested one of each from the two leading vendors, who together hold 70% of the market share: Oracle's 11g and TimesTen 11g, and IBM's DB2 v9.5 and solidDB 6.3.
Read more at BigDataMatters.com
How many times have we all run across a situation where the performance tests on a piece of software pass with flying colors on the test systems only to see the software exhibit poor performance characteristics when the software is deployed in production? Read More Here...