Update 3: InfoQ's Big Architecture Up Front - A Case of Premature Scalaculation? twines several different threads on the topic together into a fine noose.
Update 2: Kevin says the biggest problem he sees with startups is that they need to scale their backend (no, the other one).
Update: My bad. It's hard to sell scalability, so just forget it.
The premise of Startups and The Problem Of Premature Scalaculation and Don’t scale: 99.999% uptime is for Wal-Mart is that you shouldn't spend precious, limited resources worrying about scaling before you've implemented the functionality that will make you successful enough to have scaling problems in the first place. It's kind of an embodied life-force model of system creation. Energy is scarce, so any parasites siphoning off energy must be hunted down and destroyed to give the body its best chance of survival.
Is this really how it works? If I ever believed this, I certainly don't believe it anymore. The world has changed, even since 2005. Thanks to many books and papers on how to scale, knowledge of scaling isn't the scarce, precious resource it once was. It's no longer knowledge tightly held by a cabal of experts until Nicolas Cage flies in and pries it out of their grasping, desiccated fingers. Now any journeyman computerista can do a reasonable job of designing a scalable system.
Not only has knowledge dissemination improved, but so have our tools. Drastically. At one time, building a scalable system up front would have required buying and configuring a truckload of servers, building out a data center, configuring a spider's web of networks, and bootstrapping an equally nasty storage network. All extremely complicated and disaster prone. Now you can use services like Amazon's EC2/S3, 3tera's grid OS, and Joyent to cut significant parts of that complexity out of the system. While most of us toil away in anonymity and scaling problems are just a fond dream, when the webosphere does find you, it does so with a crush.
With a little thinking ahead, Blue Origin was able to handle 3.5 million requests and 758 GB of bandwidth in a single day using S3. Did that effort prevent other features from getting implemented? I seriously doubt it. Usually doing the right thing isn't harder if you know what the right thing to do is. And what if Blue Origin hadn't been able to scale? Could they have recovered from missing the chance to strike while the iron was hot, when potential customers were interested? Ask Friendster.
What do you think? Has most of the risk associated with up-front scalability design been squeezed out? Is premature scalaculation still something to be avoided? Or have times changed, so that doing the simplest thing that could possibly work now includes worrying about scaling up front?
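To put the Blue Origin numbers in perspective, here is a back-of-the-envelope sketch that converts one day's totals into average rates. It assumes decimal units (1 GB = 10^9 bytes) and traffic spread evenly over 24 hours; real traffic is burstier, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_averages(requests_per_day: float, gigabytes_per_day: float) -> tuple[float, float, float]:
    """Average requests/second, average Mbit/s, and average KB per request for one day.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Mbit = 10**6 bits, 1 KB = 10**3 bytes)
    and traffic spread evenly across the day.
    """
    req_per_sec = requests_per_day / SECONDS_PER_DAY
    bits_per_day = gigabytes_per_day * 1e9 * 8
    mbit_per_sec = bits_per_day / 1e6 / SECONDS_PER_DAY
    kb_per_request = gigabytes_per_day * 1e9 / requests_per_day / 1e3
    return req_per_sec, mbit_per_sec, kb_per_request


rps, mbps, kb = daily_averages(3_500_000, 758)
print(f"{rps:.1f} req/s, {mbps:.1f} Mbit/s, {kb:.0f} KB/request")
# prints "40.5 req/s, 70.2 Mbit/s, 217 KB/request"
```

Averages like these are a useful sanity check, but capacity has to be planned against the peak, which this calculation cannot tell you.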
People sometimes wonder why Oracle isn't mentioned on this site more. Maybe it will be now: Michael Nygard reports that Oracle 11g now does read/write splitting with its Active Data Guard product. Average replication latency was 1 second, and it's accomplished with standard Oracle JDBC drivers. They saw a 250% increase in transactions per second (tps) for the read-write service and a 110% improvement in tps for the read-only service.
The new setup also changes the recommended hardware architecture. They now recommend a primary and multiple standby servers, a single controller per server, and a single set of disks in RAID1. Previously the recommendation was a primary and secondary server with two controllers per server and a set of mirrored disks per controller. The changes increase performance, availability, and hardware utilization. They also have a useful-looking best practices document for high availability called Maximum Availability Architecture (MAA).
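Read/write splitting raises a consistency question that the 1-second average replication latency makes concrete: a user who has just written may not see that write on a standby yet. A common application-level mitigation is "read your own writes": send a session's reads to the primary for a short window after it writes. The sketch below is a generic illustration, not how Active Data Guard or the Oracle JDBC drivers route queries; the class name, the `lag_window_s` parameter, and the server names are hypothetical. An average latency is not an upper bound, so a window equal to it reduces stale reads rather than eliminating them.

```python
import itertools
import time
from typing import Callable, Dict, Sequence


class LagAwareRouter:
    """Hypothetical router: reads go to standbys, except shortly after a session writes."""

    def __init__(self, primary: str, standbys: Sequence[str],
                 lag_window_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        if not standbys:
            raise ValueError("need at least one standby")
        self.primary = primary
        self._standbys = itertools.cycle(standbys)  # simple round-robin over standbys
        self.lag_window_s = lag_window_s            # assumed replication lag to tolerate
        self._clock = clock
        self._last_write: Dict[str, float] = {}     # session id -> time of its last write

    def for_write(self, session: str) -> str:
        # All writes go to the primary; remember when this session last wrote.
        self._last_write[session] = self._clock()
        return self.primary

    def for_read(self, session: str) -> str:
        last = self._last_write.get(session)
        if last is not None and self._clock() - last < self.lag_window_s:
            # The standby may not have replicated this session's write yet.
            return self.primary
        return next(self._standbys)


# Demo with a fake clock so the output is deterministic.
now = [0.0]
router = LagAwareRouter("primary", ["standby1", "standby2"], clock=lambda: now[0])
print(router.for_read("alice"))   # standby1
router.for_write("alice")
now[0] = 0.5
print(router.for_read("alice"))   # primary (inside the 1 s window)
now[0] = 1.5
print(router.for_read("alice"))   # standby2 (window has passed)
```

In a long-running service the per-session timestamps would also need to be expired so the dictionary doesn't grow without bound.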
Update: Facebook pulls a Microsoft and embraces and extends by opening its platform to other social sites like Bebo. Very smart and unexpected. More info at Facebook to let other sites access platform code.
This month's regular Facebook Meetup was held at Google, and the topic of the day was OpenSocial. For those of you with real lives, OpenSocial "provides a common set of APIs for social applications across multiple websites." Over 200 excited people, hoping to do very exciting things and dreaming of making an exciting pile of money, watched an OpenSocial presentation put on by a couple of appropriately knowledgeable evangelists. I could feel my social graph being more successfully monetized with each passing minute. Normally the meetings are much smaller, but Google puts on a very nice spread, so I think people may have shown up to dine :-) Or they could have shown up to learn why and how they should code to the new uber social API. By the looks of the full plates and the sounds of energetic chatter, it was likely a bit of both. The crowd seemed skeptical, yet interested. The Facebook world is somewhat self-satisfied, and that comfy world was being disturbed. It might get ugly, I thought, but unfortunately it stayed quite civil and informative. With my bread I had hoped for a bit of circus.
My take on OpenSocial: code a social application once, run it anywhere. Code your social app using Google's gadget model and the social API, and it will run on any conforming social network container. It's kind of like a concurrency model based on mobile threads instead of the more traditional message-passing model. So your friend's profile app will work just as well on Ning as on Orkut. Interestingly, there's a layer of indirection: each social network container interprets locally what things like friends are. So your friends in SalesForce could mean people you've emailed once, and friends in Ning could mean people you've marked as friends.
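The "code once, let each container interpret friends" idea can be sketched as an interface plus per-container implementations. This is a Python illustration of the design only; the real OpenSocial API is a JavaScript gadget API, and every class and method name below is hypothetical. The two containers mirror the example above: one treats anyone you've emailed as a friend, the other only people you've explicitly marked.

```python
from typing import Dict, List, Protocol, Set


class SocialContainer(Protocol):
    """The common API an app codes against (hypothetical)."""

    def friends_of(self, user: str) -> List[str]: ...


class EmailGraphContainer:
    """Hypothetical container where a 'friend' is anyone you have emailed."""

    def __init__(self, sent_mail: Dict[str, List[str]]):
        self.sent_mail = sent_mail

    def friends_of(self, user: str) -> List[str]:
        return sorted(set(self.sent_mail.get(user, [])))


class MarkedFriendContainer:
    """Hypothetical container where a 'friend' is someone you explicitly marked."""

    def __init__(self, marked: Dict[str, Set[str]]):
        self.marked = marked

    def friends_of(self, user: str) -> List[str]:
        return sorted(self.marked.get(user, set()))


def friend_summary(container: SocialContainer, user: str) -> str:
    """The 'app': written once, runs on any container implementing friends_of."""
    friends = container.friends_of(user)
    return f"{user} has {len(friends)} friend(s): {', '.join(friends)}"


crm = EmailGraphContainer({"ann": ["bob", "cy", "bob"]})
net = MarkedFriendContainer({"ann": {"dee"}})
print(friend_summary(crm, "ann"))  # ann has 2 friend(s): bob, cy
print(friend_summary(net, "ann"))  # ann has 1 friend(s): dee
```

The app never learns which definition of "friend" is in play. That is the portability win, and also why the same app can behave quite differently from one container to the next.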
There's a fairly minimal API of verbs and nouns at this point, but that will undoubtedly grow. They are taking a "do the simplest thing" approach. Or they may simply have needed to get something out to compete with Facebook. Important features like a security model, authentication model, sharing model, and advanced data types are TBD. Lots of tricky things still have to be specified: How do you establish identity across services? Who can share what information? How do apps deal with different terms of service and with different social network models? OpenSocial is a group of companies, so you hear a lot of things like "we'll have to meet and decide that. Joe has a lot of good ideas on how that might be done." The same sort of stuff you hear with all the complex Java standards that everyone hates. Maybe some group will Spring into action and fix some of the problems that develop.
What I don't quite understand is how social networks will distinguish themselves from each other with a common API. If your app runs anywhere using the standard, why should I choose a particular social graph provider? So services will have to differentiate by adding nonstandard features, which leads to a horribly complex mess of a system. They were already talking about using reflection so you could discover what capabilities a container had. Oh boy. Sounds like a hard road for developers.
From a scalability POV you must still host your own applications, so that's no different from Facebook. If you get a million users overnight, you have to figure out how to make them scale. On the bright side, there is a properties-like data store you can use to store data. The amount of data, types, query model, transaction model, locking model, SLAs, etc. seemed open, but not having to manage state yourself is a big win.
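Here is roughly what capability discovery by reflection can look like from the app side, and why it makes for a hard road: each optional feature adds a branch the app must write and test. This is a hypothetical Python sketch using `getattr`; the container classes and the `post_activity` extension are invented for illustration and are not part of OpenSocial.

```python
class BaseContainer:
    """Hypothetical container offering only the standard API."""
    name = "base"


class ContainerA(BaseContainer):
    name = "A"

    def post_activity(self, user: str, text: str) -> str:
        # Hypothetical nonstandard extension that only this container offers.
        return f"[A activity] {user}: {text}"


class ContainerB(BaseContainer):
    name = "B"


def share_update(container: BaseContainer, user: str, text: str) -> str:
    """Use the extension if the container has it; otherwise fall back."""
    post = getattr(container, "post_activity", None)  # reflection-style discovery
    if callable(post):
        return post(user, text)
    # Fallback path: every extension adds another branch like this to write and test.
    return f"[{container.name}] {user}: update not shared (no activity stream)"


for c in (ContainerA(), ContainerB()):
    print(share_update(c, "ann", "hello"))
```

With n independent optional features, an app that wants to exploit all of them has up to 2^n combinations of container behavior to consider.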
From a scalable development POV, I can't help but think the drive towards differentiation will require special coding for each target container and you'll have to pick just a few containers to develop for (think browser wars times 100), but we'll see.
This question is for all the gurus here. Please help this novice.
I am starting a video sharing site like YouTube in India. I want to offer the best quality possible at minimum cost. Nothing new about it, right? :) I have done some research on the dedicated hosting and CDN services available, and I have some basic knowledge of them.
Following are my requirements:
1) My budget is $500 to $1000 per month for hosting (including CDN if and as applicable).
2) I will need around 500 GB of storage and 1 TB per month of bandwidth in the first 2-3 months, and then about 10 TB of storage and 5 TB per month of bandwidth. And more, depending on how big it gets (I can afford more when it gets big).
3) 90% of my viewers are in India. The other 10% are in the US and UK.
Based on the above, could you please answer the following questions?
1) Can I go with just a good dedicated server to start with and get a CDN service later on when the site gets big? Or do you think it's wise to start with a CDN service?
2) Should I look for a server closer to India? They are pretty expensive in Asia. Should I look for one in Western Europe, or at least the Western US? How big a difference does it make?
3) Could you suggest the best dedicated hosting and CDN service based on my requirements?
4) I can get unmetered bandwidth on a 100 Mbps pipe for my budget. Do you think that will be fine to start with?
5) Anything else I am missing?
Also, could you please give any tips on how to minimize bandwidth (buffering, lower bitrate, etc.)?
Thanks a lot for your suggestions!
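Question 4 and the bitrate question come down to arithmetic, so here is a rough sketch. It assumes a 30-day month, decimal units, and example bitrates of 300 and 500 kbit/s chosen purely for illustration; peak demand, not the monthly average, is what saturates a pipe.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def max_tb_per_month(link_mbps: float) -> float:
    """Decimal TB the link could move if it ran at 100% utilization all month."""
    bytes_per_sec = link_mbps * 1e6 / 8
    return bytes_per_sec * SECONDS_PER_MONTH / 1e12


def avg_mbps_for(tb_per_month: float) -> float:
    """Average link rate (Mbit/s) needed to serve a given monthly volume."""
    return tb_per_month * 1e12 * 8 / SECONDS_PER_MONTH / 1e6


def max_concurrent_streams(link_mbps: int, video_kbps: int) -> int:
    """How many simultaneous streams of a given bitrate fit in the link."""
    return (link_mbps * 1000) // video_kbps


print(f"100 Mbps ceiling: {max_tb_per_month(100):.1f} TB/month")       # 32.4
print(f"5 TB/month needs on average: {avg_mbps_for(5):.1f} Mbit/s")     # 15.4
for kbps in (300, 500):
    print(f"{kbps} kbit/s video: {max_concurrent_streams(100, kbps)} concurrent streams")
```

On these assumptions a saturated 100 Mbps pipe tops out around 32 TB a month, well above the 5 TB target on average, but how many viewers you can serve at once at peak is set by the bitrate: halving the bitrate doubles the number of concurrent streams the pipe can carry.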
Today GigaSpaces launched the OpenSpaces Developer Challenge, which will award $25,000 in prizes to developers who build the most unique and innovative applications or plug-ins for the OpenSpaces Framework. OpenSpaces is an open source development solution from GigaSpaces for building linearly scalable software applications. It is widely used in a variety of industries, such as Wall Street trading applications, telecommunications platforms and online gaming.
The Challenge is designed to encourage innovation around OpenSpaces and support the developer community. Prizes ranging from $1,000 to $10,000 will be awarded to those who submit the most promising applications that were built using OpenSpaces, or plug-ins, and other components that extend OpenSpaces in pioneering ways.
The OpenSpaces development framework is designed to simply and dynamically scale out a software application across many computers -- also referred to as "cloud computing." It is unique in that it addresses applications that have been traditionally difficult to distribute in this manner, including high-throughput applications that are stateful, transactional or data-intensive. OpenSpaces leverages GigaSpaces' eXtreme Application Platform (XAP) as the middleware implementation, and is based on the popular Spring Framework developed by SpringSource.
Submissions for the OpenSpaces Developer Challenge will be accepted between December 10, 2007 and April 2, 2008. All applications will be reviewed and judged by a panel of industry experts, and the winners will be announced on the OpenSpaces.org Web site on April 22, 2008. The awards -- including the $10,000 first prize -- will be presented to the winners at a gala in San Francisco during the JavaOne 2008 conference in May. Winners will also be recognized in a worldwide press announcement.
The OpenSpaces Developer Challenge marks the latest initiative by GigaSpaces to encourage development and innovation in the developer community. The Company provides developers with easy access to GigaSpaces' products and solutions through its Start-Up Program, which provides qualified companies and individuals with full, free and perpetual use of the Company's flagship product, GigaSpaces XAP. In addition, the Company provides a free Community Edition of its product and contributes to several open source initiatives, including the Spring Framework and the Mule ESB.
To encourage "early bird" applications for the OpenSpaces Developer Challenge, ten $1,000 prizes will be drawn among those applicants who submit an application concept (not actual code, just the concept of the proposed submission) by January 29, 2008.
Interested developers should:
Go to the OpenSpaces Developer Challenge Web site
Read the Challenge guidelines and FAQs
Develop a killer application or plug-in based on OpenSpaces
OPTIONAL: Submit their application concept by January 29 to be eligible for the ten $1,000 "early bird" prizes
Submit their actual application (including code) by April 2, 2008
Additional information on the Challenge is available here.
Hello all. Reading this site, you'll notice that the "1 Master for writes, N Slaves for reads" scheme is used often. How is this implemented? Who decides where writes and reads go? Is it something at the application level, or specific database proxies like Slony-I? Thanks.
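One answer is the application level: a small router in your data-access layer decides, per statement, whether it goes to the master or to one of the slaves. (Slony-I itself is a PostgreSQL replication system; it copies data from the master to the slaves but does not decide where your queries go.) The other common place for this logic is a proxy between the application and the databases, which is not shown here. The sketch below is a hypothetical, deliberately naive illustration: it classifies SQL by its first keyword, keeps `SELECT ... FOR UPDATE` and everything inside an explicit transaction on the master, and round-robins other reads across slaves. The server names are placeholders; a real implementation would hand back connections rather than strings.

```python
import itertools
from typing import Sequence

# Statements assumed safe to send to a slave (naive, keyword-based classification).
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


class ReadWriteSplitter:
    """Hypothetical router: 1 master for writes, N slaves for reads."""

    def __init__(self, master: str, slaves: Sequence[str]):
        if not slaves:
            raise ValueError("need at least one slave")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin read balancing
        self.in_transaction = False

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in ("BEGIN", "START"):
            self.in_transaction = True
        # Writes, locking reads, and anything inside a transaction stay on the master
        # so they see a consistent, up-to-date view. The FOR UPDATE check is a
        # simple substring test and would miss unusual spacing.
        if (self.in_transaction or verb not in READ_VERBS
                or "FOR UPDATE" in sql.upper()):
            target = self.master
        else:
            target = next(self._slaves)
        if verb in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
        return target


splitter = ReadWriteSplitter("master", ["slave1", "slave2"])
for stmt in ["SELECT * FROM users", "INSERT INTO users VALUES (1)",
             "select count(*) from users", "BEGIN", "SELECT 1", "COMMIT",
             "SELECT 1"]:
    print(f"{stmt!r:35} -> {splitter.route(stmt)}")
# slave1, master, slave2, master, master, master, slave1
```

Even with correct routing, a read sent to a slave right after a write may not see that write until replication catches up, so pages that must show a user's own changes are often pinned to the master.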
Our company offers a web service that is provided to users from several different hosting centers across the globe. The content and functionality at each of the servers is almost exactly the same, and we could have based them all in a single location. However, we chose to distribute the servers geographically to offer our users the best performance, regardless of where they might be. Up until now, the only content on the servers that has had to be synchronized is the server software itself. The features and functionality of our service are being updated regularly, so every week or two we push updates out to all the servers at basically the same time. We use a relatively manual approach to do the updating, but it works fine.
Sometime soon, however, our synchronization needs are going to get a bit more complex. In particular, we'll soon start offering a feature at our site that will involve a database with content that will change on an almost second-by-second basis, based on user input and activity. For performance reasons, a complete instance of this database will have to be present locally at each of our server locations. At the same time, the content of the database will have to be synchronized across all server locations, so that users get the same database content regardless of the server they choose to visit. We have not yet chosen the database that we'll use for this functionality, although we are leaning towards MySQL. (We are also considering PostgreSQL.)
So, my question for the assembled experts is: What approach is the best one for us to use to synchronize the database instances across our servers? Ideally, we'd like a solution that is resilient to a server location becoming unavailable, and we'd also prefer a solution that makes efficient use of bandwidth. (Processing power doesn't cost us a lot; bandwidth, on the other hand, can get expensive.)
FWIW ...
(1) Our servers run Apache and Tomcat on top of CentOS.
(2) I've found the following "how to" that suggests an approach involving MySQL that could address our needs: http://capttofu.livejournal.com/1752.html Thanks!
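One ingredient commonly used in MySQL multi-master replication setups is worth knowing whichever approach you pick: the server variables `auto_increment_increment` and `auto_increment_offset`. Giving every site the same increment (at least the number of sites) and a distinct offset makes each server generate AUTO_INCREMENT keys from a disjoint sequence, so inserts made at different locations don't collide on primary keys. The sketch below just models the values each server would hand out, assuming an empty table and offset <= increment; the site names are hypothetical. It does not address conflicting updates to the same row, which any multi-site write setup still has to handle.

```python
from itertools import combinations
from typing import Dict, List


def generated_ids(offset: int, increment: int, count: int) -> List[int]:
    """First `count` AUTO_INCREMENT values a server would generate for an empty table,
    given auto_increment_offset=offset and auto_increment_increment=increment
    (modeled only for 1 <= offset <= increment)."""
    if not 1 <= offset <= increment:
        raise ValueError("expected 1 <= offset <= increment")
    return [offset + k * increment for k in range(count)]


# Hypothetical three-site deployment: same increment everywhere, distinct offsets.
# In each server's my.cnf this would correspond to, e.g.:
#   auto_increment_increment = 3
#   auto_increment_offset    = 1   (2 and 3 on the other sites)
sites: Dict[str, List[int]] = {
    name: generated_ids(offset, 3, 4)
    for offset, name in enumerate(["us-east", "europe", "asia"], start=1)
}
for name, ids in sites.items():
    print(name, ids)
# us-east [1, 4, 7, 10]
# europe [2, 5, 8, 11]
# asia [3, 6, 9, 12]

# No two sites ever hand out the same key.
assert all(not set(a) & set(b) for a, b in combinations(sites.values(), 2))
```

Choosing an increment larger than the current number of sites (say 10) leaves room to add locations later without renumbering existing ones.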