Hi... I have this idea to start a really great and scalable website, and I am building it! So far I'm doing everything myself - coding, networking, architecture planning, everything. I haven't even gotten into the legal aspects yet....... It would be MUCH easier if I had a technical person to handle that end of the operation. I'm a good coder, but like Bill Gates at Harvard for Math, I'm not the very best. I'd like to FIND that very best person available, to handle the technical aspects. For worse or better, I don't presently know somebody who fits this bill. I've posted a bazillion ads on Craig's List, with no really qualified responses. I've put out feelers among my own network, same result. Not sure what else I can do. Shoestring budget, so it's sweat equity in the beginning. That can actually be a plus, as it forces people to focus. Any ideas about what else I can do, to attract the right person? Thanks Jason
A very entertaining and somewhat educational article on IBM Poopheads say LAMP Users Need to "grow up". The physical three tier architecture turns out to be the root of all evil and shared nothing architectures brings simplicity and light. In the comments Simon Willison makes an insightful comment on why fine grained caching works for personalized pages and proxy's don't: Great post, but I have to disagree with you on the finely grained caching part. If you look at big LAMP deployments such as Flickr, LiveJournal and Facebook the common technology component that enables them to scale is memcached - a tool for finely grained caching. That's not to say that they aren't doing shared-nothing, it's just that memcached is critical for helping the database layer scale. LiveJournal serves around 50% of its page views "permission controlled" (friends only) so an HTTP proxy on the front end isn't the right solution - but memcached reduces their database hits by 90%.
I have an application with couple of web servers that uses MemcacheD. How can i synchronize concurrent put to the cache? The value of the entry is list. Atomic append operation could have been helpful, but unfortunately memcahe doesn't support atomic append.
Seems as though anonymous users can edit old posts w/o any authentication. This post was loaded with spam/porn links. Now it is not. /anonymous
Release It! author Michael Nygard tells a tale of two web sites, both brought low by unexpectedly huge unbounded results sets that slowed down their sites to the speed of a Christmas checkout line. I've committed this error more than a few times. During testing the results sets are often small, so you don't see problems. Or when a product is new you don't have a lot of data so everything is fine, until some magic line is crossed and you get that dreaded 2AM fix it call. My most embarrassing bug of this type caused a rather spectacular failure at a customer site as the variance in response times was out of spec and this kicked in penalty clauses. What happened was the customer had a larger network than we could even test (customers always get the good stuff). I took a lock and went to get all the data. Because the result set was so much larger in their larger system I took the lock for many more milliseconds than I should have. Unknown to me a chunk of code on the critical path also was in the lock path and all hell broke loose. I had to change the logic to process the result set in fixed size deterministic chunks, releasing locks as I went. I even had to measure CPU usage and back off after a certain amount of CPU was used. But all was well again. I then hunted down every other place I made the same mistake. And there were a few. To solve this problem in general I developed an architecture supporting scheduling work by CPU usage. A common theme in many of the profiles on this site is protecting your system from requests that can bring down the system. Mailinator has a lot resource exhaustion problems and does a good job solving them. Ebay has an interesting strategy of doing as little work as possible in the database which leads them to do joins in application space. Which is exactly the opposite of this strategy's conclusion. But I think this may be going too far. With proper indexes performing selects in the database to minimize the result sets would seem to be a win as databases are good at this sort of thing. Yah, relational databases suck at doing top 10 type of logic, so calculate that on the fly and cache it. How can you bound results sets?
This is a question asked on the ycombinator list and there are some good responses. I gave a quick response, but I particularly like neilk's knock out of the park insightful answer:
Not surprisingly opinions on SimpleDB vary from it sucks, don't take my database, to it will change the world, who needs a database anyway? From a quick survey of the blogosphere, here's where SimpleDB stands at the moment:
Amazon has announced the limited beta of Amazon SimpleDB - a simple web services interface to create and store multiple data sets, query your data easily, and return the results. Together with the Simple Storage Service (S3), Elastic Compute Cloud (EC2) and other web services Amazon offers a complete utility computing platform. SimpleDB was the missing piece of AWS - the scalable structured database. Check out my blog entry: http://innowave.blogspot.com/2007/12/amazon-simpledb-scalable-cloud-database.html I was waiting for this one :-) Geekr
On the blogs.technet.com article on microsoft.com's infrastructure: The article reads like a blatant ad for it's own products, and is light on the technical side. The juicy bits are here, so you know what the fuss is about:
- Cytrix Netscaler (= loadbalancer with various optimizations)
- W2K8 + IIS7 and antivirus software on the webservers
- 650GB/day ISS log files
- 8-9GBit/s (unknown if CDN's are included)
- Simple network filtering: stateless access lists blocking unwanted ports on the routers/switches (hence the debated "no firewalls" claim).
Update 3: InfoQ's Big Architecture Up Front - A Case of Premature Scalaculation? twines several different threads on the topic together into a fine noose. Update 2: Kevin says the biggest problems he sees with startups is they need to scale their backend (no, the other one). Update: My bad. It's hard to sell scalability so just forget it. The premise of Startups and The Problem Of Premature Scalaculation and Don’t scale: 99.999% uptime is for Wal-Mart is that you shouldn't spend precious limited resources worrying about scaling before you've first implemented the functionality that will make you successful enough to have scaling problems in the first place. It's kind of an embodied life force model of system creation. Energy is scarce so any parasites siphoning off energy must be hunted down and destroyed so the body has its best chance of survival. Is this really how it works? If I ever believed this I certainly don't believe it anymore. The world has changed, even since 2005. Thanks to many books and papers on how to scale the knowledge of scaling isn't the scarce precious resource it once was. It's no longer knowledge tightly held by a cabal of experts until Nicolas Cage flies in and pries it out of their grasping dessicated fingers. Now any journeyman computerista can do a reasonable job at designing a scalable system. Not only has knowledge dissemination improved, but so have our tools. Drastically. At one time building a scalable system up front would have required buying and configuring a truck load of servers, building out a data center, configuring a spider's web of networks, and bootstrapping an equally nasty storage network. All extremely complicated and disaster prone. Now you can use services like Amazon's EC2/S3, 3tera's grid OS, Joyent to cut significant parts of all that complexity out of the system. While most of us toil away in anonymity and scaling problems are just a fond dream, when the webosphere does find you it does so with a crush. With a little thinking ahead Blue Origin was able to handle 3.5 million requests and 758 GBs in bandwidth in a single day using S3. Did that effort prevent other features from getting implemented? I seriously doubt it. Usually doing the right thing isn't harder if you know what is the right thing to do. And what if Blue Origin wouldn't have been able to scale? Could they have recovered from the opportunity lost of grabbing the iron when it's hot and when potential customers are interested? Ask Friendster. What do you think? Has most of the risk associated with up front scalability design been squeezed out? Is premature scalation still something to be avoided? Or have times changed and does doing the simplest thing that could possibly work now include worrying about scaling up front?