It is fairly obvious that web site performance can be increased by making the code run faster and optimising the response time. But that only scales up to a point. To really take our web sites to the next level, we need to look at the performance problem from a different angle.
Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.
First, code to insert a user in a database: CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$ BEGIN PERFORM 1 FROM users WHERE username = i_username; IF NOT FOUND THEN INSERT INTO users (username) VALUES (i_username); RETURN 'user created'; ELSE RETURN 'user already exists'; END IF; END; $$ LANGUAGE plpgsql SECURITY DEFINER; Heres the proxy code to distribute the user insert to the correct partition: queries=# CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$ CLUSTER 'queries'; RUN ON hashtext(i_username); $$ LANGUAGE plproxy; Your SQL query looks normal: SELECT insert_user("username");- The result of a query is exactly that same as if was executed on the remote database. - Currently they can route 1000-2000 requests/sec on Dual Opteron servers to a 16 parition cluster.
Not sure if this is the right place to post this but here goes anyway. We are looking to hire an outside firm to help with development of a scalable and potentially high-traffic web site. We are not looking for an individual but rather a firm with enough well rounded expertise to help us with various aspects of this. Basic requirements: LAMP stack or other open source solution Very proficient in cross-browser web development Flex/AIR development for RIA Java/C/C++ proficiency Expertise with Comet and push server technology Experience with development of high-traffic web sites Use of Amazon Web Services infrastructure a plus If anyone knows of consulting firms that can take on such a project, I would appreciate your feedback. TIA
It's a sad fact of life, but processes die. I know, it's horrible. You start them, send them out into process space, and hope for the best. Yet sometimes, despite your best coding, they core dump, seg fault, or some other calamity befalls them. Unlike our messy biological world so cruelly ruled by entropy, in the digital world processes can be given another chance. They can be restarted. A greater destiny awaits. And hopefully this time the random lottery of unforeseen killing factors will be avoided and a long productive life will be had by all.
This is fun code to write because it's a lot more complicated than you might think. And restarting processes is a highly effective high availability strategy. Most faults are transient, caused by an unexpected series of events. Rather than taking drastic action, like taking a node out of production or failing over, transients can be effectively masked by simply restarting failed processes. Though complexity makes it a fun problem, it's also why you may want to "buy" rather than build. If you are in the market, Supervisor looks worth a visit.
Adapted from their website:
Supervisor is a Python program that allows you to start, stop, and restart other programs on UNIX systems. It can restart crashed processes.
Hi, I am building a video-sharing site and I'm looking for an efficient way to update video views count. The easiest way would be to perform an SQL update to increase the "views" counter every time a video is viewed, but naturally I want to avoid DB write access as much as possible. I am looking for an efficient temporary storage to which I could connect and say "increment views of video X". Every so often I would save the changes to my main database, and remove the counter from this temporary storage. I am having a hard time finding such temporary storage, however. My first thought was memcache, but it's not ideal as I wouldn't like to lose the data if memcache goes down. Also, memcache's increment command requires that the key is already present - that means that every time a video is viewed, I would have to check if the key already exists in memcache, before I can actually send the increment command. What do people use to solve this kind of issues? Kind regards, Tomasz
Jean-Paul de Vooght of our Switzerland contingent created a nifty little WidSets widget that lets you better read HighScalability from your mobile phone. I thought untethered readers might like to give it a try. Thanks to Jean-Paul for making it available! WidSets is: a simple service that brings you information normally accessed via the Internet by sending it directly to your mobile phone . Using mini-applications called widgets, it sends you the latest updates to your favorite websites. The system uses RSS feeds to push information from these websites directly to your mobile phone the minute they’re updated.
This post covers two main options for scaling-out MySql and compare between them. The first is based on data-base clustering and the second is based on In Memory clustering a.k.a Data Grid. A special emphasis is given to a pattern which shows how to scale our existing data base without changing it through a combination of Data Grid and data base as a background service. This pattern is referred to as Persistency as a Service (PaaS). It also address many of the fequently asked question related to how performance, reliability and scalability is achieved with this pattern.
For some special reason, I'm trying to make a web server able to get all the DNS names mapped to its IP. Let me explain more, I'm creating a website that will run in a web farm, every web server in the farm will have some subdomains mapped to its ip, what I want is that whenever my application starts on a web server is to be able to get all the subdomains mapped/assigned to that server, e.g. sub1.mydomain.com, sub2.mydomain.com. I understand that I have to use reverse dns lookup (i.e. give the IP get the domain name), but I also want to get all the subdomains not just the first one that maps to that IP. I've been reading about DNS on the internet but I don't seem to find any information on how to achieve what I want, normally you use dns to get the ip of a domain but I'm not sure that all servers enable reverse lookup. The problem is that I'm still not sure whether I'll host my own DNS server or use the services of some company (many companies offer DNS hosting services), so, my question is: - If I host my own DNS server, will it be possible to get all the subdomains using reverse lookup? Another question here, if I enable reverse lookup on my DNS server, can this have any negative side effects? As to security .. etc .. is there any way I can enable only my web servers to do reverse lookup while preventing anybody else on the internet from using reverse lookup? - If use the DNS hosting services of some company, will I be able to do what I want? ie. get the subdomains mapped to the IP address of a web server? Unfortunately I don't have much experience with working with web farms, so I would like also to ask whether every web server in the web farm gets its own static IP or how does it work? I mean you have the firewall ... etc .. so I don't know how IP assignments works in a web farm scenario .. Thanks a million in advance and sorry for my really long post .. Wal