Update 2: This seems to be a POF Peer1 love fest infomercial. It's pretty content free, but the production values are high. Lots of quirky sounds and fish swimming on the screen.
Update: By Facebook standards, Read/WriteWeb says POF is worth a cool one billion dollars. It helps to talk like Dr. Evil when saying it out loud.
PlentyOfFish is a hugely popular on-line dating system slammed by over 45 million visitors a month and 30+ million hits a day (500 - 600 pages per second). But that's not the most interesting part of the story. All this is handled by one person, using a handful of servers, working a few hours a day, while making $6 million a year from Google ads. Jealous? I know I am. How are all these love connections made using so few resources?
Site: http://www.plentyoffish.com/
Information Sources
Channel9 Interview with Markus Frind
Blog of Markus Frind
Plentyoffish: 1-Man Company May Be Worth $1Billion
The Platform
Microsoft Windows
ASP.NET
IIS
Akamai CDN
Foundry ServerIron Load Balancer
The Stats
PlentyOfFish (POF) gets 1.2 billion page views/month, and 500,000 average unique logins per day. The peak season is January, when it will grow 30 percent.
POF has one single employee: the founder and CEO Markus Frind.
Makes up to $10 million a year on Google ads working only two hours a day.
30+ Million Hits a Day (500 - 600 pages per second).
1.1 billion page views and 45 million visitors a month.
Has 5-10 times the click through rate of Facebook.
A top 30 site in the US based on Compete's Attention metric, top 10 in Canada and top 30 in the UK.
2 load-balanced web servers with 2 quad-core Intel Xeon X5355 CPUs @ 2.66 GHz, 8 GB of RAM (using about 800 MB), and 2 hard drives, running Windows Server 2003 x64.
3 DB servers. No data on their configuration.
Approaching 64,000 simultaneous connections and 2 million page views per hour.
Internet connection is a 1Gbps line of which 200Mbps is used.
1 TB/day serving 171 million images through Akamai.
6TB storage array to handle millions of full sized images being uploaded every month to the site.
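As a rough sanity check, the daily and hourly figures above can be converted to per-second rates. This is only back-of-the-envelope arithmetic; the assumption that traffic is spread evenly over each period is mine, and real traffic is peakier.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def per_second(count, period_seconds):
    """Average rate, assuming traffic is spread evenly over the period."""
    return count / period_seconds


# Figures quoted in the stats above.
hits_per_day = 30_000_000
peak_page_views_per_hour = 2_000_000

avg_hits_per_second = per_second(hits_per_day, SECONDS_PER_DAY)
peak_pages_per_second = per_second(peak_page_views_per_hour, SECONDS_PER_HOUR)

print(f"average over a day: {avg_hits_per_second:.0f} per second")
print(f"at 2 million page views per hour: {peak_pages_per_second:.0f} per second")
```

The hourly figure works out to about 556 per second, which lines up with the quoted 500 - 600 pages per second, while the daily average is closer to 347 per second. Read this way, the 500 - 600 number looks like a busy-hour rate rather than a round-the-clock average.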
What's Inside
Revenue model has been to use Google ads. Match.com, in comparison, generates $300 million a year, primarily from subscriptions. POF's revenue model is about to change so it can capture more revenue from all those users. The plan is to hire more employees, hire sales people, and sell ads directly instead of relying solely on AdSense.
With 30 million page views a day you can make good money on advertising, even at a 5 - 10 cent CPM.
Akamai is used to serve 100 million plus image requests a day. If you have 8 images and each takes 100 msecs you are talking about a second of load time just for the images. So distributing the images makes sense.
Tens of millions of image requests are served directly from their servers, but the majority of these images are less than 2KB and are mostly cached in RAM.
Everything is dynamic. Nothing is static.
All outbound Data is Gzipped at a cost of only 30% CPU usage. This implies a lot of processing power on those servers, but it really cuts bandwidth usage.
No caching functionality in ASP.NET is used. It is not used because as soon as the data is put in the cache it's already expired.
No built-in components from ASP are used. Everything is written from scratch. Nothing is more complex than simple if-then statements and for loops. Keep it simple.
Load balancing
- IIS arbitrarily limits the total connections to 64,000 so a load balancer was added to handle the large number of simultaneous connections. Adding a second IP address and then using a round robin DNS was considered, but the load balancer was considered more redundant and allowed easier swap in of more web servers. And using ServerIron allowed advanced functionality like bot blocking and load balancing based on passed on cookies, session data, and IP data.
- The Windows Network Load Balancing (NLB) feature was not used because it doesn't do sticky sessions. A way around this would be to store session state in a database or in a shared file system.
- 8-12 NLB servers can be put in a farm and there can be an unlimited number of farms. A DNS round-robin scheme can be used between farms. Such an architecture has been used to enable 70 front end web servers to support over 300,000 concurrent users.
- NLB has an affinity option so a user always maps to a certain server, thus no external storage is used for session state and if the server fails the user loses their state and must relogin. If this state includes a shopping cart or other important data, this solution may be poor, but for a dating site it seems reasonable.
- It was thought that storing and fetching session data in software was too expensive. Hardware load balancing is simpler. Just map users to specific servers and if a server fails have the user log in again.
- A ServerIron was cheaper and simpler than using NLB. Many major sites use them for TCP connection pooling, automated bot detection, etc. ServerIron can do a lot more than load balancing and these features are attractive for the cost.
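The affinity idea in the list above (map each user to one server, keep session state only in that server's memory, and make the user log in again if the server dies) can be sketched in a few lines. This is a toy model, not how ServerIron or NLB implement it; the class name, the hash-by-client-IP rule, and the server names are all assumptions made for illustration.

```python
import hashlib


class StickyBalancer:
    """Toy affinity-based balancer: the same client IP maps to the same live server."""

    def __init__(self, servers):
        self.servers = list(servers)
        # Per-server in-memory session stores; nothing is shared externally.
        self.sessions = {name: {} for name in self.servers}

    def pick(self, client_ip):
        # A stable hash keeps the mapping identical between requests.
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def login(self, client_ip, user):
        self.sessions[self.pick(client_ip)][client_ip] = user

    def current_user(self, client_ip):
        # None means this server has no session: the user must log in again.
        return self.sessions[self.pick(client_ip)].get(client_ip)

    def fail(self, server):
        # The failed server's in-memory sessions are lost along with it.
        self.servers.remove(server)
        del self.sessions[server]


lb = StickyBalancer(["web1", "web2"])
lb.login("203.0.113.7", "alice")
print(lb.current_user("203.0.113.7"))   # alice
lb.fail(lb.pick("203.0.113.7"))         # the server holding her session dies
print(lb.current_user("203.0.113.7"))   # None, so she has to log in again
```

The trade-off is the one described above: no shared session store to pay for, at the price of lost sessions on failure. A plain modulo mapping like this one also reshuffles other users whenever the number of servers changes, which a real load balancer would take more care to avoid.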
Has a big problem picking an ad server. Ad server firms want several hundred thousand a year plus they want multi-year contracts.
In the process of getting rid of ASP.NET repeaters and instead using string appends or Response.Write. If you are doing over a million page views a day, just write the code that spits the output straight to the screen.
Most of the build out costs went towards a SAN. Redundancy at any cost.
Growth was through word of mouth. Went nuts in Canada, spread to UK, Australia, and then to the US.
Database
- One database is the main database.
- Two databases are for search. Load balanced between search servers based on the type of search performed.
- Monitors performance using Task Manager. When spikes show up he investigates. Problems were usually blocking in the database. It's always database issues. There are rarely any problems in .NET. Because POF doesn't use the .NET library it's relatively easy to track down performance problems. When you are using many layers of frameworks, finding out where problems are hiding is frustrating and hard.
- If you call the database 20 times per page view you are screwed no matter what you do.
- Separate database reads from writes. If you don't have a lot of RAM and you do reads and writes you get paging involved which can hang your system for seconds.
- Try and make a read only database if you can.
- Denormalize data. If you have to fetch stuff from 20 different tables try and make one table that is just used for reading.
- One day it will work, but when your database doubles in size it won't work anymore.
- If you only do one thing in a system it will do it really really well. Just do writes and that's good. Just do reads and that's good. Mix them up and it messes things up. You run into locking and blocking issues.
- If you are maxing the CPU you've either done something wrong or it's really really optimized. If you can fit the database in RAM do it.
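The advice above to separate reads from writes and to denormalize into a read-only table can be sketched with two databases. This is a minimal illustration, not POF's schema: the table names, columns, and the batch refresh are assumptions, and sqlite3 is used only because it ships with Python.

```python
import sqlite3

# Write side: normalized tables, the source of truth.
write_db = sqlite3.connect(":memory:")
write_db.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age INTEGER NOT NULL,
        city_id INTEGER NOT NULL REFERENCES cities(id)
    );
""")

# Read side: one flat table shaped for the search query, so no joins are needed.
read_db = sqlite3.connect(":memory:")
read_db.executescript("""
    CREATE TABLE profile_search (
        user_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT
    );
    CREATE INDEX idx_city_age ON profile_search (city, age);
""")


def add_user(name, age, city_id):
    """All writes go to the write database only."""
    cur = write_db.execute(
        "INSERT INTO users (name, age, city_id) VALUES (?, ?, ?)",
        (name, age, city_id),
    )
    write_db.commit()
    return cur.lastrowid


def refresh_read_copy():
    """Rebuild the denormalized table from the write side (a crude batch 'replication')."""
    rows = write_db.execute(
        "SELECT u.id, u.name, u.age, c.name FROM users u "
        "JOIN cities c ON c.id = u.city_id"
    ).fetchall()
    with read_db:  # commits on success
        read_db.execute("DELETE FROM profile_search")
        read_db.executemany("INSERT INTO profile_search VALUES (?, ?, ?, ?)", rows)


def search(city, min_age, max_age):
    """Reads touch only the flat table on the read database."""
    return read_db.execute(
        "SELECT name, age FROM profile_search "
        "WHERE city = ? AND age BETWEEN ? AND ? ORDER BY name",
        (city, min_age, max_age),
    ).fetchall()


write_db.execute("INSERT INTO cities (id, name) VALUES (1, 'Vancouver')")
write_db.commit()
add_user("Ana", 31, 1)
add_user("Ben", 45, 1)
refresh_read_copy()
print(search("Vancouver", 25, 40))  # [('Ana', 31)]
```

Search queries never compete with inserts for locks because they run against a different database, and the join is paid once at refresh time instead of on every page view. The cost is that the read copy lags behind the write side until the next refresh.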
The development process is: come up with an idea. Throw it up within 24 hours. It kind of half works. See what user response is by looking at what they actually do on the site. Do messages per user increase? Do session times increase? If people don't like it then take it down.
System failures are rare and short lived. Biggest issues are DNS issues where some ISP says POF doesn't exist anymore. But because the site is free, people accept a little down time. People often don't notice sites down because they think it's their problem.
Going from one million to 12 million users was a big jump. He could scale to 60 million users with two web servers.
Will often look at competitors for ideas for new features.
Will consider something like S3 when it becomes geographically load balanced.
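The note above that all outbound data is gzipped trades CPU for bandwidth. A small sketch with Python's standard library shows the effect on a page body. The sample HTML is made up, and markup this repetitive compresses far better than a typical page, so the sizes printed are not representative of POF's traffic.

```python
import gzip


def gzip_body(body: bytes, level: int = 6) -> bytes:
    """Compress a response body; higher levels cost more CPU for smaller output."""
    return gzip.compress(body, compresslevel=level)


# Hypothetical dynamic page: a table of 200 near-identical rows.
row = '<tr><td class="name">member</td><td class="city">Vancouver</td></tr>\n'
html = ("<html><body><table>\n" + row * 200 + "</table></body></html>").encode("utf-8")

compressed = gzip_body(html)
print(f"original: {len(html)} bytes, gzipped: {len(compressed)} bytes")

# The client gets the identical page back after decompression.
assert gzip.decompress(compressed) == html
```

A server would send the compressed bytes with a `Content-Encoding: gzip` header when the browser advertises support for it in `Accept-Encoding`.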
Lessons Learned
You don't need millions in funding, a sprawling infrastructure, and a building full of employees to create a world class website that handles a torrent of users while making good money. All you need is an idea that appeals to a lot of people, a site that takes off by word of mouth, and the experience and vision to build a site without falling into the typical traps of the trade. That's all you need :-)
Necessity is the mother of all change.
When you grow quickly, but not too quickly, you have a chance to grow, modify, and adapt.
RAM solves all problems. After that it's just growing using bigger machines.
When starting out keep everything as simple as possible. Nearly everyone gives this same advice and Markus makes a noticeable point of saying everything he does is just obvious common sense. But clearly what is simple isn't merely common sense. Creating simple things is the result of years of practical experience.
Keep database access fast and you have no issues.
A big reason POF can get away with so few people and so little equipment is they use a CDN for serving large heavily used content. Using a CDN may be the secret sauce in a lot of large websites. Markus thinks there isn't a single site in the top 100 that doesn’t use a CDN. Without a CDN he thinks load time in Australia would go to 3 or 4 seconds because of all the images.
Advertising on Facebook yielded poor results. Out of 2,000 clicks only 1 signed up. With a CTR of 0.04%, Facebook gets 0.4 clicks per 1,000 ad impressions. Since CPM is the price of 1,000 impressions, the cost per click is the CPM divided by 0.4: a 5 cent CPM = 12.5 cents a click, a 50 cent CPM = $1.25 a click, a $1.00 CPM = $2.50 a click, and a $15.00 CPM = $37.50 a click.
It's easy to sell a few million page views at high CPMs. It's a LOT harder to sell billions of page views at high CPMs, as shown by Myspace and Facebook.
The ad-supported model limits your revenues. You have to go to a paid model to grow larger. To generate 100 million a year as a free site is virtually impossible as you need too big a market.
Growing page views via Facebook for a dating site won't work. Having a visitor on your site is much more profitable. Most of Facebook's page views are outside the US and you have to split 5 cent CPMs with Facebook.
Co-reg (co-registration) is a potentially large source of income. This is where your site's sign-up offers to send the user more information about mortgages or some other product.
You can't always listen to user responses. Some users will always love new features and others will hate it. Only a fraction will complain. Instead, look at what features people are actually using by watching your site.
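The Facebook advertising arithmetic in the lessons above (a 0.04% click-through rate turned into a cost per click at various CPMs) can be reproduced with a short calculation. The function name is hypothetical, and Decimal is used only to keep the cents exact.

```python
from decimal import Decimal


def cost_per_click(cpm_dollars: Decimal, ctr: Decimal) -> Decimal:
    """Dollars paid per click when impressions are bought at a given CPM.

    CPM is the price of 1,000 impressions, so the number of clicks bought
    per CPM is ctr * 1000.
    """
    clicks_per_thousand = ctr * 1000
    return cpm_dollars / clicks_per_thousand


ctr = Decimal("0.0004")  # 0.04%, i.e. 0.4 clicks per 1,000 impressions
for cpm in ("0.05", "0.50", "1.00", "15.00"):
    print(f"${cpm} CPM -> ${cost_per_click(Decimal(cpm), ctr)} per click")
```

The results match the figures quoted above: 12.5 cents, $1.25, $2.50, and $37.50 per click.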
Related Articles
MySpace also uses Windows to run their site.
Thanks to Erik Osterman for recommending profiling PlentyOfFish.
Comments
How it catches up?
I understand how a website like YouTube can succeed with just word of mouth.
YouTube provided a free service to upload and share video. That service by itself makes the website attractive.
However, an online dating website's main service is to connect people with each other.
When that website is released there are very few people to search for.
Why would I sign up for a contact page with only 200 contacts?
What is the value that keeps attracting people in the first weeks?
Just word of mouth doesn't convince me. Mass emailing? Purchasing databases?
Does somebody know how this can really happen with just word of mouth?
How can this really happen with just word of mouth?
I wish I knew. Why do good sites die when other lesser sites thrive? The answer has to do with Paris Hilton somehow, I am just not sure what it is.
Why it's so successful
It's about dating, and you know where everyone hopes it leads, each in their own way. It's the same reason everyone gawks at Paris Hilton and her doings. All you need is a reliable site, no interference, and the lottery effect of word of mouth, i.e., someone won a million here and there.
High Availability on IIS
"IIS arbitrarily limits the total connections to 64,000 so a load balancer was added to handle the large number of simultaneous connections."
I'm glad that was cleared up. When the author of plentyoffish made the original post about his setup, there was a lot of vagueness. He made claims in the comments that 64k was a hard number due to the number of ports in TCP. It took a lot of arguing (and I don't think he was ever convinced in that thread) to tell him that the TCP spec doesn't limit the number of possible TCP connections by the number of available ports -- there is a near infinite number of simultaneous connections. The bottleneck is how many the operating system and hardware can handle. :)
IIS is quirky if it simply limits the number of connections arbitrarily. It's almost as bad as the original idea that "64k is more than anyone will need" (paraphrased). And the idea that HDDs will never be bigger than 2GB. And so forth.
IIS does have one major advantage -- it is tied very closely to the OS it runs on. It probably enjoys the benefit of kernel-space IO which is much faster than user-space. It's hard to believe that no caching is being used at all; but it's not impossible.
Ultimately you can never tell. This guy could be lying through his teeth.
Reads or Writes
I'm not sure I understand the "do only reads or only writes".
How do I update the database without writes. Replicate changes to another database?
re: Reads or Writes
> How do I update the database without writes. Replicate changes
> to another database?
Yes. And in that database you can use a denormalized schema and more access oriented indexes. You also don't need to have triggers because the data should be verified on the write database. Maybe you can use different table types, caching policies, and other read optimizations as well.
It's a TCP/IP limitation.
It's a TCP/IP limitation. NAT only supports 64,000 connections per source IP address. It has nothing to do with IIS; it's TCP/IP related.
http://www.foundrynet.com/services/documentation/sixl/slb.html
The SI supports a maximum of 64,000 simultaneous connections on each source IP address. This maximum value is based on the architectural limits of IP itself. As a result, if you add only one source IP address, the SI can support up to a maximum of 64,000 simultaneous connections to the real servers. If you configure 64 source IP addresses, the SI can support more simultaneous connections.
hosting your servers
For a startup like Markus's, what is the best hosting option (with room to grow later)? Host your own server or use an ISP co-location option?
He still has to pay huge money on the bandwidth with that payload, right?
Re: DEATH TO POF
as far as Iam concerned pof really stands for plenty of fuckwits Marckus is a pathetic back stabbing prick , they never answer you questions they just delete you for no reason , they make up the rules as they go along , and when you do the right thing you are victimised for it , I had made some great friends , the pricks at POF deleted all my pics and threated me , my pics complied well within the rules , then the cock suckers deleted my profile and when i tired to contact them , yep you guessed it no answer , I realise iam not all that polite right now but my emails have been to them well not any more I think all the hackers should unite and murder this site out of its shitful existance , they have a thing called customer care , yeah well thats bullshit as well , the only thing they care about is fucking you over and over and over sad sad sad oh so sad that such small minds cannot maintain there sad site , they could not even maintain a garbage bin let alone earn the trust of the public , they are just a pack of pretenders , wannbes it really is a sad service thats not worth a pinch of shit well it really is Plenty of FUCKWITS
Re: DEATH TO POF
Yes, you seem to be a likable person. Too bad this sort of thing always happens to the nicest people.
Re: PlentyOfFish Architecture
Hey, I found a website that sounds just like his, lol. It's called Plenty of Torrents. It's for searching torrent files. What do you think? Would you rather date a creepy guy off there or download some good old porn? :)
here it is http://www.plentyoftorrents.com
I wonder if he knows about it or is going to sue them? lol
Re: PlentyOfFish Architecture
Wow, Markus built a really cheap solution for a site with such high popularity.
Re: PlentyOfFish Architecture
He probably has a giant brain or something.
Re: Why do good sites die when other lesser sites thrive?
There is a cool book that can help answer the question: "Why do good sites die when other lesser sites thrive?". It's called Made to Stick, and the subtitle reads: "Why some ideas survive and others die".
The authors state that the 6 properties of a sticky idea are: Simplicity, Unexpectedness, Concreteness, Credibility, Emotions, and Stories.
As site ideas go, I would add that the site must provide minimal barriers to entry: no sign-up necessary for trying out the site, and a good UI from the start. If we combine this with a carefully crafted idea (and a scalable architecture), word of mouth can take a site a long way. :)
As for POF... WoW! It is very cool to know that one (1!!) person can get such a high traffic site going! Congrats to Markus! :)
Re: High Availability on IIS
In re: to this comment: "IIS does have one major advantage -- it is tied very closely to the OS it runs on. It probably enjoys the benefit of kernel-space IO which is much faster than user-space. It's hard to believe that no caching is being used at all; but it's not impossible."
Read an older article I wrote about Apache vs. IIS, where I specifically address this:
http://techxworld.com/community/blogs/features/archive/2007/02/25/apache...
You can do something very similar to IIS and HTTP.sys with tools such as phhttpd.
--
Dustin Puryear
Author, Best Practices for Managing Linux and UNIX Servers
http://www.puryear-it.com/pubs/linux-unix-best-practices
Re: PlentyOfFish Architecture
Very informative article, great job!
There's one thing though I'm not really sure of. You mention that everything is dynamic and no static content is served. How do you explain then that every profile on the site has a .htm extension and no query string?
e.g.
http://www.plentyoffish.com/member1793945.htm
Everything else on the site has the extension .aspx though!
This seems static to me: no database trips or ASP.NET pipeline (i.e., no ASP.NET objects are created for every request), just IIS serving static pages (with server-side includes, though). With file caching in Windows, you could have many files cached and not even have to read them from the hard drive. Say a file averages 30 KB to 50 KB; with the amount of memory in his servers (many gigabytes, as you mention here) all or most profiles could be cached. If you check the newest members on the site you'll find that their member numbers are above 6 million, which means he's got more than 6 million members (so a little over 6 million files/profiles). How else do you think you can serve millions of profiles without using any caching? Just some thoughts of mine; I would be very interested to hear what you think...
Wal
Re: PlentyOfFish Architecture
You forgot about mod_rewrite. You can rewrite your query string into something more search-engine friendly.
Re: PlentyOfFish Architecture
I understand that the URLs can be rewritten but just having everything in .aspx except the profiles has got to make you think .. of course I can't be sure of all of this, it's just a hunch that tells me this is done for a reason ..
Re: PlentyOfFish Architecture
Why wait for S3 when there are many other CDNs out there with multiple geographic locations around the world? :)
Re: PlentyOfFish Architecture
I'd assume that you're talking about your site :)
Re: How it catches up?
I agree. I think there is some hidden trick besides just word of mouth. I mean, other free sites have been around longer but for some reason aren't as lucky (http://www.oliveyou.net). I read in another blog that he filled up the profiles with fake posts until it got rolling. I don't know if that is believable or not. What is the trick to get to the critical mass where it starts to feed upon itself?
Re: DEATH TO POF
How much did you pay for it? What have you lost? Think about it, and then consider how you would have handled the situation from the other side, but with 100 other people like yourself.
Re: Why do good sites die when other lesser sites thrive?
I also recommend the book "The Tipping Point". It talks about mavens, connectors, and salesmen, and how viral ideas (memes) spread, like the iPod and hipster loafers.
Space required for website like this
What do you think about Disk space for POF and Database storage space for POF ?
Is it in TBs ?
Re: DEATH TO POF
Maybe they deleted you because you're illiterate? Isn't that in their terms and conditions? I would add that to my terms and conditions if I had people like you sign up on my site. Moron.
The thought that you take that much effort to vent about POF is ridiculous. Do you think we care? Really, do you? No. We're talking about server architecture.
Re: PlentyOfFish Architecture
I think you're dead on about the mod_rewrite. He would HAVE TO. This was probably a key point in improving his SEO.
Hey, can anyone recommend a good CAPTCHA anti-bot script, like the one this website requires below for entering a response?
re: Reads or Writes
Hey Todd,
Can you break that down into English? I'm very interested in learning what you mean :)
Re: PlentyOfFish Architecture
Sorry. My English is very bad. What did you not understand?
Re: PlentyOfFish Architecture
Cool article. I got interested in this site and what's behind it after he posted pictures of that AdSense cheque for over $600,000. Jealous? Oh yeah.
Re: its a TCP/IP limitation.
Yes, it's a limit of TCP/IP when you are using source-IP NAT. Foundry's source-ip feature limits the number of distinct TCP/IP connections only because this architectural decision has limited the number of source IPs.
But it is not necessary to NAT the client IP using Foundry's source-ip feature. If you leave the client IP untranslated, you will have as many distinct TCP/IP connections as the server can handle. Most load balancing architectures do this; otherwise your server does not see the client IP, which limits the usefulness of the server's web logs.
The 64,000 connection limit is clearly a limit imposed by the CHOICE to use the Foundry source-ip feature in the web site architecture. To blame the limit on TCP/IP without making that crystal clear is misleading.
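The disagreement above comes down to how a TCP connection is identified: by the tuple (source IP, source port, destination IP, destination port). When a load balancer rewrites every client to one of its own source IPs, only the source port is left to tell connections to a given server endpoint apart, so the ceiling scales with the number of source IPs. The sketch below just does that multiplication. The 64,000 figure is the round number from the Foundry documentation quoted earlier, the function name is hypothetical, and the result is an upper bound under these assumptions, not a measured capacity.

```python
USABLE_PORTS_PER_SOURCE_IP = 64_000  # round figure quoted from the Foundry docs


def max_nat_connections(source_ips: int,
                        ports_per_source_ip: int = USABLE_PORTS_PER_SOURCE_IP) -> int:
    """Upper bound on simultaneous connections when clients are source-NATed.

    Each translated connection needs a unique (source IP, source port) pair,
    so the bound is simply the number of such pairs.
    """
    return source_ips * ports_per_source_ip


print(max_nat_connections(1))    # 64000
print(max_nat_connections(64))   # 4096000
```

Without source NAT the client's own IP and port stay in the tuple, so the port range of a single address stops being the limiting factor and the practical limit becomes what the server's operating system and hardware can handle.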
Re: PlentyOfFish Architecture
This article was one of the most surprising for me.
In fact, I didn't believe that such a project based on Microsoft solutions could exist with such a comparatively small amount of hardware involved.
PlentyOfFish seems to have changed my opinion on this topic...
Re: PlentyOfFish Architecture
IMHO POF is just a fluke. Out of billions of websites you're bound to get a few that will make a ton of money.
Re: PlentyOfFish Architecture
Well, I guess the owner of PoF has become lazy and stopped improving the site, and alternatives like Gimeney.Net will join the top soon. But anyway, he did an amazing thing.
Markus Frind is a genius
You do know that Markus is a mathematical genius as well as a dating-site owner? Read this post about his discovery of the 23 primes in arithmetic progression - http://www.welcometowallyworld.com/plentyoffish-mate-or-date/2007/9/19/plenty-of-fish-the-23-primes-in-arithmetic-progression.html
$600,000
Don't be too jealous of that $600,000. One thing Markus always fails to talk about is how much of that goes back to Google for his advertising. I have no doubt he doesn't need a day job, but I don't find his income claims to be all that credible either.
Re: High Availability on IIS
No caching seems like a strange thing indeed. But ultimately - with 8 gigs of ram you can afford not to cache objects programmatically. The OS is maintaining the disk cache. So anyway you get the raw data from RAM, but then you have to convert it into HTML.
No caching means less implementation complexity, at the expense of some overhead for real-time conversion.
Re: PlentyOfFish Architecture
Mr. Markus is great and famous with the public. We Asian people know all about him, and he is the best example, one I personally admire and follow. PlentyOfFish helps many singles locally and around the world to connect with each other on the Net.
Thanks very much for creating such a beautiful dating service.
AMAZING Re: PlentyOfFish Architecture
I am amazed at how a simple idea like this can be so profitable. Plus, people seem to forget that a MS solution CAN work.
One thing that is disappointing about this article is that it seems like the .NET controls simply aren't efficient enough on one machine to handle that kind of load...as in, the creator of PoF had to use simple Response.Write methods.
Anyway, great article!
--
If you code, come hang out with us!
http://codershangout.com
Re: PlentyOfFish Architecture
Thanks for the great article. It was a nice read and obviously something to think about for my new .NET projects. Just wondering if it's running on SQL2000 or SQL2005, as SQL2005 is a memory eater.
Re: PlentyOfFish Architecture
wot a load ov awd shit
Re: PlentyOfFish Architecture
If I am not mistaken, PoF is using isapirewrite to do the URL rewriting
Re: PlentyOfFish Architecture
You're correct, it uses isapirewrite for the URL rewrites.
Re: PlentyOfFish Architecture
He DOES have a giant brain - he wrote his own prime sieve! A GOOD one!
More people have gone up in the Space Shuttle than have written a good prime sieve!
Re: PlentyOfFish Architecture
You quoted "RAM solves all problems. After that it's just growing using bigger machines." in your last paragraphs.
What exactly is growing using bigger machines? Is it the architecture or the organization?
Re: PlentyOfFish Architecture
Thanks for sharing your thoughts on technology...
I have a small website in my basement; a 3.2Mbps upload (MxDSL, 4 telephone lines, async) can theoretically handle 3,500,000 pageviews a day.
(3.2Mbps = 34,560,000,000 bytes-a-day which will give 3,456,000 pageview-a-day with 10k gzipped content)
Simplest improvement could be to use HTTP-Caching enabled servers outside of my basement... Akamai is good.
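The bandwidth estimate in this comment can be checked with a few lines of arithmetic. This is a sketch under the commenter's own assumptions (a fully saturated 3.2 Mbps uplink, 10,000 bytes per gzipped page, no protocol overhead); the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def page_views_per_day(uplink_bits_per_second: float, page_bytes: int) -> int:
    """Theoretical page views per day for a link that is saturated around the clock."""
    bytes_per_day = uplink_bits_per_second / 8 * SECONDS_PER_DAY
    return int(bytes_per_day // page_bytes)


print(page_views_per_day(3_200_000, 10_000))  # 3456000
```

This reproduces the 34,560,000,000 bytes a day and 3,456,000 page views a day quoted above. In practice the ceiling is lower, because TCP/IP and HTTP headers take a share of the link and traffic is never perfectly flat.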
Re: PlentyOfFish Architecture
I run a small new dating site, I only wish I was this big.
Re: PlentyOfFish Architecture
Thanks for putting up such useful information. It's good to see that the .NET platform is so powerful and supports such huge traffic.
muscle
Wow, it's hard to believe that one person can handle such a high-traffic site, and that only a few servers can handle such demands.
Gas
That is truly an example of how a lot of intelligence on how to work smart with little overload and resources can make you competitive with the big boys.