I spent a blog entry discussing single-partition and every-partition transactions in distributed KV systems, along with solutions for some common problems.
We need to measure the number of queries-per-second our site gets for capacity planning purposes.
Obviously, we need to provision the site based on the peak QPS, not average QPS. There will always be some spikes in traffic, though, where for one particular second we get a really huge number of queries. It's ok if site performance slightly degrades during that time. So what I'd really like to do is estimate the *near* peak QPS based on average or median QPS. Near peak might be defined as the QPS that I get at the 95th percentile of the busiest seconds during the day.
My guess is that this is similar to what ISPs do when they measure your bandwidth usage and then charge for usage over the 95th percentile.
What we've done is analyzed our logs, counted the queries executed during each second during the day, sorted from the busiest seconds to the least busy ones, and graphed it. What you get is a histogram that steeply declines and flattens out near zero.
Does anyone know if there is a mathematical formula that describes this distribution?
I'd like to say with some certainty that the second at the 95th percentile will get X times the average or median QPS.
(Experimentally, our data shows, over a six week period, an avg QPS of 7.3, a median of 4, and a 95th percentile of 27. But I want a better theoretical basis for claiming that we need to be able to handle 4x the average amount of traffic.)
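The measurement described above (bucket the log by second, then compare the 95th percentile to the average and median) can be sketched roughly as follows. This is only an illustration: the function names are made up for this example, the timestamps are assumed to be Unix-style seconds already parsed out of the logs, and the percentile uses the simple nearest-rank definition, which may differ slightly from what a stats package reports.

```python
import math
from collections import Counter
from statistics import mean, median


def per_second_counts(timestamps, start, end):
    """Count queries in every one-second bucket in [start, end).

    Idle seconds are included as zeros, since they pull the average
    and median down just like they do in the real traffic.
    """
    hits = Counter(int(t) for t in timestamps if start <= t < end)
    return [hits.get(second, 0) for second in range(start, end)]


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least
    pct percent of the data at or below it (pct is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def qps_summary(counts):
    """Average, median, near-peak (95th percentile) QPS and their ratio."""
    avg = mean(counts)
    p95 = percentile(counts, 95)
    return {
        "avg": avg,
        "median": median(counts),
        "p95": p95,
        "p95_over_avg": p95 / avg if avg else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical sample: six queries spread over a five-second window.
    sample = [0.1, 0.5, 1.2, 3.1, 3.5, 3.9]
    print(qps_summary(per_second_counts(sample, 0, 5)))
```

Run over a full day of logs, `p95_over_avg` is the "X times the average" multiplier the question is asking about. The sketch only measures it; it does not give a theoretical basis for it.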
I've seen it mentioned a few times that sites like Digg or LinkedIn use graph servers to hold their social graphs. But the only open source graph server I've found is http://neo4j.org/ .
Can anyone recommend an open source graph server?
Update: Good Vibrations by Radovan Semančík. Lots of interesting questions about how Wave works, scalability, security, RESTyness, and so on.
Google Wave is a new communication and collaboration platform based on hosted XML documents (called waves) supporting concurrent modifications and low-latency updates. This platform enables people to communicate and work together in new, convenient and effective ways. We will offer these benefits to users of Google Wave and we also want to share them with everyone else by making waves an open platform that everybody can share. We welcome others to run wave servers and become wave providers, for themselves or as services for their users, and to "federate" waves, that is, to share waves with each other and with Google Wave. In this way users from different wave providers can communicate and collaborate using shared waves. We are introducing the Google Wave Federation Protocol for federating waves between wave providers on the Internet.
Here are the initial white papers that are available to complement the Google Wave Federation Protocol:
- Google Wave Federation Architecture
- Google Wave Data Model and Client-Server Protocol
- Google Wave Operational Transform
- General Verifiable Federation
The Google Wave APIs are documented here.
Mather Corgan, president of HotPads, gave a great talk on how HotPads uses AWS to run their real estate search engine. I loved the presentation for a few reasons:
It's a really good example of where many companies are, or would like to be, with their applications.
Their total costs are about $11K/month, which is about what they were paying at their previous provider. I found this a little surprising, as I thought the cloud would be more expensive, but they only pay for what they need instead of having to overprovision for transient uses like testing. And some servers aren't necessary anymore: EBS handles backups, so database slave servers are no longer required for that.
There are lots more lessons like this that I've abstracted down below.
Site: http://hotpads.com - a map-based real estate search engine, listing homes for sale, apartments, condos, and rental houses.
* $150: 2 Small HAProxy Load Balancers - 2 for failover, these have the elastic IPs, round robin DNS point at the elastic IPs.
* $1,200: 3-5 Large Tomcat Web Servers - an array of 3 run at night and 5 during the day.
* $1,500: 5 Large Tomcat Job Servers
* $900: 1 X-Large 1 Large Index Server - used to power property search and have several GB of RAM for the JVM
* $1,200: 1 X-Large 2 Large MySQL masters
* $1,200: 1 X-Large 2 Large MySQL slaves
* $300: 1 Large Messaging Server ActiveMQ - will be replaced with SQS
* $300: 1 Large Map tile creation servers Tilecache
* $600: Development/testing/migration servers
* A 67 KB object (600 px image) is roughly the break-even point: the cost of putting the image into S3 equals the cost of storing it there, and is about equal to the cost of transferring it once.
* For a 6.7 KB object (15 px thumbnail) the PUT cost (the small fee for putting an object into S3) is 10x the storage and transfer costs.
* In April, 330 GB of images downloaded at $.15/GB cost $49. 55mm GETs at $1/mm cost $55. 42mm PUTs at $10/mm cost $420!
* $100 download and GETs of maptiles.
* So S3 is very cheap for larger files, but watch out for lots of short-lived small files.
* CloudFront makes frequently viewed listings faster.
* For infrequently viewed listings, CloudFront has to go to S3 to get the file the first time, which means you pay twice for a file that will be viewed only once.
* EBS is used on database servers because it's faster than local storage (especially for random writes), blocks of data are redundant, and it supports easy backups and versioning via cloning.
* Only 10% cost overhead.
* Allowed them to get rid of a second set of slaves. Backups were so CPU intensive that they needed extra slaves just to run them. EBS allows snapshots of running drives, so the extra slaves are unnecessary.
* Databases are I/O bound and the CPU is vastly underutilized so there's extra capacity when you need it.
* 1 year for the cost of 6 months and guaranteed (denied one time) to get an instance.
* The con is being tied to an instance type. They want more flexibility to choose instance types as their software changes, and to take advantage of new instance types as they are released.
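The S3 arithmetic in the bullets above is easy to check with a few lines of code. The sketch below uses only the prices quoted in the talk notes (2009-era figures, not current AWS pricing). The per-PUT price is an assumption inferred from the $420 total for 42 million PUTs, and the function names are invented for this example.

```python
# Prices as quoted in the notes above (2009-era, not current AWS pricing).
TRANSFER_PER_GB = 0.15   # dollars per GB downloaded
GET_PER_MILLION = 1.00   # dollars per million GET requests
PUT_PER_MILLION = 10.00  # assumption: implied by 42mm PUTs costing $420


def monthly_s3_costs(download_gb, gets_millions, puts_millions):
    """Split a month's S3 bill into transfer, GET and PUT charges."""
    return {
        "transfer": download_gb * TRANSFER_PER_GB,
        "gets": gets_millions * GET_PER_MILLION,
        "puts": puts_millions * PUT_PER_MILLION,
    }


def put_to_transfer_ratio(object_kb):
    """How many times more a single PUT costs than transferring
    an object of the given size once (decimal KB -> GB)."""
    put_cost = PUT_PER_MILLION / 1_000_000
    transfer_cost = (object_kb / 1_000_000) * TRANSFER_PER_GB
    return put_cost / transfer_cost


if __name__ == "__main__":
    print(monthly_s3_costs(download_gb=330, gets_millions=55, puts_millions=42))
    print(put_to_transfer_ratio(67))    # close to 1: the break-even size
    print(put_to_transfer_ratio(6.7))   # close to 10: PUT fee dominates
```

The ratio makes the lesson concrete: for small, short-lived objects the request fee, not the bytes, is what you pay for.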
So nice to start discussing cool things in this even cooler forum :)
I am having a problem which I believe is already solved, but I would love for someone to confirm that from actual experience with the same topic.
We are building a client / server architecture, consisting of a web server part and many clients.
Transport will be provided as either XML-RPC / SOAP / JSON or all at once.
All of the communication has to be encrypted and passed within SSL3.
We expect a high load when the application starts (> 2000 concurrent requests).
Combine this with XML parsing for the RPC API, and things really look ugly :)
So it's a big mess :)
It will not be very database bound behind the API: mostly files will be transferred from the server to the clients, plus a simple API for control.
So it's pretty much a matter of 'what-to-do-with-ssl'.
I was thinking of hardware - NetApp or a similar application accelerator.
Can anyone give examples of a hardware piece that combines: Load balancer / SSL accelerator?
I have also been reading about open source software load balancers, but I really doubt they would meet our needs. Does anyone have (or has anyone had) the same experience? :)
Performance is critical to the success of any web site, and yet today's web applications push browsers to their limits with increasing amounts of rich content and heavy use of Ajax. In his new book Even Faster Web Sites: Performance Best Practices for Web Developers, Steve Souders, web performance evangelist at Google and former Chief Performance Yahoo!, provides valuable techniques to help you optimize your site's performance.
Souders' previous book, the bestselling High Performance Web Sites, shocked the web development world by revealing that 80% of the time it takes for a web page to load is on the client side. In Even Faster Web Sites, Souders and eight expert contributors provide best practices and pragmatic advice for improving your site's performance in three critical categories:
- Network - Learn to share resources across multiple domains, reduce image size without loss of quality, and use chunked encoding to render pages faster.
- Browser - Discover alternatives to iframes, how to simplify CSS selectors, and other techniques.
Speed is essential for today's rich media web sites and Web 2.0 applications. With this book, you'll learn how to shave precious seconds off your sites' load times and make them respond even faster.
About the Author
Steve Souders works at Google on web performance and open source initiatives. His book High Performance Web Sites explains his best practices for performance along with the research and real-world results behind them. Steve is the creator of YSlow, the performance analysis extension to Firebug. He is also co-chair of Velocity 2008, the first web performance conference sponsored by O'Reilly. He frequently speaks at such conferences as OSCON, Rich Web Experience, Web 2.0 Expo, and The Ajax Experience.
Steve previously worked at Yahoo! as the Chief Performance Yahoo!, where he blogged about web performance on Yahoo! Developer Network. He was named a Yahoo! Superstar. Steve worked on many of the platforms and products within the company, including running the development team for My Yahoo!.
This post includes details on who is using the platform and how, from enterprise applications, to ISVs looking for SaaS enablement, to partners and solution providers looking to gain a competitive advantage and deploy applications with a short time to market and a small initial investment.
Update: Here's the first result. Good response time until 400 users. At 1,340 users the response time was 6 seconds. And at 2,000 users the site was effectively dead. An interesting point was that errors that could harm a site's reputation started at 1,000 users. Cheers to the company that had the guts to give this a try.
That which doesn't kill your site makes it stronger. Or at least that's the capacity planning strategy John Allspaw recommends (not really, but I'm trying to make a point here) in The Art of Capacity Planning:
Using production traffic to define your resources ceilings in a controlled setting allows you to see firsthand what would happen when you run out of capacity in a particular resource. Of course I'm not suggesting that you run your site into the ground, but better to know what your real (not simulated) loads are while you're watching, than find out the hard way. In addition, a lot of unexpected systemic things can happen when load increases in a particular cluster or resource, and playing "find the butterfly effect" is a worthwhile exercise.
The problem is how do you ever test to such a scale? That's where Randy Hayes of CapCal--a distributed performance testing system--comes in. Randy first contacted me asking for volunteers to try a test of a million users, which sounded like a great High Scalability sort of thing to do. Unfortunately he already found a volunteer so the idea now is to test how many users it takes to find a weakness in your site.
If anyone wants to test their system to the breaking point, the process goes like this:
In the past test generators were fun to write, but it was always difficult to get enough boxes to generate sufficient load. Maybe you remember installing test agents on people's work computers in cubeland so tests could be run overnight when everyone was sleeping?
The cloud has changed all that. Testing-as-a-Service is one very obvious and solid use of the cloud. You need load? We got your load right here. Spin up more machines and you can drive your site into oblivion, but not in a denial-of-service attack sort of way :-)
Randy has a nice write-up of how their system works in CapCal Architecture and Background. It's similar in concept to other distributed testing frameworks you may have used, only this one operates in AWS and not on your own servers.
Not everyone is Google or Yahoo with zillions of users to test their software against. If you are interested in testing your site please contact Randy and give it a go. And when you are done it would be fun to have an experience report here about what you learned and what changes you needed to make.
HotPads abandoned our managed hosting in December and took the leap over to EC2 and its siblings. The presentation has a lot of detail on costs and other things to watch out for, so if you're currently planning your "cloud" architecture, you'll find some of this really helpful.