advertise
Tuesday
Jan012008

S3 for image storing

Hi all, Has anyone got any experience with using Amazon S3 as an uploaded photo store? I'm writing a website that I need to keep as low budget as possible, and I'm investigating solutions for storing uploaded photos from users - not too many, probably in the low thousands. The site is commercial so I'm straying away from the Flickrs of the world. S3 seems to offer a solution but I'd like to hear from those who have used it before. Thanks Andy

Click to read more ...

Tuesday
Jan012008

HOW CDN works

All, I'm just new to this and have a basic understanding how CDN works? My questions are: 1. How does CDN sync data with web servers for video/images? If I have a user to upload a video to my site, will it get stored directly in CDN or it comes to my webserver first and then sync-ed with cache server? 2. How to have only the dynamic video/image delivered through CDN while the rest is served by a webserver? 3. How sync happens and who pays for the bandwidth for sync? I'd appreciate if someone could explain this. Regards, Janakan Rajendran

Click to read more ...

Monday
Dec312007

Product: collectd

From http://directory.fsf.org/project/collectd/ : 'collectd' is a small daemon which collects system information every 10 seconds and writes the results in an RRD-file. The statistics gathered include: CPU and memory usage, system load, network latency (ping), network interface traffic, and system temperatures (using lm-sensors), and disk usage. 'collectd' is not a script; it is written in C for performance and portability. It stays in the memory so there is no need to start up a heavy interpreter every time new values should be logged. From the collectd website: Collectd gathers information about the system it is running on and stores this information. The information can then be used to do find current performance bottlenecks (i. e. performance analysis) and predict future system load (i. e. capacity planning). Or if you just want pretty graphs of your private server and are fed up with some homegrown solution you're at the right place, too ;). While collectd can do a lot for you and your administrative needs, there are limits to what it does: * It does not generate graphs. It can write to RRD-files, but it cannot generate graphs from these files. There's a tiny sample script included in contrib/, though. Also you can have a look at drraw for a generic solution to generate graphs from RRD-files. * It does not do monitoring. The data is collected and stored, but not interpreted and acted upon. There's a plugin for Nagios, so it can use the values collected by collectd, though. It's reportedly a reliable product that doesn't cause a lot load on your system. This enables you to collect data at a faster rate so you can detect problems earlier.

Click to read more ...

Sunday
Dec302007

MySQL clustering strategies and comparisions

Compare: 1. MySQL Clustering(ndb-cluster stogare) 2. MySQL / GFS-GNBD/ HA 3. MySQL / DRBD /HA 4. MySQL Write Master / Multiple MySQL Read Slaves 5. Standalone MySQL Servers(Functionally seperated)

Click to read more ...

Friday
Dec282007

Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half

Update 2: Summize Computes Computing Resources for a Startup. Lots of nice graphs showing Amazon is hard to beat for small machines and become less cost efficient for well used larger machines. Long term storage costs may eat your saving away. And out of cloud bandwidth costs are high. Update: via ProductionScale, a nice Digital Web article on how to setup S3 to store media files and how Blue Origin was able to handle 3.5 million requests and 758 GBs in bandwidth in a single day for very little $$$. Also a Right Scale article on Network performance within Amazon EC2 and to Amazon S3. 75MB/s between EC2 instances, 10.2MB/s between EC2 and S3 for download, 6.9MB/s upload. Now that Amazon's S3 (storage service) is out of beta and EC2 (elastic compute cloud) has added new instance types (the class of machine you can rent) with more CPU and more RAM, I thought it would be interesting to take a look out how their pricing stacks up. The quick conclusion:the more you scale the more you save. A six node configuration in Amazon is about half the cost of a similar setup using a service provider. But cost may not be everything... EC2 gets a lot of positive pub, so if you would like a few other perspectives take a look at Jason Hoffman of Joyent's blog post on Why EC2 isn't yet a platform for "normal" web applications and Hostingfu's Short Comings of Amazon EC2. Both are well worth reading and tell a much needed cautionary tale. The upshot is batch operations clearly work well within EC2 and S3 (storage service), but the jury is still out on deploying large database centric websites completely within EC2. The important sticky issues seem to be: static IP addresses, load balancing/fail over, lack of data center redundancy, lack of custom OS building, and problematic persistent block storage for databases. A lack of large RAM and CPU machines has been solved with the new instance types. Assuming you are OK with all these issues, will EC2 cost less? Cost isn't the only issue of course. If dynamically scaling VMs is a key feature, SQS (message queue service) looks attractive, or S3's endless storage are critical, then weight accordingly. My two use cases are my VPS, for selfish reasons, and a quote from a leading service provider for a 6 node setup for a startup. Six nodes is small, but since the architecture featured horizontal scaling, the cost of expanding was pretty linear and incremental. Here's a quick summary of Amazon's pricing:

  • Data transfer: The prices are $0.10 per GB - all data transfer in, $0.18 per GB - first 10 TB / month data transfer out $0.16 per GB - next 40 TB / month data transfer out, $0.13 per GB - data transfer out / month over 50 TB. You don't pay for data transfer between EC2 and S3, so that's an advantage of using S3 within EC2.
  • S3: $0.15 per GB-Month, $0.01 per 1,000 PUT or LIST requests, $0.01 per 10,000 GET and all other requests. I have no idea how many requests I would use.
  • Small Instance at 10 cents/hour:1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform. The CPU capacity is that of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
  • Large Instance at 40 cents/hour: 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform.
  • Extra Large Instance at 80 cents/hour: 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform. You don't have to run these numbers by hand. To calculate the Amazon costs I used their handy dandy calculator. When performing calculations, per Amazon, I used 732 hours per month.

    Single VPS Configuration

    I was very curious about the economics of moving this simple site(http://highscalability.com) from a single managed VPS to EC2. Currently my plan provides:
  • 1GB RAM (with no room for expansion).
  • 50 GB of storage. I use about 4 GB.
  • 800 GB montly transfer. Of which I use 1GB/month in and 10GB/month out.
  • 8 IP addresses. Very nice for virtual hosts.
  • 100Mbps uplink speed.
  • Very responsive support. Very poor system monitoring.
  • 1 VM backup image. Would prefer two.
  • The CPU usage is hard to characterize, but it's been more than sufficient for my needs.
  • Cost: $105 per month. From Amazon:
  • The small instance looks good to me. What I need is more memory, not more CPU, so that's attractive. VPS memory pricing is painfully high.
  • 10 GB storage, 1 GB transfer in, 10 GB transfer out, 2000 requests.
  • Cost: about $80 per month Will I switch? Probably not. I don't know how well Drupal runs in EC2/S3 and it's not really worth it for me to find out. Drupal isn't horizontally scalable so that feature of EC2 holds little attraction. But the extra RAM and affordable disk storage are attractive. So for very small deployments using your standard off the shelf software, there's no compelling reason to switch to EC2.

    Six Node Configuration for Startup

    This configuration is targeted at a real-life Web 2.0ish startup needing about 300GB of fast, highly available database storage. Currently there are no requirements for storing large quantities of BLOBs. There are 6 nodes overall: two database servers in failover configuration, two load balanced web servers, and two application servers. From the service provider:
  • Two database servers are $1500/month total for dual processor quad core Xeon 5310, 4GB of RAM, and 4x 300GB 15K SCSI HDD in RAID 10 configuration, with 5 IP addresses. 10 Mbps Public & Private Networks. Public Bandwidth 2000 GB Bandwidth.
  • The other 4 servers have 2GB RAM each, single Quad Core Xeon 5310, 1 X 73GB SAS 10k RPM, for about $250 each.
  • For backup the cost is 500GB $200/month.
  • I'm not including load balancer or firewall services as these don't apply to Amazon, which may be a negative depending on your thinking. Plus the provider as an excellent service and management infrastructure.
  • Cost: $2700/month. From Amazon:
  • Two extra large instances for the database servers. Your architecture here is more open and could take some rethinking. You could just rent 1 and bring another another on-line from the pool on failure, which would save about $500 a month. I'll assume we load balance read and write traffic here so we'll have two. Using one extra large instance is about the same price as two large instances.
  • Four small instances for the other servers. Here is another place the architecture could be rethought. It would be easy enough to buy one or two servers upfront and then add servers in response to demand. That might save about $140/month under low load conditions. Adding another 4 servers adds about $300.
  • 300 GB of storage. Doubling to $600 GB only adds about $50/month. If storing large amounts of data does become a requirement this could be a big win.
  • 200 GB transfer in, 1800 GB transfer out. This is a guesstimate. Doubling the numbers adds another $400.
  • 40,000 requests. No idea, but these are cheap so being wrong isn't that expensive.
  • Cost: about $1300/month using two large instance for the database.
  • Cost: about $1900/month using two extra large instances for the database. The cost numbers really stand out. You pay half for a similar setup and the cost of incrementally scaling along any dimension is relatively inexpensive. You could in fact start much smaller and much cheaper and simply pay as you grow. The comparison is not apples to apples however. All the potential problems with EC2 have to be factored in as well. As someone said, architectecting for EC2/S3 takes a different way of thinking about things. And that's really true. Many of the standard tricks don't apply anymore. Deploying customer facing production websites in a grid is not a well traveled path. If you don't want to walk the bleeding edge then EC2 may not be for you. For example, in the service provider scenario you have blisteringly fast disks in a very scalable RAID 10 setup. That will work. Now, how will your database work over S3? Is it even possible to deploy your database over S3 with confidence? Do you need to add a ton of caching nodes? Will you have to radically change your architecture in a way that doesn't fit your skill set or schedule? Will the extra care and monitoring needed by EC2 be unacceptable? Is the single data center model a problem? Does the lack of a hardware firewall and a load balancer seem like too big a weakness? Can you have any faith in Amazon as a grid provider? Only the shadow may know the answers to these questions, but the potential cost savings and the potential ease of scaling make the questions worth answering.

    Related Articles

  • Build an Infinitely Scalable Infrastructure for $100 Using Amazon
  • An Unorthodox Approach to Database Design : The Coming of the Shard

    Click to read more ...

  • Wednesday
    Dec262007

    Golden rule of web caching

    Effective content caching is one of the key features of scalable web sites. Although there are several out-of-the-box options for caching with modern web technologies, a custom built cache still provides the best performance.

    Click to read more ...

    Wednesday
    Dec262007

    Finding an excellent LAMP developer

    Hi... I have this idea to start a really great and scalable website, and I am building it! So far I'm doing everything myself - coding, networking, architecture planning, everything. I haven't even gotten into the legal aspects yet....... It would be MUCH easier if I had a technical person to handle that end of the operation. I'm a good coder, but like Bill Gates at Harvard for Math, I'm not the very best. I'd like to FIND that very best person available, to handle the technical aspects. For worse or better, I don't presently know somebody who fits this bill. I've posted a bazillion ads on Craig's List, with no really qualified responses. I've put out feelers among my own network, same result. Not sure what else I can do. Shoestring budget, so it's sweat equity in the beginning. That can actually be a plus, as it forces people to focus. Any ideas about what else I can do, to attract the right person? Thanks Jason

    Click to read more ...

    Tuesday
    Dec252007

    IBMer Says LAMP Can't Scale

    A very entertaining and somewhat educational article on IBM Poopheads say LAMP Users Need to "grow up". The physical three tier architecture turns out to be the root of all evil and shared nothing architectures brings simplicity and light. In the comments Simon Willison makes an insightful comment on why fine grained caching works for personalized pages and proxy's don't: Great post, but I have to disagree with you on the finely grained caching part. If you look at big LAMP deployments such as Flickr, LiveJournal and Facebook the common technology component that enables them to scale is memcached - a tool for finely grained caching. That's not to say that they aren't doing shared-nothing, it's just that memcached is critical for helping the database layer scale. LiveJournal serves around 50% of its page views "permission controlled" (friends only) so an HTTP proxy on the front end isn't the right solution - but memcached reduces their database hits by 90%.

    Click to read more ...

    Sunday
    Dec232007

    Synchronizing Memcached application

    I have an application with couple of web servers that uses MemcacheD. How can i synchronize concurrent put to the cache? The value of the entry is list. Atomic append operation could have been helpful, but unfortunately memcahe doesn't support atomic append.

    Click to read more ...

    Saturday
    Dec222007

    This was a porn/spam post

    Seems as though anonymous users can edit old posts w/o any authentication. This post was loaded with spam/porn links. Now it is not. /anonymous

    Click to read more ...