
It's the Fraking IOPS - 1 SSD is 44,000 IOPS, Hard Drive is 180

Planning your next buildout and thinking SSDs are still far in the future? Still too expensive, too low density. Hard disks are cheap, familiar, and store lots of stuff. In this short and entertaining video Wikia's Artur Bergman wants to change your mind about SSDs. SSDs are for today, get with the math already.

Here's Artur's logic:

  • Wikia is all SSD in production. The new Wikia file servers have a theoretical read rate of ~10GB/sec sequential, 6GB/sec random and 1.2 million IOPs. If you can't do math or love the past, you love spinning rust. If you are awesome you love SSDs.
  • SSDs are cheaper than drives using the most relevant metric: $/GB/IOPS. 1 SSD is 44,000 IOPS and one hard drive is 180 IOPS. Need 1 SSD instead of 50 hard drives.
  • With 8 million files there's a 9 minute fsck. Full backup in 12 minutes (X-25M based).
  • 4 GB/sec random read average latency 1 msec.
  • 2.2 GB/sec random write average latency 1 msec.
  • 50TBs of SSDs in one machine for $80,000. With the densities most products can skip sharding completely.
  • Joins are slow because random access disk IO is slow. That is not true with SSDs, so joins will perform well.
  • SSDs are the best way to save power because you need fewer CPUs.
  • Recommends starting small, with Intel 320s. Don't need fancy high end cards. 40K IOPS goes a long ways. $1000 for 600 GB.
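The arithmetic behind these bullets can be sketched in a few lines of Python. The figures are the ones quoted above (44,000 IOPS per SSD, 180 IOPS per hard drive, $1000 for 600 GB); the function names are illustrative and do not come from the talk. Note that the raw IOPS ratio works out higher than the 50 drives quoted, and the summary above does not say how that figure of 50 was derived.

```python
import math

SSD_IOPS = 44_000  # per-SSD figure quoted in the bullets above
HDD_IOPS = 180     # per-hard-drive figure quoted in the bullets above


def drives_to_match(target_iops: int, per_drive_iops: int) -> int:
    """Number of drives needed to reach target_iops, rounded up."""
    return math.ceil(target_iops / per_drive_iops)


def dollars_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Simple price per gigabyte."""
    return price_usd / capacity_gb


# Hard drives needed to match one SSD on raw IOPS alone.
hdds_per_ssd = drives_to_match(SSD_IOPS, HDD_IOPS)
print(hdds_per_ssd)

# Price per GB for the "$1000 for 600 GB" figure.
print(round(dollars_per_gb(1000, 600), 2))
```

This compares raw IOPS only; capacity, redundancy, and controller limits are left out of the sketch.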

 Here's the video:

Well worth watching. The simplicity of not having to fight IO can be a real win. Some people have claimed SSDs aren't reliable, others claim they are. And if you need a lot of CPU processing you'll need those machines anyway, so centralizing storage on fast SSDs may not be that big a win. The point here is that it's probably high time to consider SSD-based architectures.

Reader Comments (14)

Artur is not the first person I've seen become infatuated with SSDs, and I doubt he'll be the last. The profanity and the "can't do math" vs. "awesome" bit probably appeal to the other hipster pricks in the audience, but it doesn't make his arguments more convincing.

(1) You don't need that many IOPS for all of your data, so "SSD for everything" is a waste of money. Just a little bit of intelligence about which storage to use for which data can go a long way.

(2) Joins are slow because random access *in general* is slow, and single servers have limits that are soon reached with few SSDs. If you have multiple nodes for either performance or availability reasons, then your joins are going to be slow because of *network* latency even if disk latency drops to zero. This is true even for RAM latencies, and it remains true for SSDs.

(3) SSDs also have interesting write-block/erase-block boundary issues that affect performance over and above what we're already used to dealing with for disk blocks. Any *real* performance guru might have mentioned that.

(4) He doesn't even mention the longevity issue. If you treat SSDs as consumables, with a rigorous program of monitoring and replacement (good luck even keeping track of how many write cycles have occurred), then you can avoid the nastier failure issues . . . but those cost-per-whatever numbers don't look so good any more.

(5) Fsck is fast? Thank the folks who improved that code (I'm not one but I know who they are) because they had as much to do with it as SSDs did.

SSDs are great for warm data. The key is that they should, as much as possible, only be used for warm data - hot data should go in RAM and cold data should go on spinning/sliding media. It's not *that* hard to approximate such a pattern, and it's much more cost-effective than just dumbly slapping SSDs into everything. Over time, we might even develop algorithms that do this autonomously and semi-effectively, in contrast to the current crop of hybrid drives and auto-tiering drivers that burn write/erase cycles for data that won't actually be accessed again before it's evicted. Unfortunately, anybody who actually listens to this kind of "SSDs are magic pixie dust" BS won't be pursuing other, better, approaches.

June 22, 2011 | Unregistered CommenterJeff Darcy
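The hot/warm/cold placement Jeff describes can be illustrated with a minimal Python sketch. The thresholds and the reads-per-hour metric are hypothetical and chosen only for illustration; a real policy would be tuned against actual access logs and would also need to account for writes.

```python
from enum import Enum


class Tier(Enum):
    RAM = "ram"  # hot data
    SSD = "ssd"  # warm data
    HDD = "hdd"  # cold data


# Hypothetical thresholds in reads per hour; not taken from the comment.
HOT_READS_PER_HOUR = 1000
WARM_READS_PER_HOUR = 10


def choose_tier(reads_per_hour: float) -> Tier:
    """Pick a storage tier from a simple access-frequency estimate."""
    if reads_per_hour >= HOT_READS_PER_HOUR:
        return Tier.RAM
    if reads_per_hour >= WARM_READS_PER_HOUR:
        return Tier.SSD
    return Tier.HDD


for rate in (5000, 50, 0.1):
    print(rate, choose_tier(rate).value)
```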

Where exactly can one get "50TBs of SSDs in one machine for $80,000"?

June 22, 2011 | Unregistered CommenterPaul Nendick

Great response Jeff, thanks.

June 22, 2011 | Registered CommenterTodd Hoff

@Jeff: Re #4: Even the 25nm flash is still pretty reliable. If you have a layer that aggregates writes to a full flash block, you can get something like 820 TiB of writes to a single 160GB Intel 320 Series SSD.
(We're doing ~8.1TiB per percentage on the wearout indicator)

You can also pull all these stats (number of MiB written, percentage of spare flash and wearout) via SMART, at least for the Intel SSDs.

Doing monitoring with RRDs being written to 2 OCZ SSDs (using RAID0 mdadm), the estimated lifetime is ~21 years.

So flash endurance isn't a concern. Have a read of http://www.usenix.org/event/hotstorage10/tech/full_papers/Mohan.pdf

June 22, 2011 | Unregistered CommenterDaniel
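Daniel's endurance figures can be turned into a rough lifetime estimate. The sketch below assumes the wearout indicator falls linearly at the ~8.1 TiB per percent he reports; the 100 GiB/day write rate is a made-up workload for illustration, not a number from the comment.

```python
TIB_PER_WEAR_PERCENT = 8.1  # observed figure reported in the comment


def total_endurance_tib(tib_per_percent: float) -> float:
    """Total writes before the wearout indicator runs out, assuming linear wear."""
    return tib_per_percent * 100


def lifetime_years(endurance_tib: float, writes_gib_per_day: float) -> float:
    """Years until endurance_tib is consumed at a constant daily write rate."""
    days = endurance_tib * 1024 / writes_gib_per_day
    return days / 365


endurance = total_endurance_tib(TIB_PER_WEAR_PERCENT)  # roughly 810 TiB
# Hypothetical workload: 100 GiB written per day.
print(round(endurance), round(lifetime_years(endurance, 100), 1))
```

The result is close to the ~820 TiB quoted, and the estimate scales inversely with the daily write volume.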

Paul Nendick, by building something like this http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

June 22, 2011 | Registered Commentermxx

Obviously for reliability you don't want to run a single SSD drive in your server.
However, AFAIK no RAID controller supports the TRIM command, so the drives' longevity and long-term performance specs are questionable.

June 22, 2011 | Registered Commentermxx

+1 for Jeff's post ... Lots of insight, zero hype :)

June 23, 2011 | Unregistered CommenterRussell Sullivan

@Paul Intel - SSDSA2CW600G3K5 - 600GB SSD 320 Series ~$1100

June 23, 2011 | Unregistered CommenterChris

Like Paul I'm also interested in the "50TBs of SSDs in one machine for $80,000" quote.

In the comment above Chris has said you can get the Intel 600 SSD for ~$1,100.

51,200 GB / 600 GB ≈ 85 drives.

So for roughly $93,500 (85 * $1,100) I can get the SSDs for my 50TB, but what am I going to stick them in? @mxx pointed to Backblaze's home-grown storage solution, but like many others I'm not in the game of building my own hardware and would rather purchase from a provider.

The best option I can find is a Dell MD1220 direct attached storage which allows for 24 drives. Unfortunately, the biggest SSD they supply is 150GB giving a total of 3.6TB.

Does anyone know any better options?

Side note: we want to RAID our SSDs as they can fail, but we'll leave that out of this discussion.

June 23, 2011 | Unregistered CommenterBen Richardson
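Ben's drive-count estimate can be checked with a short Python sketch. It uses the 600 GB capacity and ~$1,100 price quoted in the comments and the 24-bay, 150 GB-per-drive enclosure limits he mentions; the helper names are illustrative. Rounding up, reaching the full 51,200 GB takes 86 drives rather than 85, before any RAID overhead.

```python
import math

TARGET_GB = 50 * 1024  # 50 TB expressed as 51,200 GB, as in the comment
DRIVE_GB = 600         # Intel 320 Series capacity quoted above
DRIVE_PRICE_USD = 1100 # approximate price quoted above


def drives_needed(target_gb: float, drive_gb: float) -> int:
    """Whole drives required to reach target_gb of raw capacity."""
    return math.ceil(target_gb / drive_gb)


def raw_capacity_tb(bays: int, drive_gb: float) -> float:
    """Raw capacity of a fully populated enclosure, in decimal TB."""
    return bays * drive_gb / 1000


count = drives_needed(TARGET_GB, DRIVE_GB)
print(count, count * DRIVE_PRICE_USD)

# A 24-bay enclosure limited to 150 GB drives.
print(raw_capacity_tb(24, 150))
```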

A SuperMicro SC417E16-RJBOD1 will do the trick - 4U, 88 x 2.5" bays.

June 24, 2011 | Unregistered CommenterPixy Misa

Excellent suggestions everyone.

Now what about securing this pool of data against drive failure? Last I checked, Intel's TRIM for RAID only supported RAID 0 or RAID 1 - nothing like RAID 5 or greater.

And once they have, what sort of RAID controller(s) can handle 88 SSDs? What would the usable amount of space be after being RAID'd? What would the rebuild time be after a single disk failure?

And what about a holistic view of the throughput? If we were to add a pair of bonded 10 GigE ports to this JBOD, how much throughput could a smattering of NFS clients pull before the PCI bus inside this JBOD gets saturated? One can quickly negate the investment in SSD by mating that tech to other parts of a typical NAS or SAN stack that haven't evolved in performance at the same rate SSDs have.


PS: that Backblaze design isn't without criticism: http://www.c0t0d0s0.org/archives/5899-Some-perspective-to-this-DIY-storage-server-mentioned-at-Storagemojo.html

June 24, 2011 | Unregistered CommenterPaul Nendick
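Paul's throughput question comes down to finding the slowest stage in the path. A minimal Python sketch, using the 4 GB/sec random read figure from the article and the raw line rate of two bonded 10 GigE links, is below. Protocol overhead, NFS behavior, and bus limits are ignored, and the stage names are made up for illustration.

```python
def gbit_to_gbyte_per_sec(gbit_per_sec: float) -> float:
    """Convert a raw line rate in Gbit/s to GB/s (8 bits per byte)."""
    return gbit_per_sec / 8


def bottleneck(stages: dict) -> tuple:
    """Return the (name, GB/s) of the slowest stage in the data path."""
    name = min(stages, key=stages.get)
    return name, stages[name]


stages = {
    "ssd_random_read": 4.0,                               # GB/s, from the article
    "network_2x10gige": gbit_to_gbyte_per_sec(2 * 10),    # raw line rate only
}
print(bottleneck(stages))
```

On these numbers the network is the limiting stage, which is the mismatch the comment warns about.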

Perhaps forgo hardware RAID controllers? Get a JBOD enclosure with multiple independent SATA ports and connect it to a dedicated storage server that does software RAID. You'll get the benefit of the TRIM command and not have to worry about the enclosure's/controller's proprietary RAID setup/limitations.

June 24, 2011 | Registered Commentermxx

Software RAID is not an easy answer. CPU power might not be a problem but you have the extra bandwidth of all the duplicated IO to worry about when it is not offloaded to dedicated hardware.

July 5, 2011 | Unregistered CommenterBob

May I go one step further and say that anyone who demands RAID 5 is a moron. It's one thing to think you want something because you don't know any better. But to demand it when there is ample information that RAID 5 is a dinosaur is inexcusable. RAID 10 is fine.

July 30, 2011 | Unregistered CommenterTerris Linenbach
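The usable-space and rebuild-time questions raised in the comments above can be roughed out in Python. The sketch treats all 88 bays as a single array of 600 GB drives, which is a simplification (a real deployment would likely split them into several groups), and the 200 MB/s rebuild rate is a hypothetical value, not a measured one.

```python
def usable_gb(level: str, drives: int, drive_gb: float) -> float:
    """Usable capacity of one array for a few common RAID levels."""
    if level == "raid0":
        return drives * drive_gb
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb  # one drive's worth of parity
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_gb  # two drives' worth of parity
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_gb  # every drive is mirrored
    raise ValueError(f"unsupported level: {level}")


def rebuild_minutes(drive_gb: float, rebuild_mb_per_sec: float) -> float:
    """Time to rewrite one full drive at a constant rebuild rate."""
    return drive_gb * 1000 / rebuild_mb_per_sec / 60


for level in ("raid5", "raid6", "raid10"):
    print(level, usable_gb(level, 88, 600))

# Hypothetical sustained rebuild rate of 200 MB/s.
print(rebuild_minutes(600, 200))
```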
