Strategy: Saving Your Butt With Deferred Deletes
Tuesday, April 13, 2010 at 6:38AM
Todd Hoff in Strategy

Deferred Deletes is a technique where deleted items are marked as deleted but not garbage collected until some days or preferably weeks later. James Hamilton describes this strategy in his classic On Designing and Deploying Internet-Scale Services:

Never delete anything. Just mark it deleted. When new data comes in, record the requests on the way. Keep a rolling two week (or more) history of all changes to help recover from software or administrative errors. If someone makes a mistake and forgets the where clause on a delete statement (it has happened before and it will again), all logical copies of the data are deleted. Neither RAID nor mirroring can protect against this form of error. The ability to recover the data can make the difference between a highly embarrassing issue or a minor, barely noticeable glitch. For those systems already doing off-line backups, this additional record of data coming into the service only needs to be since the last backup. But, being cautious, we recommend going farther back anyway.
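To make the idea concrete, here is a minimal sketch of a deferred delete using Python's built-in sqlite3 module. Nothing here comes from Hamilton's paper: the `items` table, the `deleted_at` column, the helper function names, and the 14-day retention window are all assumptions chosen for illustration. The point is only that "delete" becomes an update, and the real `DELETE` runs later against rows that have been marked long enough.

```python
import sqlite3
import time

# Assumed retention window: two weeks, per the "rolling two week" advice.
RETENTION_SECONDS = 14 * 24 * 60 * 60


def connect():
    """Create an in-memory database with a hypothetical items table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " deleted_at INTEGER"  # NULL means live; otherwise Unix time of the delete
        ")"
    )
    return conn


def soft_delete(conn, item_id, now=None):
    """Mark a row as deleted instead of removing it."""
    now = int(time.time()) if now is None else int(now)
    with conn:
        conn.execute(
            "UPDATE items SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
            (now, item_id),
        )


def restore(conn, item_id):
    """Undo a soft delete, as long as the row has not been purged yet."""
    with conn:
        conn.execute("UPDATE items SET deleted_at = NULL WHERE id = ?", (item_id,))


def live_items(conn):
    """Return the ids of rows the application should treat as existing."""
    rows = conn.execute("SELECT id FROM items WHERE deleted_at IS NULL ORDER BY id")
    return [row[0] for row in rows]


def purge(conn, now=None):
    """Physically remove rows marked deleted for longer than the retention window."""
    now = int(time.time()) if now is None else int(now)
    cutoff = now - RETENTION_SECONDS
    with conn:
        cur = conn.execute(
            "DELETE FROM items WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            (cutoff,),
        )
    return cur.rowcount
```

In a setup like this, every normal read has to filter on `deleted_at IS NULL`, and `purge` would run from a scheduled job. A runaway `soft_delete` can be reversed with `restore` any time before the purge catches up with it, which is the whole benefit.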

Mistakes happen, and as James says in Stonebraker on CAP Theorem and Databases:

Deferred delete is not full protection but it has saved my butt more than once and I’m a believer. If you have an application error, administrative error, or database implementation bug that loses data, then it is simply gone unless you have an offline copy. This, by the way, is why I’m a big fan of deferred delete.

Something to consider in your own design.

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.