I read an article a few weeks ago in The Economist about test-tube data. It begins the way most stories in the data and storage industry do: data growth is massive, and all that data is becoming more and more unwieldy and expensive to keep. IT budgets (and budgets in general) aren’t keeping pace with that growth. We in the industry have read (and lived) that story again and again and know it all too well.
But this is another story. A story about beer drinking, back-of-the-napkin machinations, and innovation. And a story about drinking and deriving. What happens when you put a couple of research wonks and real-world problems in a pub together? Magic.
In this case, what fell out was the seedling of an idea: all of this exponentially (and expensively) growing data could be packed up and stored in artificially constructed DNA. Imagine storing data in a condensed form factor of 2.2 PB/gram (that’s right, gram).
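To make that density concrete, here’s a quick bit of napkin math. The 2.2 PB/gram figure is the one cited; the archive size is just a made-up example:

```python
# Napkin math on the cited density figure.
DENSITY_PB_PER_GRAM = 2.2

archive_pb = 1_000  # say, a 1 EB archive
print(f"{archive_pb} PB fits in ~{archive_pb / DENSITY_PB_PER_GRAM:.0f} g of DNA")
# -> 1000 PB fits in ~455 g of DNA: about a pound.
```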
But wait, there’s more. Because, what else is DNA great at? Replication without error (or at least, with very few copying errors). This data fidelity from the “hard drive to the test tube” is accomplished with great elegance: a ternary encoding scheme turns the files into DNA bases, the resulting strands are chunked into overlapping segments (so every stretch of data lands in several chunks), and parity checks are baked in for error detection.
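If you like to see ideas as code, here’s a toy sketch of that pipeline, in Python. To be clear, this is my own illustration, not the researchers’ actual method (their scheme is far more careful), but the shape is the same: bytes become base-3 digits, digits become DNA bases chosen so no base ever repeats back-to-back (which reportedly keeps the molecules easy to sequence), and the strand gets sliced into overlapping chunks:

```python
# Toy sketch only -- my own illustration, not the published scheme.
BASES = "ACGT"

def bytes_to_trits(data: bytes) -> list[int]:
    """Re-express the payload as base-3 digits (trits)."""
    n = int.from_bytes(data, "big")
    trits = []
    while n:
        n, t = divmod(n, 3)
        trits.append(t)
    return trits[::-1] or [0]

def trits_to_dna(trits: list[int]) -> str:
    """Each trit picks one of the three bases that differ from the
    previous base, so the same letter never repeats back-to-back."""
    seq, prev = [], "A"
    for t in trits:
        prev = [b for b in BASES if b != prev][t]
        seq.append(prev)
    return "".join(seq)

def chunk_overlapping(seq: str, length: int = 16, step: int = 4) -> list[str]:
    """Slice into overlapping segments; every base lands in roughly
    length/step chunks, which is where the redundancy comes from.
    (The real scheme reportedly used ~100-base segments stepped by 25;
    these defaults are shrunk to fit the demo strand.)"""
    return [seq[i:i + length] for i in range(0, len(seq), step)]

def parity_trit(trits: list[int]) -> int:
    """Idea only, not wired in here: a checksum trit per chunk would
    flag single-symbol errors."""
    return sum(trits) % 3

dna = trits_to_dna(bytes_to_trits(b"hello, test tube"))
chunks = chunk_overlapping(dna)
print(len(dna), "bases ->", len(chunks), "chunks")
```

The overlap is the point: every position lands in several chunks, so one bad read doesn’t take the data with it.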
Ah, but how does one decode the data, you ask? It’s a simple matter of using a standard chemical reaction to generate multiple copies of the chunks, sequencing them, and stitching the overlapping segments back together. Fun with chemicals, sign me up!
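Continuing the toy sketch above (and still my illustration, not the paper): in the lab the extra copies come from that amplification reaction plus sequencing, but we can fake the effect by overlaying the overlapping chunks, majority-voting each position, and inverting the base-3 mapping:

```python
# Continuing the sketch above -- reuses chunk_overlapping and the
# `dna` strand it produced. Still an illustration, not the paper.
from collections import Counter

def reassemble(chunks: list[str], step: int = 4) -> str:
    """Overlay chunks at their offsets and majority-vote each position."""
    votes: dict[int, list[str]] = {}
    for i, chunk in enumerate(chunks):
        for j, base in enumerate(chunk):
            votes.setdefault(i * step + j, []).append(base)
    return "".join(Counter(votes[p]).most_common(1)[0][0]
                   for p in sorted(votes))

def dna_to_trits(seq: str) -> list[int]:
    """Invert the no-repeats base mapping from the encoding sketch."""
    trits, prev = [], "A"
    for b in seq:
        trits.append([c for c in "ACGT" if c != prev].index(b))
        prev = b
    return trits

def trits_to_bytes(trits: list[int]) -> bytes:
    """Fold the trits back into an integer, then into bytes."""
    n = 0
    for t in trits:
        n = n * 3 + t
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

chunks = chunk_overlapping(dna)  # from the encoding sketch
print(trits_to_bytes(dna_to_trits(reassemble(chunks))))  # b'hello, test tube'
```

If that last line prints the original payload, the redundancy did its job.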
Of course, there’s a catch. In this case, it’s the glacial pace at which data can be read back (it took these researchers two weeks to reconstruct five files). The other catch is cost: with the technology at this early stage, storage runs around $12,000 per MB (call it $12 million per GB).
But seriously, folks, this is cool stuff. Even as is, DNA-based storage is well suited to less demanding archival scenarios that call for a medium that practically never degrades or needs replacing. The longer you need to archive your data, the more attractive (and practical) this methodology becomes.
Now, imagine what could happen if we put some serious computing power behind the encoding/decoding problem to speed it up and bring the cost down.
EMC, meet DNA? Hmmm…