
Backing up is for Dumbos

Sunanda recently lost some photos that she'd clicked on her holiday, and I asked what I thought was the obvious question: "Where did you back them up?"

A freezing glare followed. "Why," she asked me, "should I have to back them up?"

Putting my foot firmly in my mouth, I persisted: "It's dumb not to back up! Technology 101!!"

"Pretty stupid technology, if you ask me," she said, marching out of the room.

This exchange got me thinking. In all my life as a technology manager (not to mention high priest of family photographs), backup has been a core mantra, but also a colossal pain in the neck. The process takes too long, is never as current as you want, and is invariably missing for the very day your most important data was being written. Storage did indeed seem more than a bit stupid if it took so much time and effort to stop it from losing data. Technology should be smart enough to do invisibly what is essentially a core maintenance job; why should we have to allocate so much extra time, effort and money to it? And it is a lot: about a fifth of every day, a bundle of licenses, a few operators and oodles of IOPS are consumed by the average company just for this, and it's been that way since we left punch cards behind.

Let's go back to the basics. Why is backup needed at all? If you have a passive (let's not say dumb) storage system, backup does two things the storage cannot do for itself:
  1. Protect against failures such as disk crashes
  2. Undo accidental changes to the data by going back a few days in time
Protection against failure was important, especially because disk crashes used to be pretty common once upon a time (even a misplaced sneeze could do bad things). Today's disks, however, don't fail quite as often, and there are plenty of recovery mechanisms when they do. Nutanix has built its core around keeping multiple copies of each block on separate hardware nodes, ensuring near-perfect availability. Creating and restoring copies (pretty much what a backup is supposed to do) is continuous, invisible and instantaneous (no chasing IT with a service request). The only case Nutanix won't cover is the whole cluster or datacenter going up in flames, which is not an entirely trivial problem to solve, but I suspect it is no harder than what their Xi-Leap is already trying to do. Why, then, are we breaking our necks backing up something that isn't going to fail?
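
To make the replication idea a little more concrete, here is a minimal Python sketch of how keeping two copies of every block on distinct nodes lets a system absorb a node failure and quietly re-replicate. It is a toy under stated assumptions: the `Cluster` class, the replication factor of 2 and the least-loaded placement rule are all hypothetical, and none of it is Nutanix's actual design.

```python
from collections import defaultdict

REPLICATION_FACTOR = 2  # assumption: two copies of every block


class Cluster:
    """Toy cluster that keeps each block on REPLICATION_FACTOR distinct nodes.

    Hypothetical illustration; not Nutanix's actual placement logic.
    """

    def __init__(self, nodes):
        self.nodes = set(nodes)            # live nodes
        self.placement = defaultdict(set)  # block id -> nodes holding a copy

    def _least_loaded(self, exclude):
        """Return the live node holding the fewest copies, skipping `exclude`."""
        load = {n: 0 for n in self.nodes}
        for holders in self.placement.values():
            for n in holders:
                if n in load:
                    load[n] += 1
        candidates = [n for n in sorted(self.nodes) if n not in exclude]
        if not candidates:
            raise RuntimeError("not enough live nodes for another copy")
        return min(candidates, key=load.__getitem__)

    def write(self, block_id):
        """Place copies of a block until it has REPLICATION_FACTOR of them."""
        holders = self.placement[block_id]
        while len(holders) < REPLICATION_FACTOR:
            holders.add(self._least_loaded(holders))

    def fail_node(self, node):
        """Drop a dead node, then top every block back up to full replication."""
        self.nodes.discard(node)
        for holders in self.placement.values():
            holders.discard(node)
        for block_id in list(self.placement):
            self.write(block_id)

    def readable(self, block_id):
        """A block survives as long as at least one live node holds a copy."""
        return bool(self.placement.get(block_id, set()) & self.nodes)


if __name__ == "__main__":
    cluster = Cluster(["node-a", "node-b", "node-c"])
    for block in range(6):
        cluster.write(block)
    cluster.fail_node("node-a")
    print(all(cluster.readable(b) for b in range(6)))  # prints True
```

Losing `node-a` leaves every block readable, and `fail_node` rebuilds the missing copies on the surviving nodes without anyone filing a request. A real system also has to worry about rebuild bandwidth, rack awareness and consistency, all of which this sketch ignores.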

DISCLAIMER: Nutanix does not run on the pendrives storing Sunanda's photos.

This is, of course, going to be deeply uncomfortable for a lot of people. Backups are such an integral part of life for a generation of IT operators that, no matter how rational the argument, it is a hard habit to give up. Not having a backup you can touch is scary. Then there are regulators to convince, some of whom mandate explicit backups. As a basic principle, though, it's a compelling one: storage should handle all its backing up and recovery invisibly and continuously, and should appear to the user never to fail.

Accidental deletion or corruption is a more complicated matter; consistency, quiescence and all kinds of other tricky issues come charging in. These are, however, hardly unsolved problems. Snapshots have been around for a while and are quite effective at this; they're just not invisible or automatic. Think of how life would be if you could simply tell your storage to snapshot itself every hour for up to thirty days, and then recover, time-machine style, to ten minutes before the ransomware hit you. Why should this not be a basic feature of enterprise storage, given that I don't know of any enterprise that does not need it?
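
A snapshot policy like the one described (every hour, kept for thirty days) boils down to a schedule, a retention rule and a way to pick the right restore point. The sketch below models only that bookkeeping, not the snapshots themselves; `SnapshotSchedule`, `tick` and `restore_point` are hypothetical names. One thing it makes visible: with an hourly interval, the best you can recover to is the last snapshot taken before the incident, so getting back to ten minutes before an attack needs a shorter interval (or a continuous journal).

```python
from bisect import bisect_left
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=1)  # "snapshot itself every hour"
RETENTION = timedelta(days=30)          # "for up to thirty days"


class SnapshotSchedule:
    """Hypothetical policy bookkeeping: when to snapshot, what to keep,
    and which snapshot to restore from."""

    def __init__(self, interval=SNAPSHOT_INTERVAL, retention=RETENTION):
        self.interval = interval
        self.retention = retention
        self.taken = []  # snapshot timestamps, ascending

    def tick(self, now):
        """Called periodically: take a snapshot if one is due, then prune."""
        if not self.taken or now - self.taken[-1] >= self.interval:
            self.taken.append(now)
        cutoff = now - self.retention
        # Keep only snapshots younger than the retention window.
        self.taken = [t for t in self.taken if t > cutoff]

    def restore_point(self, before):
        """Newest snapshot taken strictly before `before`, or None."""
        i = bisect_left(self.taken, before)
        return self.taken[i - 1] if i > 0 else None


if __name__ == "__main__":
    sched = SnapshotSchedule()
    start = datetime(2024, 1, 1)
    for hour in range(31 * 24 + 1):  # 31 days of hourly ticks
        sched.tick(start + timedelta(hours=hour))
    attack = datetime(2024, 1, 31, 21, 10)
    print(len(sched.taken), sched.restore_point(attack))  # prints 720 2024-01-31 21:00:00
```

In steady state the schedule holds 30 × 24 = 720 snapshots, and a restore request for 21:10 lands on the 21:00 snapshot. Passing a ten-minute `interval` would tighten that window at the cost of six times as many snapshots.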

There's also the debate about where such versioning and snapshots should reside: at the OS level (file or object versioning) or at the storage level (block versioning).

Here's my opinion. Versioning at the file or object level is good for individuals, but enterprises often need to restore whole snapshots; smart storage should offer a time machine at the block level. Operating systems or application software can still offer file and object versioning, but the big restores should come from the storage. Again, hardly a new idea: storage vendors have been peddling snapshot and restore capability as an add-on for ages; it's just neither automatic nor invisible.
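
Block-level versioning can be cheap because a snapshot does not have to copy data. In this toy sketch (a hypothetical `Volume` class, not any vendor's implementation), blocks are immutable `bytes` objects and a write replaces a block-map entry instead of overwriting data in place, in the spirit of redirect-on-write. A snapshot therefore copies only the map of references, and a restore rolls the whole volume back in one step. Production systems typically avoid even the map copy, for example by sharing copy-on-write metadata trees; this version keeps things simple.

```python
class Volume:
    """Toy block device with whole-volume snapshots (hypothetical sketch).

    Block contents are immutable bytes, and a write replaces the map entry
    rather than overwriting data in place, so a snapshot only needs to copy
    the block map (references), never the block data itself.
    """

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = {n: bytes(block_size) for n in range(num_blocks)}
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, n, data):
        if n not in self.blocks:
            raise IndexError(f"block {n} out of range")
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block long")
        self.blocks[n] = bytes(data)  # old bytes stay alive in any snapshot

    def read(self, n):
        return self.blocks[n]

    def snapshot(self, name):
        self.snapshots[name] = dict(self.blocks)  # copies references only

    def restore(self, name):
        """Roll the entire volume back to a named snapshot in one step."""
        self.blocks = dict(self.snapshots[name])


if __name__ == "__main__":
    vol = Volume(num_blocks=4)
    vol.write(0, b"good")
    vol.snapshot("hourly-09")
    vol.write(0, b"bad!")  # e.g. ransomware encrypting blocks
    vol.write(1, b"bad!")
    vol.restore("hourly-09")
    print(vol.read(0), vol.read(1))  # prints b'good' b'\x00\x00\x00\x00'
```

The restore is whole-volume: every block the "attack" touched goes back at once, which is exactly the kind of big restore that file-level versioning handles poorly.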

Every enterprise needs this, and today spends enormous time and energy obtaining a poorer version of it. That's why I argue this is a core function of smart storage, not an optional add-on.

There's also another use backups are put to: long-term archiving. Here data are kept for years, even decades, to aid investigations, disputes and forensic analysis rather than to support operational recovery. Should this become part of smart storage? I personally don't think so. Archiving is a different need, generally requires a different kind of media, and applies to particular files and data sets rather than to the underlying storage.

