Blogs

Clean or dirty, is that really the question?

I recently read “Clicking Clean: How Companies are Creating the Green Internet,” a very interesting report by Greenpeace published this past April. The report reviews the “clean” vs. “dirty” power used by many of the Internet giants, such as Amazon, Google, Apple and eBay, to run their vast data centers. It shows what percentage of their power comes from clean sources, such as solar or wind, versus dirty sources like coal-, gas- or nuclear-fired power plants, and rates each company with grades ranging from “A” to “F” based on its renewable energy efforts.

But regardless of the type of energy used by these companies’ data centers, an even bigger question might be: how do they reduce their energy consumption in the first place?

Why Copies of Data on Disk Alone Is Not a Good Active Archive Strategy

The father of theoretical computer science, Alan Turing, once said, “We can only see a short distance ahead, but we can see plenty there that needs to be done.” The same sentiment holds true in enterprise IT planning, considering that the average company keeps data for 15 years and some data requires indefinite retention. Unstructured data now represents the majority of data being stored, and the problem is exacerbated by the fact that more than 70% of disk capacity is misused[1]. So how do storage managers meet these challenges with decreasing annual budgets, when storage accounts for between 33% and 70% of every dollar spent on IT?

Permanent Active Archives and the Cloud

In many industries, archived data is considered the lifeblood of an organization. Broadcast and media companies, life sciences firms, oil and gas exploration companies, and research institutes all create data that must be archived indefinitely. The information they create has significant value to each organization, so preserving that data in an active permanent archive environment makes good sense.

Storing archive data in the cloud, whether private or public, is becoming a popular choice, particularly for long-term archives. Private cloud providers typically use an object storage solution that offers self-managing, self-healing technology, automatically recreating data on new media should copies degrade. These systems continue to work as long as you keep adding new media to the storage pool. Public clouds typically offer similar functionality, and they shift the responsibility for adding more media to a third party: as long as the user pays the monthly bill, data will be preserved forever.

However, following the demise of some well-known public cloud providers (Iron Mountain Digital and Nirvanix, for example), cloud users are strongly advised by analyst organizations such as Gartner to create a cloud exit strategy before signing. Gartner published a guide in 2013 called “Devising a Cloud Exit Strategy: Proper Planning Prevents Poor Performance.” In addition, Henry Baltazar, senior analyst at Forrester Research, said, “one of the most significant challenges of cloud storage is the difficulty of moving large amounts of data from a cloud.”

To my knowledge, it has not happened yet, but inevitably a private cloud / object storage vendor will exit the market, stop trading or stop supporting its product at some point in the future. It is therefore equally important to plan for this eventuality. Take, for example, the difficulties and expense facing users of the now end-of-life EMC Centera: a user can expect to pay around $2,000 per TB to migrate data out of Centera in order to avoid restoring data to primary applications and re-archiving it to a new archive environment.

Active archive solutions can help in two ways:

1) Incorporating an active archive file gateway solution with object storage or cloud separates applications from their archived data, allowing simpler and more efficient migrations controlled by the gateway. From the user’s or application’s perspective, nothing has changed; data is moved from one archive technology to another, in the background, without impacting the user workflow.

Of course, the counterargument is that all we have done is move the problem of cessation from the object storage / cloud provider to the gateway provider. Therefore, the second method is perhaps preferable for permanent archives.

2) Create hybrid active archives using object storage / cloud plus a second copy on low-cost media (such as tape) written in an industry-standard media format (such as LTFS).

In this way, for a low additional cost, all archive data is preserved on a long-lasting, application-independent storage medium, which offers a fast and efficient method of getting archive data into a new environment; tape is very fast at reading and writing large data sets. This removes the need to migrate data out of anything: simply stop the cloud service once the data has been written to the new archive.
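The hybrid approach above can be sketched in a few lines of code. This is a minimal illustration, not a product implementation: the function names, the callable standing in for the object-storage tier, and the verification policy are all assumptions. Because LTFS exposes a tape as an ordinary filesystem, the tape copy is just a file copy to the mount point, with a checksum to confirm the second copy landed intact:

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_to_hybrid(src: Path, ltfs_mount: Path, put_object) -> str:
    """Write one file to both archive tiers and verify the tape copy.

    ltfs_mount -- directory where an LTFS-formatted tape is mounted
    put_object -- callable(key, data) for the object storage / cloud tier
                  (hypothetical; a real deployment would use its
                  provider's SDK here)
    Returns the SHA-256 checksum recorded for the file.
    """
    digest = sha256(src)
    # Tier 1: object storage / cloud copy.
    put_object(src.name, src.read_bytes())
    # Tier 2: LTFS tape copy -- an ordinary file copy, since LTFS
    # presents the tape as a standard filesystem.
    dest = ltfs_mount / src.name
    shutil.copy2(src, dest)
    if sha256(dest) != digest:
        raise IOError(f"tape copy of {src.name} failed verification")
    return digest
```

Because the tape copy is a plain file in a standard format, a future migration only needs to read the LTFS volume; nothing has to be extracted from the cloud tier at all.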

There are, of course, other considerations for an active permanent archive, such as tape to tape migrations to keep old data on old tapes readable and using non-application specific file formats to ensure data can be read on new applications in the future, but those are topics for another blog.

It’s a Wrap! NAB is Over but the Lessons Continue to Resonate

I recently spoke to a colleague who is a 20+ year veteran of the NAB Show. He has been through the ups and downs of the industry and the show, and he told me that this year he felt an energy at NAB that he has not felt in some time: the aisles were full, with visitors coming to the booth more energized than ever. I am a bit of a “veteran” myself, having attended my first show in 1998. I appreciated my colleague’s impressions and would agree that the crowd felt alive. At this year’s show, the one common theme on everyone’s mind was archiving: what is the best way to manage, archive and use the data generated by media and entertainment?

Excitement Around Storage

The National Association of Broadcasters (NAB) Show is the world's largest electronic media show, covering filmed entertainment and the development, management and delivery of content across all media. It is also the third-largest annual conference held in Las Vegas, behind the Consumer Electronics Show (CES) and World of Concrete.

This year, attendance jumped to 96,000 attendees—comparable to a small city’s entire population. And what an exciting event this turned out to be!

Tape Ensures Future of Active Archives

I was speaking with a customer recently about his storage environment and how he was incorporating the principles of active archiving. We were discussing the features and benefits of active archiving and what elements actually make up an active archive, when he asked me the following question: “Tape does not have to be part of an active archive, right?” I replied, “By definition, no, it does not…” and then came the “but.”

Moving Towards a Converged Data Storage Platform

For almost a decade, the amount of data stored by companies, governments, universities and research labs has been escalating at an impressive rate. This momentum is even greater for Internet-related activities. As the amount of data grows, it becomes increasingly interesting for strategic data mining. Company strategists, marketing and business development teams, intelligence and fraud detection agencies, and academic researchers all seek to leverage stored data, mining it for information that might provide insight into social and cultural patterns, inform business decisions, or even catch fraudsters and terrorists.

New Use-Cases Support Active Archive Approach

Archive means different things to different users. What is clear is that archival is not a specific workload but rather a category of workloads with the common trait of preserving data reliably for extended periods of time. Requirements can change from application to application (sensitivity to cost and data integrity, for instance), but the requirements for scalability, low TCO and longevity are universal.

Over the last decade, architectures such as object storage, HSM and scale-out NAS systems have been deployed to serve primarily two archival use cases: compliance and preservation/reuse. In the compliance use case, archival ROI was associated with risk mitigation; that is, saving the digital communications and records of an enterprise reduced litigation costs or the risks associated with regulatory non-compliance. Archives deployed for preservation and reuse are typically associated with digital assets that have license or cultural value. The most notable examples are the multi-petabyte archives found at major broadcasters, movie studios and digital libraries, where challenges of scale and longevity can approach extremes.

The Value of Tape for Active Archives

Active archiving is a key use case for tape. But how does active archive differ from backup? And how can tape-based active archive solutions help you reduce costs, save time and reduce risk?

Knowing Your Backups from Your Active Archives

It’s important to understand the distinction between backup and active archive strategies. Active archive and backup applications are distinct processes with different objectives, and they therefore impose different requirements on the storage systems they use.