Blogs
Blurring the Lines
Submitted by Dave Thomson on Tue, 01/31/2012 - 08:57
Recently there have been a number of product announcements from backup vendors stating the virtues of using backup applications for archiving. One of the main reasons the Active Archive Alliance was formed was to better educate organizations about archiving and to explain why an Active Archive is superior to backup for archiving.
Let’s look at the ways an Active Archive secures data for medium- to long-term retention periods, compared to using a backup product for archiving.
An Active Archive is a self-supporting system that securely stores and maintains data independently of primary storage and backup. Active Archives do not replace backup; the two solutions are complementary, working together to manage data.
Backup is designed to allow restoration of data to primary storage and servers in the event of an outage, data loss or user deletion. A backup can be a full or an incremental backup (with periodic full backups). Either way, considerable numbers of files are copied (backed up) during each operation, so speed is of the essence. One way backup companies improve write speed is to remove data verification during writes. This is acceptable in a backup operation, as it is always possible to look at an alternative backup for the data, for example the previous day’s backup. The intervals between backup copies are determined by the organization’s tolerance for data loss, whether minutes, hours, or days.
In an archive environment, files are moved from primary storage to the archive, optionally leaving a stub behind. Before the file is removed from primary storage, it is vital that the validity of the copy in the archive, which will soon be the only copy, be checked. Therefore, all data entering the archive is verified during the write operation.
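To make the verify-before-delete idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular Active Archive product works: the function names, the SHA-256 checksum and the text-file stub format are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(source: Path, archive_dir: Path, leave_stub: bool = True) -> Path:
    """Copy source into archive_dir, verify the copy, then remove the original.

    The original is deleted only after the archived copy's checksum matches.
    The stub written here (a small text file) is a hypothetical format.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    expected = sha256_of(source)
    shutil.copy2(source, target)
    if sha256_of(target) != expected:
        # Never remove the primary copy if the archive copy is suspect.
        target.unlink()
        raise IOError(f"verification failed for {source}")
    source.unlink()
    if leave_stub:
        stub = source.with_name(source.name + ".stub")
        stub.write_text(f"archived-to={target}\nsha256={expected}\n")
    return target
```

A real archive would verify against the media itself rather than a copy that may still sit in cache, but the ordering is the point: write, verify, and only then remove the primary copy.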
If an organization uses tape for its backups, it will use a tape rotation scheme such as Grandfather-Father-Son (GFS) or Tower of Hanoi. In a GFS rotation scheme, with a daily backup including weekends, at least 22 tape sets are needed in a year. With Tower of Hanoi it can be 10 sets. Both figures exclude replacements for worn-out media.
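For readers who want to see where numbers like these can come from, here is a small Python sketch. The GFS breakdown (6 daily, 4 weekly, 12 monthly sets) is an assumption; it is one common way to arrive at 22, not something spelled out above. The Tower of Hanoi function uses the usual rule that each successive set is used half as often as the one before it.

```python
def gfs_tape_sets(daily: int = 6, weekly: int = 4, monthly: int = 12) -> int:
    """Count tape sets for a GFS scheme (the default breakdown is an assumption)."""
    return daily + weekly + monthly


def hanoi_set_for_day(day: int, num_sets: int) -> int:
    """Return the 0-based tape set used on a 1-based day number.

    Set 0 is used every 2nd day, set 1 every 4th day, and so on; the last
    set absorbs every remaining slot, so it also recurs regularly.
    """
    if day < 1:
        raise ValueError("day numbers start at 1")
    index = 0
    while day % 2 == 0 and index < num_sets - 1:
        day //= 2
        index += 1
    return index


def hanoi_cycle_days(num_sets: int) -> int:
    """Number of days before the Tower of Hanoi pattern repeats."""
    return 2 ** (num_sets - 1)
```

With three sets the pattern over eight days is A B A C A B A C. Under this rule, 10 sets give a 512-day cycle, which is the smallest number of sets whose cycle spans a full year.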
An Active Archive will typically have a maximum of two or three copies of the data on tape. This offers considerable savings in media costs and reduces the total amount of managed data, while maintaining the integrity of the archive.
An Active Archive builds in future proofing by helping organizations avoid costs associated with vendor lock-in and deliver a flexible, worry free migration strategy to new storage and/or new platforms.
When using backup to archive, a proprietary stub is left on primary storage, or dedicated disk archive, pointing to a location in the backup. In QStar’s opinion, a proprietary stub will, in the future, lead to huge clean-up operations resulting in unwanted costs and disruption, when the organization wishes to switch providers. The only way to migrate data is to bring everything back to primary storage before writing the archived data out again to the new archive.
An Active Archive allows for migration of archived data over time, for example, migrating data from older LTO2 media to LTO5. All Active Archive applications support this. Data can be migrated from one technology to another without bringing it back to primary storage. Disk storage systems have an average life of 3 to 5 years, while tape libraries have an average life of 6 to 10 years, allowing multiple tape generations to be used over the lifetime of the library. Archived data can often have a desired retention time of decades, so the ability to migrate data, in those environments, must be planned for.
Finally, more and more Active Archive applications can optionally use the new industry-standard Linear Tape File System (LTFS). With LTFS, media can be read by other archiving software, allowing media reuse and migration of data to new solutions on new platforms.
Backup is a necessary evil, securing constantly changing and often accessed files on primary storage. However, backup methods also create a significant amount of redundant data. Once data on primary storage becomes older and less frequently accessed, moving it into an Active Archive tier is the best approach to reduce cost and secure data for the medium- to long-term, removing it from the repetitive backup process.
Active Archive: Flexibility, Performance, Affordability, Ease of Use
Submitted by Chris Marsh on Tue, 01/24/2012 - 18:39
For many years during the “tape wars” era, as I’ve come to call it, when most major non-tape vendors were attacking the technology, companies like Spectra Logic often found themselves on the defensive. If you follow any of the dialogues on LinkedIn or other forums, there has been a common theme that the only value proposition tape offers is cost. There is also opposition to this opinion, thus creating what Chris Mellor at The Register describes as a “religious war” between technologies. Instead of firing another arrow amidst that war, I’d like to take a step back and look at why active archives are resonating so well with resellers and customers alike: flexibility, performance, affordability and ease of use.
Active archives combine the best advantages of many technologies, which is why software, tape and disk vendors alike are joining the growing movement. With the data volume and retention requirements of most archives, tape technologies provide some key benefits and are one of the big reasons why active archives are so appealing. However, tape is only one piece of a well-architected active archive. Historically, archives, and especially archives to tape, have had a reputation of being hard to use, cumbersome and unreliable: in other words, a headache. Today, archives to tape are the healer of headaches, not the creator. These issues are not inherent to tape; rather, they are symptoms of problems that customers need to address. The words data migration, media format changes, full restores, lost data, and backup failures all evoke negative connotations at best. The true culprit of these pains is the data management process, or lack thereof, which active archives address with both short and long-term solutions.
When it comes to infrastructure architecture, flexibility and performance are king with cost as the regulator. These are the benefits that an active archive delivers, by offering a new approach to data management, rather than simply an updated single product with new features. Active archives take the approach of offering storage and archival features that can be tailored to the specific needs of each organization, ensuring that the short-term storage and long-term retention needs specific to that organization’s data are met. This is possible because active archive is not a single product being promoted by a single vendor or even a single market.
Active Archives as NAS
It’s understandable that people initially mistake active archive for storage tiering or HSM. However, it’s much more than that. Active archives allow any storage medium to be used as NAS storage, in the form of a CIFS or NFS share. When combined with open formats, it allows a company to architect its systems in a vendor-agnostic way, allowing the use of the most appropriate product for its specific needs. Migration is no longer a major undertaking, but simply a hardware upgrade and adjustment of policies. If a technology becomes obsolete, the data can be easily migrated to a new system. If a company fails, likewise the data is not compromised. And much to everyone’s relief, as technology evolves, data can be automatically moved onto new platforms. This prevents the nightmare of realizing that you have large amounts of data sitting on obsolete equipment or formats, because active archives proactively migrate data onto newer equipment as the system changes over time.
Performance seems to be where we all get snagged. Is tape slow, or is it fast? Random access, linear access--there are many ways to represent any technology as fast or slow. Active archive properly sidesteps the I/O battle. It simply takes advantage of the equipment implemented and the policies for where and how data is retained. Regardless of where data resides, it is accessible. This doesn’t mean that you should move transactional, high performance data immediately to tape. It means you can set realistic expectations for data retrieval times, and at no point does an IT administrator have to manually restore data to get it back, provided it is at least in a library or connected via a WAN. The performance of the system is left to the storage devices implemented and the policies around the data management application. SSD and high performance disk should be a part of a well-designed active archive to meet performance needs.
With performance and flexibility addressed, we move on to cost and ease of use. The active archive advocates, of whom I am one, have hit hard on the cost advantages of an active archive. For today, I’ll briefly note that an active archive is simply less expensive than other strategies, both in capital expense and operational expense, because it uses less expensive storage platforms for data that would traditionally have resided on higher cost systems to maintain accessibility.
All that remains is ease of use. To the user, active archive is an automated system; all files are accessible; older or archived data simply takes a little longer to retrieve. From the administrator’s perspective, active archives need to be properly set up and tuned to optimize performance for the environment. However, once configured, the administrator is not needed for file retrieval, and can easily set up a migration without having to take the system down or spend weeks on planning. Also, in a disaster, data is accessible over a WAN if it’s stored on a live DR site, or can be rebuilt from offsite tapes. In the event that a single system goes down, retrieval is simply dependent on the performance of the device that holds the second copy.
IT administrators can manage their data more easily and in less time, instead of allowing their data problems to manage them. Components can be upgraded or replaced without overhauling the entire active archive. Thus, active archives are flexible, perform well, are cost effective, and are much easier to administer than other storage strategies—no outsourcing required.
The Cost Advantage of Tape
Submitted by Rich Gadomski on Wed, 01/18/2012 - 09:15
In my previous blog posts, I talked about the advantages of data tape in terms of reliability and capacity and how tape plays a crucial role in supporting active archive systems. While these factors are key considerations in the merits of leveraging tape for tier 3 storage, cost effectiveness is a major factor that favors tape as well.
Leading analysts project that organizations will need to grow their data storage capacity dramatically in the coming years as a result of the explosion of unstructured file data, regulatory compliance and the need to keep data for longer periods in active archive mode. The numbers vary, but the consensus is around 50% data growth annually (there’s no recession in data creation!) Yet, IT budgets are barely increasing, so close attention is being paid to storage-related investments that can consume significant portions of CAPEX budgets. With OPEX budgets multiplying acquisition costs by several factors, it becomes clear that storing all data on disk drives is cost prohibitive from both an acquisition and an operations point of view.
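To put the 50% figure in perspective, here is a quick back-of-the-envelope calculation in Python. The 100 TB starting point is purely a hypothetical example, and the function simply applies compound growth.

```python
def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity needed after a number of years of compound annual growth."""
    return start_tb * (1 + annual_growth) ** years


# Hypothetical starting point: 100 TB growing 50% per year for five years.
five_year_need_tb = projected_capacity(100, 0.5, 5)
```

Under these assumptions, 100 TB today becomes roughly 759 TB in five years, which is why flat budgets and all-disk storage do not mix well.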
In addition, the price of electricity to power and cool disk storage continues to climb, and some areas of the power grid are already overtaxed, creating the problem of simply supplying the needed power to data centers.
TCO studies from leading analysts show tape systems cost less than disk. The Enterprise Strategy Group reported a 2-4X cost advantage in backup applications using LTO-5 compared to disk with de-duplication.
For long term archiving, The Clipper Group did a detailed TCO study and reported a 15X cost advantage for LTO-5 tape vs. disk in an archiving application over a 12-year period.
In yet another recent TCO study done by the Information Storage Industry Consortium, disk system acquisition prices turn out to be 9X more than the equivalent tape system for 500TB of storage over a five-year period.
When it comes to power consumption, tape is far greener than disk and this is where the real cost savings are to be found. The TCO study from The Clipper Group shows that disk consumes at least 238X more power than tape as data on tape consumes little or no energy, and tape does not require the significant energy associated with cooling spinning disks. In fact, Clipper showed that the cost of powering the disk solution over the 12-year period is the same as the cost of an entire tape solution including hardware, media and power!
Undoubtedly flash and disk technologies play a critical role in active archive systems for certain data applications, for example where rapid access time is important. But once again, studies show that anywhere from 60 to 90% of data is rarely accessed after 30 days. So it makes sense to move data from more expensive tiers of storage to the more cost-effective tape tier. And with active archive systems, the data remains accessible. Data storage that is efficient and always available – it’s the best of both worlds!
The Smart First Step For Big Data
Submitted by Janet Lafleur on Tue, 12/13/2011 - 10:25
You’ve probably heard the buzz about Big Data, and stories of data mining technologies (such as Hadoop) that take business analytics to the HPC level, making it possible to make sense of massive quantities of unstructured data. And you may have heard the buzz from storage vendors about how their highly scalable disk platforms enable all this number crunching.
But if all this data has reached a magnitude to now be branded Big Data, does it make sense to keep it all on disk? And what about long-term storage of the source data that not only drives the analysis today, but is likely to be needed again at some point in the future? How many spinning disks does it take to store all the interactions of consumers on the Web or to store years of satellite images at a half-meter GSD resolution?
The unsurprising answer is that Big Data requires scalable active archives to go along with the scalable disk storage systems. Big Data does not live on disk alone. With a lower-cost, long-term medium like tape and intelligent software, active archives can deliver large data sets as needed, when needed, to the high-performance disk storage systems for the intense number crunching. And when the analysis is complete, an active archive can preserve the results for the future. That’s a no-brainer.
But another significant characteristic of Big Data is that much of the source data is fixed content—data that never changes. Consider transaction log files from a bank, satellite images in weather research and raw footage from the movie set. These files are fixed content from the moment they are created, and when handled properly will never be modified. As fixed content, these files should be preserved in an archive not just to conserve disk space, but because they are irreproducible.
So my advice to those responsible for managing Big Data is to do what a number of Atempo customers are doing with their raw data today. First, archive all your raw data sets onto a low-cost medium like tape as soon as they’re created, capturing and indexing the relevant metadata so you can search and retrieve it later. Make a second copy and send it offsite while you’re at it.
Then, if the data sets aren’t needed right away, remove them from the high-performance storage. When needed for analysis, the data sets can be retrieved quickly and easily from the active archive through the file system or via search. Your raw data sets will be secure and immediately available without crowding your expensive high-performance disk systems.
Archiving raw data sets is only one way that folks managing Big Data environments can reduce their Big Data management headaches. At almost every step in analytical workflows there are opportunities to manage data better through active archiving. By taking that first step of archiving raw data sets, you’ll get your Big Data strategy off on the right foot.
Note: Image from http://flowingdata.com/2010/08/17/stacked-area-shows-the-web-is-dead/
The Data Armageddon: Time to Learn What You Don’t Know
Submitted by Jim McKinley on Fri, 12/02/2011 - 11:52
When Thomas Gray inked the phrase, "Ignorance is Bliss, 'tis folly to be wise," I don’t think he considered how best to manage data in our present-day data Armageddon. If you are a data manager and you adhere to the "ignorance is bliss" school of thought, I would recommend that you refresh your resume immediately!
I have spoken with too many people who have no idea of what is to come concerning the world’s rapid and exponentially growing data. Believe it or not, I talked to a person at the Supercomputing show in Seattle who said they are actually moving all their data to disk and neglecting the tremendous, inherent values and benefits (low cost, high capacity and performance, to name a few) of tape. As their data doubles each year, which he said it does, the plan is to continue adding more disk... Really? In his case, I believe he really thinks ignorance is bliss. I offered to share with him how customers with hundreds of terabytes to hundreds of petabytes are managing data with intelligent file systems and using both tape and disk in cost efficient ways and he refused to listen because his ignorance has caused him to believe that "tape is dead". Granted, I don’t hear this very often anymore because the HPC community, as a whole, is paving the way for a cost-effective tape-based storage concept we will discuss later, called "Active Archive".
First, I want to address the ignorance of the individuals who have sipped the "tape is dead" Kool-Aid from certain disk vendors over the past 10 years. Growing up as a teenager in the great state of Texas, I listened to AM radio in my first pickup truck. (Yes, all it had was an AM radio!) Anyway, one of my favorite radio personalities was Mr. Earl Pitts, who addressed controversial topics and would start by sharing his straightforward opinion on them by saying (insert Texas accent) "Ya know what makes me sick, you know what makes me so angry I could spit?"… or something along those lines. (http://www.youtube.com/watch?v=4DDhrRooNp4) Then he would talk about something that is usually contradictory to the American way since he was a patriot who was always watching out for our true, red-blooded American values. Well, I feel sort of like Earl when someone tells me that they think that tape is of no value, which simply shows their ignorance. I want to say “You know what makes me sick, you know what makes me so angry I could spit?” ... Ignorance! He would always end his lesson on values and truth by saying “Wake up America!” Well, when someone tells me “tape is dead”, I want to grab them, shake them and say “Wake up!”
The reality today, regarding data storage, is that it is not folly to be wise and it is not bliss to be ignorant. Wake up Storage Admins! I have to admit that most of the people I talk to around the country at trade shows, in meetings, etc., are awake and aware of the ever-present danger of data explosion. So, needless to say, my blood pressure stays in check and I don’t get angry as often. I try to keep things in perspective and just assume that the rest simply don’t know what they don’t know.
My job, and that of my colleagues, both at Spectra and within the tape industry overall, is to educate as many people as possible about how to reduce the cost, complexity and fear of managing exponentially growing data. Spectra is leading the charge to create an awareness of how valuable tape can now be in the data center. Tape is no longer used just for backup. It was great to see so many of our HPC customers at SC11, most of whom don’t even use the terminology of “backup” any longer. As tape continued to mature over the last 10 years by getting 700% more reliable, faster and more dense, many of our HPC customers started leveraging the benefits of tape in what we call an “Active Archive”. In other words, they are using tape as disk. An active archive is a combination of open system applications, varying types of disk, and tape hardware that intelligently monitors and migrates data across multiple storage devices while maintaining fast user accessibility. Traditionally, in the backup world, one could only access tapes and the data on them through a proprietary backup application such as NetBackup, Legato, Commvault, etc. I’m not advocating that corporations discontinue backups altogether because one should always have a “second” copy of data in the event of a disaster. However, the premise of an active archive is that all data can be online all the time.
Obviously, when someone has hundreds of terabytes or even petabytes, it is cost prohibitive to try and keep all data online all the time in the traditional way of keeping it all on primary or secondary disk. With an active archive file system, the data can be dynamically distributed across multiple storage platforms including disk and tape. Policies can determine where data is at any given time and it is transparent to the end user where that might be. They simply have a drive letter and directory with all their files as normal. Nothing proprietary about access to their data—anytime they need it. By extending a file system across high performing disk, capacity disk and now tape, the need for IT intervention to retrieve an archived file is minimized, if not eliminated. This data management approach is being used by many of our HPC customers and they are benefiting tremendously by having a searchable, compliant format to store data for the total lifecycle of a file based on policies, industry regulations and laws.
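Here is a rough Python sketch of what such a placement policy could look like. The tier names and the 30-day and 180-day thresholds are hypothetical, and real active archive software will have its own policy engine; this only shows the idea of rules deciding where a file lives while the user keeps seeing one namespace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class TierRule:
    name: str
    max_idle: Optional[timedelta]  # None marks the catch-all tier


# Hypothetical policy: thresholds and tier names are examples only.
DEFAULT_RULES = (
    TierRule("fast-disk", timedelta(days=30)),
    TierRule("capacity-disk", timedelta(days=180)),
    TierRule("tape", None),
)


def choose_tier(last_access: datetime, now: datetime,
                rules: Sequence[TierRule] = DEFAULT_RULES) -> str:
    """Return the first tier whose idle-time limit covers this file."""
    idle = now - last_access
    for rule in rules:
        if rule.max_idle is None or idle <= rule.max_idle:
            return rule.name
    raise ValueError("no rule matched; add a catch-all tier")
```

The user never calls anything like this directly. The file path stays the same and only the storage behind it changes.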
I could go on about the benefits of active archive or the inherent values that are characteristic of the tape technologies of today, but I would rather provide some links to more information on both so you can continue your own research and put aside any tendencies you might have to subscribe to the “ignorance is bliss” philosophy! Tape is here to stay and is poised to solve your storage headaches today and in the future by offering greater efficiency, better reliability and maximum performance. So wake up! Data Armageddon: tape’s got this one.
Active Archiving and LTO tape DRIVES the Cloud
Submitted by Andy Richards on Tue, 11/08/2011 - 18:03

It is interesting to see all the different ways that LTO tape based solutions have been used in conjunction with best-of-breed active archive solutions, while remaining below most people’s radar screens. Thanks to the Active Archive Alliance we are now bringing recognition to these innovative approaches which will only help to further promote the benefits of active archiving and tape. Below I discuss how LTO tape and active archiving is enabling an emerging solution for cloud-based medical record and image archiving.
Telepaxx Medical Archiving is one of the leading vendors in Europe of PACS vendor neutral archiving (VNA). For over 10 years, they have provided DICOM based cloud storage for healthcare customers in need of long term storage of medical images. Recently, they teamed with GRAU DATA, an Active Archive Alliance Contributing Member, to complement their cloud offering by providing a gateway for file based archiving into the Telepaxx cloud. This allows file based solutions like content management, email archiving and Electronic Health Records to be stored in the same infrastructure as DICOM based medical images. Now healthcare customers have a single consolidated archive for all their enterprise data, in a secure cloud based archive.
The success of this solution over the years rests on Telepaxx’s ability to store data in a secure, private and cost-effective manner. The key attributes of LTO tape that have enabled this are:
Removability – Allows extra copies of encrypted images to be stored in secure off-site vaults.
Customer Privacy – Individual tapes for each customer, which ensures each medical institution’s data is maintained separately.
Low power consumption – Facilitates managing multiple petabytes of images with a small energy footprint.
Future proof – Medical images are stored for extended time periods, longer than the usable lifetime of any storage technology. LTO’s roadmap to higher tape capacities and performance coupled with an automated forward migration capability means the image is preserved for its useable life on the most recent generation of LTO.
Cost Effective – The lowest dollar per GB of any storage technology enabling customers to utilize cloud storage at the lowest possible cost.
High Scalability – Provides a small footprint in a data center while scaling into the multiple petabytes range.
Capacity on demand – Facilitates capacity expansion with minimal hardware investment.
It is all of the above key attributes of LTO technology, coupled with active archive solutions, that have enabled Telepaxx to offer such unique capabilities to its customer base. We are sure that more companies will be offering tape-based cloud services based on all the benefits mentioned above. Telepaxx’s successful deployment of this strategy for over 10 years has demonstrated that an archive can outlive the useable life of an individual storage technology, without impact to the customer or applications using the storage. Only the combination of active archive with LTO tape can provide the benefits of a cost-effective, future-proofed cloud storage solution.
Another Way to Bell the Cat
Submitted by Charles Whinney on Mon, 10/24/2011 - 14:21
My thanks to Dr. Majumder for his input regarding our ongoing discussion around the topic of what constitutes an active archive solution and how that solution applies to many issues facing IT administrators today. He brings up many good points, including the complexity and challenges of using metadata to automate a tiered storage process for structured content. I’d like to address some of his points from the perspective of a vendor member of the Active Archive Alliance.
First, Dr. Majumder accurately describes the steps in a standard archival process as identify, transform, archive, and restore. Moreover, he states that an active archive process differs from a standard archive process because it “offers reliable, online, and efficient access to data.” I underscored the word “online” in Dr. Majumder’s quote because it signifies a very important distinction that has a dramatic impact on how an active archive can be utilized in a much more efficient process than his diagram illustrates. Think about it. If data remains “online,” there is no need to ever restore it. In other words, an active archive approach can eliminate the transform and restore steps in a standard archive model simply by using a storage virtualization solution that provides online connectivity for applications and users.

Products such as FileTek’s StorHouse® platform effectively archive, retrieve, and manage unstructured data (stored in native file format) and structured data (stored in database format), thereby eliminating the time-consuming transform and restore steps for both content types. Because active archives are highly automated with self-service access, they reduce the number of manual system management tasks, thereby eliminating the burden on IT administrators.
Next, Dr. Majumder’s description of the many challenges associated with finding a way to automatically identify which structured content to archive is spot on. Fortunately, the same challenges do not apply to unstructured content. In fact, organizations typically use metadata and other file properties to create efficient archiving policies that can routinely distinguish which unstructured content to migrate.
Unlike unstructured data, structured data does not provide effective metadata metrics for use in automatic content identification policies. Simply put, the applications creating the structured content or the databases storing that content do not support effective markers for policy creation. That being said, there are other ways to “bell the cat.”
FileTek has deployed successful active archive solutions for customers with structured data. Here is a brief description on how the process works. Transactional databases typically account for approximately 90% of structured content storage requirements. The majority of that structured data is the business event or the actual transaction (for example, a phone call, ATM transaction, customer purchase, or service). Because this data represents critical records needed for compliance, business requirements, and detailed marketing analysis, FileTek focused its attention on the best way to archive it.
When reviewing customer environments and specifications, FileTek discovered the following:
· Customer transaction databases had grown to an unmanageable size.
· The cost of maintaining these databases was outpacing IT budgets.
· Customers already planned to archive an overwhelming majority of transactional records at some point in time.
· Customers were also having problems with the backup process because cycles were running too long, and restore was a nightmare.
FileTek’s active archive solution actually turned out to be quite simple as it only required shifting perspective from what content to archive to when to archive that content. To that end, FileTek implemented a continuous archive process that essentially replicated all transactional records as they were created and written to primary storage. Then we loaded a copy of those transactional records into the active archive. Rules within the active archive created an additional copy of the data in a remote disaster recovery site, thereby automatically backing up transactional records in real-time with online access and no burdensome restore requirements. To provide tiering and manage the costs of primary storage for transactional records, FileTek then employed a simple deletion policy on the primary storage based on age (can vary by industry). This process significantly reduced the primary database to a manageable size, which then improved performance and decreased budget pressures.
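The flow described above can be sketched in a few lines of Python. This is a toy model with in-memory dictionaries standing in for the primary database, the active archive and the disaster recovery site; the names and the 90-day retention window are assumptions for illustration, not FileTek's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Store:
    # record_id -> (created timestamp, payload)
    records: dict = field(default_factory=dict)


def write_transaction(record_id: str, created: datetime, payload: bytes,
                      primary: Store, archive: Store, dr_site: Store) -> None:
    """Write a record to primary and replicate it to the archive and DR copy."""
    for store in (primary, archive, dr_site):
        store.records[record_id] = (created, payload)


def purge_primary(primary: Store, archive: Store,
                  now: datetime, max_age: timedelta) -> list:
    """Delete aged records from primary, but only if the archive holds them."""
    removed = []
    for record_id, (created, _payload) in list(primary.records.items()):
        if now - created > max_age and record_id in archive.records:
            del primary.records[record_id]
            removed.append(record_id)
    return removed
```

Because every record is already in the archive when it is written, the deletion policy only has to answer "when", never "what", which is the shift in perspective described above.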
With this approach, FileTek customers benefited from another important active archive feature – data assurance. StorHouse achieves data assurance by automatically monitoring data for corruption due to bit rot and other storage media issues, replacing corrupted data with clean data, and moving content to a stable storage location.
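As a generic illustration of this kind of data assurance (not a description of StorHouse internals), the following Python sketch compares stored data against checksums recorded at ingest and repairs a damaged copy from a replica. The dictionaries and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a block of data."""
    return hashlib.sha256(data).hexdigest()


def scrub(primary: dict, replica: dict, checksums: dict) -> list:
    """Verify each object against its recorded checksum and repair if needed.

    primary and replica map object names to bytes; checksums maps names to
    the digest recorded when the object entered the archive.
    Returns the names of objects that were repaired from the replica.
    """
    repaired = []
    for name, expected in checksums.items():
        data = primary.get(name)
        if data is not None and digest(data) == expected:
            continue  # copy is intact
        candidate = replica.get(name)
        if candidate is None or digest(candidate) != expected:
            raise RuntimeError(f"no clean copy of {name} available")
        primary[name] = candidate
        repaired.append(name)
    return repaired
```

Run periodically, a check like this catches silent corruption while a clean copy still exists, rather than years later when someone finally reads the file.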
Archival- Data Identification, The Missing Link
Submitted by Dr. Kumud Majumder on Mon, 10/24/2011 - 08:56
To archive or not to archive is no longer the question in today's business world. With a spate of financial and other high-profile scandals, the federal regulators of various countries now require long-term retention/archival of most business data. Even so, not all are quite on the archival front.

Let's start at the very beginning, a very good place to start. Structured data archival is a multi-step process as shown above. The triangles represent different steps involved in the archival life cycle. The arrows show the process flow, the contact points between the triangles being a reference to the inter-relationship of different steps. The whole process starts with identifying what can be archived and what needs archival; then we transform it into a forward-compatible format and move the data to a suitable storage system to complete the archival process. While restoration of data is not part of archival per se, the fact that archived data needs easy restoring and readability is one of the "Duh" cases. Compliance and audit play a dominant role in this. Of course one can argue that an active archive solution offers reliable, online and efficient access to the archived data. So, why bother to restore? Since the jury is still out on this, 'Restore' continues to be an integral part of data archival.
A survey of the available archival technologies shows that while IBM appears to lead the race with Optim integrated data management, the Active Archive Alliance is playing a key role in making the online data archive options available to a wider audience. To date, most of the focus has been on the "Transform," "Archive," "Restore" processes with very little means being available to identify what component of business data merits archival. The "Identify" step is largely a manual process, often a matter of subjective interpretation of business rules. To clarify, in a large global company determining what data is in active use at any given time and what can be moved to an offline archive system is highly challenging. Keeping a large volume of inactive data in the operational databases is extremely costly and yet often obligatory due to federal rules and regulations. Naturally it makes very good business sense to build a tool that classifies data into active and inactive thus making archival decisions simple. Technologically however, such data classification is a big challenge. While IBM WebSphere Content Discovery can assist in determining the relationship between different data sets, it does not help in finding out what data is stale or inactive. To my knowledge no commercially available tool exists in this space.
Why is it so hard to bell this cat? Can we not simply look at the metadata at the RDBMS level and peg what data are in use and what data are stale? The answer is both yes and no. The data elements in major RDBMS are stored at the DB block or page level, and the metadata tracks read/write operations on the data elements at that storage-unit level. So, yes, tracking the metadata can yield very valuable information on whether a data set is stale or not. Unfortunately, it's not as simple as it sounds. Read/write operations in an RDBMS are not limited to business queries; they are also triggered by maintenance operations such as building or maintaining indexes, or even by a query that runs a full table scan.
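A small sketch makes the problem concrete. Suppose, purely as an assumption, that each block access could be logged with a label saying what triggered it; real RDBMS metadata generally does not carry such a label, which is exactly the difficulty. The `stale_blocks` function and the source labels below are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical labels for accesses that do not reflect business use.
MAINTENANCE_SOURCES = {"index_rebuild", "full_table_scan", "backup"}


def stale_blocks(access_log, now, threshold, ignore_maintenance=True):
    """Return the sorted ids of blocks with no recent business access.

    access_log is an iterable of (block_id, timestamp, source) tuples.
    A block is stale when its latest counted access is older than
    threshold, or when it has no counted access at all.
    """
    last_access = {}
    seen = set()
    for block_id, timestamp, source in access_log:
        seen.add(block_id)
        if ignore_maintenance and source in MAINTENANCE_SOURCES:
            continue  # maintenance reads say nothing about business use
        previous = last_access.get(block_id)
        if previous is None or timestamp > previous:
            last_access[block_id] = timestamp
    return sorted(
        block for block in seen
        if block not in last_access or now - last_access[block] > threshold
    )
```

With maintenance traffic filtered out, a block last queried by the business two years ago is flagged as stale; counted naively, a recent index rebuild makes the same block look active.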
So, how do we bell this cat? What do you have to say on this issue?
The Latest on Tape Capacity
Submitted by Rich Gadomski on Tue, 09/27/2011 - 10:55
I often get asked the question, “what does the future hold for data tape in terms of capacity?” Usually followed up with “Are we on the threshold of the super paramagnetic limit? Will we need some new technology or breakthroughs to keep up with the explosion of data? Are currently published roadmaps really achievable? Will tape continue to play a critical role in data protection and emerging applications such as active archiving?”
To get started answering these questions, let’s look at the midrange tape market where LTO clearly dominates all other technologies with close to a 90% share. LTO Generation 4 is currently the most popular format with a native capacity of 800 GB. But the latest generation introduced in 2010, LTO-5, has been growing rapidly in popularity with a native capacity of 1.5 TB. LTO-5 also features a major breakthrough known as LTFS, or Linear Tape File System, which allows for dual partitioning of the tape where a portion of the tape is dedicated to a file index to enhance file management and facilitate data exchange and long-term data retention.
Next in line for LTO is Generation 6, with a native capacity slated for 3.2 TB, followed by LTO-7 at 6.4 TB and LTO-8 with a pretty impressive native capacity of 12.8 TB. We expect to see LTO-6 in 2012 or early 2013 with the next generations every two to three years thereafter. But back to the questions, is this truly achievable?
The answer is a resounding “yes!” and we can look to the enterprise tape market for a good indication of tape’s future direction.
In 2006, Fujifilm and IBM demonstrated a world record in data density on linear magnetic tape of 6.67 billion bits per square inch. This meant the ability to achieve multi-terabyte capacities of up to 8.0 TB on a single tape cartridge. The tape sample used was based on Fujifilm’s NANOCUBIC technology incorporating a new Barium Ferrite (BaFe) magnetic particle with the ability to resist outer magnetic interference and to maintain a strong magnetic signal even at greatly reduced dimensions compared to commonly used metal particles.
Once again in 2010, Fujifilm and IBM announced a new world record in data density on linear magnetic tape, achieving 29.5 billion bits per square inch using the new Barium Ferrite magnetic particle, this time with a perpendicular orientation, an even smaller particle size and a more advanced coating and dispersion technology. This translates into the possibility of developing a single tape cartridge capable of holding a massive 35 TB of native data. That’s 23 times the capacity of today’s LTO-5 and far exceeds the LTO roadmap requirements for LTO-8!
Earlier this year, the first product based on this technology came to market in the form of Oracle’s enterprise T10000-C drive and cartridge with a native capacity of 5.0 TB. This product represents a 3X capacity increase compared to the latest LTO-5 product and clearly speaks to the reality of increasing capacity on tape.
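The ratios quoted above are easy to check. The short script below uses only the native capacities and areal densities cited in this post; the variable and function names are just for illustration.

```python
# Native capacities in TB, as quoted in this post.
LTO5_TB = 1.5
CAPACITIES_TB = {
    "LTO-6 (roadmap)": 3.2,
    "LTO-7 (roadmap)": 6.4,
    "LTO-8 (roadmap)": 12.8,
    "Oracle T10000-C": 5.0,
    "BaFe demonstration (2010)": 35.0,
}

# Areal densities in billions of bits per square inch, as quoted.
DENSITY_2006 = 6.67
DENSITY_2010 = 29.5


def ratio_to_lto5(capacity_tb: float) -> float:
    """How many LTO-5 cartridges one cartridge of this capacity replaces."""
    return capacity_tb / LTO5_TB


if __name__ == "__main__":
    for name, tb in CAPACITIES_TB.items():
        print(f"{name}: {tb} TB, {ratio_to_lto5(tb):.1f}x LTO-5")
    print(f"Density gain 2006 to 2010: {DENSITY_2010 / DENSITY_2006:.1f}x")
```

The 35 TB demonstration works out to a little over 23 times LTO-5 and well above the 12.8 TB planned for LTO-8, while the 5.0 TB T10000-C is a little over three times LTO-5.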
With the explosion of tier 3 unstructured file data, regulatory compliance and the need to keep data for longer periods in active archive mode, expect to see further advances in tape capacities as tape continues to play a vital role in cost effective, reliable, long-term mass storage.
Not One Throat to Choke but One Hand to Help
Submitted by Mark Seamans on Thu, 09/22/2011 - 09:18
The entries in this Active Archive blog are filled with good ideas about how organizations benefit from the implementation of a secure and robust long-term information archive. The concepts are intuitive and the benefits seem obvious, but it also seems clear that the growth of active archiving as a generally deployed data storage practice is going to take time. The question is "How much time?"
Some people argue that standardization of tape formats is an important ingredient in helping to drive more widespread use of active archiving. Others will argue that the dramatic reduction in the costs of storage devices will be the key. And either one of these elements may play a role in growing the overall market, but let's be honest about the fact that the data format that is used to house data on tape will represent about 1% of the factors involved in the success or failure of your active archive initiative. Interestingly, I think that the single largest factor that can accelerate the growth in use of active archiving is something you might not expect. It's "people" and their willingness and ability to lead the way to success.
I'll be honest that the growth of active archiving is undoubtedly occurring. At our company, we work with clients every day who are evolving their concept of how storage can work to meet their business requirements. For these people, the "light bulb goes on" in their heads when they realize that the decision to pursue active archiving is really a decision to put a living storage infrastructure in place that they can mold and adapt over time to meet their really long term storage needs in areas that include cost optimization, retrieval performance, retention, version management, multi-platform access - and active data validation and repair to deal with the reality that some of their media will experience failures over time.
As we work with clients who are considering active archive, a key question we ask is "How long do you need to keep the data?" More often than not the answer is "forever..." - and that's a long time. Human nature teaches us that before people move forward on things that will need to be in place "forever", they typically want to be sure that they are making the right decision and getting aligned with the right people. It just makes sense.
We recently were partnered with an integrator and customer who expressed their desire for having a 'single throat to choke' in terms of support for their solution which was going to leverage active archive software to manage both tape and disk-based storage in a high-availability and high-capacity solution. While they used the term 'one throat to choke', it was clear that they really wanted a strong 'hand to help' as they designed, implemented and optimized their solution for the long haul. With active archiving, it's just as important to have the system working well on "Day 10,000" after deployment as it was to have it working well on "Day 1".
In the case mentioned here, we stepped up to offer our commitment to be the 'one stop shop' for ongoing support on the overall solution - but in other cases it's been the integrator who desires to be the long term partner and advisor in support of the system. The point is not really about 'who' took on the advisor and support role - but rather that it was a required piece of the puzzle for this customer to move forward.
The growth of active archiving overall relies on our ability as leading vendors and integrators to simply be "leaders". We need to be ready to step up and risk being the "one throat to choke" in order to seize the significant rewards that will come from helping thousands and thousands of organizations transform the way they achieve secure long term storage of their critical electronic data.
Great products and data formats alone won’t be enough – it will all come down to the people.
