Blurring the Lines

 

Recently there have been a number of product announcements from backup vendors stating the virtues of using backup applications for archiving. One of the main reasons the Active Archive Alliance was formed was to better educate organizations about archiving and to explain why an Active Archive is superior to backup for archiving.

Let’s look at the ways an Active Archive secures data for medium- to long-term retention periods, compared to using a backup product for archiving. 

An Active Archive is a self-supporting system that securely stores and maintains data independently of primary storage and backup. Active Archives do not replace backup; the two are complementary and work together to manage data.

Backup is designed to allow restoration of data to primary storage and servers in the event of an outage, data loss or user deletion. A backup can be full or incremental (with periodic full backups). Either way, large numbers of files are copied (backed up) during each operation, so speed is of the essence. One way backup companies improve write speed is to skip data verification during writes. This is acceptable in a backup operation, because it is always possible to fall back on an alternative backup for the data, for example the previous day’s backup. The intervals between backup copies are determined by the organization’s tolerance for data loss, whether minutes, hours, or days.

In an archive environment, files are moved from primary storage to the archive, optionally leaving a stub behind. Once the file is removed from primary storage, the archive holds the only copy, so it is vital that the archived copy be validated first. For this reason, all data entering the archive is verified during the write operation.
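The verify-before-delete rule can be illustrated with a minimal Python sketch. It is not how any particular archive product works: the function names `sha256_of` and `archive_file`, the choice of SHA-256, and the use of a plain directory as the "archive" are all assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(source: Path, archive_dir: Path) -> Path:
    """Copy source into archive_dir, verify the copy, then remove source.

    Hypothetical helper: the source is deleted only after the archived
    copy has been read back and its checksum matches the original.
    On a mismatch the bad copy is discarded and the source is kept.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    expected = sha256_of(source)
    shutil.copy2(source, target)
    if sha256_of(target) != expected:
        target.unlink()
        raise IOError(f"verification failed for {source}")
    source.unlink()  # safe: the archive copy has been verified
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "report.txt"
        original.write_text("quarterly figures")
        archived = archive_file(original, Path(tmp) / "archive")
        print(archived.name, original.exists())
```

The point of the design is the ordering: the source is unlinked only on the success path, so a failed write never leaves the organization without a valid copy.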

If an organization uses tape for backup, it will use a tape rotation scheme (“Grandfather, Father, Son (GFS)”, “Tower of Hanoi”, etc.). In a GFS rotation scheme, with a daily backup including weekends, at least 22 tape sets are needed in a year. With Tower of Hanoi, it can be 10 sets. Both figures exclude replacements for worn-out media.
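The article does not spell out how the two figures are derived. The Python sketch below shows one plausible reading, under stated assumptions: a GFS scheme with 6 daily, 4 weekly and 12 monthly sets, and a Tower of Hanoi rotation in which set k is reused every 2^(k+1) days, so that n sets give a pattern that repeats every 2^(n-1) days. All function names are hypothetical.

```python
def gfs_set_count(daily: int = 6, weekly: int = 4, monthly: int = 12) -> int:
    """Tape sets in a GFS scheme (assumed breakdown, not from the article)."""
    return daily + weekly + monthly


def hanoi_set_for_day(day: int, num_sets: int) -> int:
    """Return the 0-based tape set to use on a 1-based day number.

    The set index is the number of trailing zero bits in the day number,
    capped at the last set: set 0 every 2nd day, set 1 every 4th day, etc.
    """
    if day < 1 or num_sets < 1:
        raise ValueError("day and num_sets must be positive")
    trailing_zeros = (day & -day).bit_length() - 1
    return min(trailing_zeros, num_sets - 1)


def hanoi_cycle_days(num_sets: int) -> int:
    """Days before the rotation pattern repeats with num_sets sets."""
    return 2 ** (num_sets - 1)


def min_hanoi_sets(days: int) -> int:
    """Smallest number of sets whose rotation cycle spans at least `days`."""
    num_sets = 1
    while hanoi_cycle_days(num_sets) < days:
        num_sets += 1
    return num_sets


if __name__ == "__main__":
    print("GFS sets:", gfs_set_count())
    print("3-set Hanoi pattern:", [hanoi_set_for_day(d, 3) for d in range(1, 9)])
    print("Hanoi sets to span a year:", min_hanoi_sets(365))
```

Under these assumptions the GFS total comes to 22 sets, and 10 is the smallest number of Tower of Hanoi sets whose cycle (512 days) spans a full year, which matches the figures quoted above.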

An Active Archive will typically have a maximum of two or three copies of the data on tape. This offers considerable savings in media costs and reduces the total amount of managed data, while maintaining the integrity of the archive. 

An Active Archive builds in future-proofing by helping organizations avoid the costs associated with vendor lock-in and by providing a flexible, worry-free migration strategy to new storage and/or new platforms.

When using backup to archive, a proprietary stub is left on primary storage, or on a dedicated disk archive, pointing to a location in the backup. In QStar’s opinion, proprietary stubs will eventually lead to huge clean-up operations, with unwanted costs and disruption, when the organization wishes to switch providers. The only way to migrate the data is to bring everything back to primary storage before writing the archived data out again to the new archive.

An Active Archive allows archived data to be migrated over time, for example from older LTO-2 media to LTO-5. All Active Archive applications support this. Data can be migrated from one technology to another without bringing it back to primary storage. Disk storage systems have an average life of 3 to 5 years, while tape libraries have an average life of 6 to 10 years, allowing multiple tape generations to be used over the lifetime of the library. Archived data often has a desired retention time of decades, far longer than the life of any one storage system, so migration must be planned for in those environments.

Finally, more and more Active Archive applications can optionally use the new industry-standard Linear Tape File System (LTFS). With LTFS, media can be read by other archiving software, allowing media reuse and migration of data to new solutions on new platforms.

Backup is a necessary evil, securing constantly changing and frequently accessed files on primary storage. However, backup methods also create a significant amount of redundant data. Once data on primary storage becomes older and less frequently accessed, moving it into an Active Archive tier is the best approach to reducing cost and securing the data for the medium to long term, since this also removes it from the repetitive backup process.