Another Way to Bell the Cat

My thanks to Dr. Mujamder for his input on our ongoing discussion of what constitutes an active archive solution and how that solution applies to many issues facing IT administrators today. He brings up many good points, including the complexity and challenges of using metadata to automate a tiered storage process for structured content. I’d like to address some of his points from the perspective of a vendor member of the Active Archive Alliance.

First, Dr. Mujamder accurately describes the steps in a standard archival process as identify, transform, archive, and restore.  Moreover, he states that an active archive process differs from a standard archive process because it “offers reliable, online, and efficient access to data.”   I underscored the word “online” in Dr. Mujamder’s quote because it signifies a very important distinction that has a dramatic impact on how an active archive can be utilized in a much more efficient process than his diagram illustrates.  Think about it. If data remains “online,” there is no need to ever restore it.  In other words, an active archive approach can eliminate the transform and restore steps in a standard archive model simply by using a storage virtualization solution that provides online connectivity for applications and users.

Products such as FileTek’s StorHouse® platform effectively archive, retrieve, and manage unstructured data (stored in native file format) and structured data (stored in database format), thereby eliminating the time-consuming transform and restore steps for both content types. Because active archives are highly automated with self-service access, they reduce the number of manual system management tasks and ease the burden on IT administrators.

Next, Dr. Mujamder’s description of the many challenges associated with finding a way to automatically identify which structured content to archive is spot on. Fortunately, the same challenges do not apply to unstructured content. In fact, organizations typically use metadata and other file properties to create efficient archiving policies that can routinely distinguish which unstructured content to migrate.
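To make the idea of a metadata-driven policy concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular product works: the function name select_for_archive, the one-year age threshold, and the extension filter are all assumptions chosen for the example. It uses just two file properties, the last-modified time and the file extension, to decide which files qualify for migration.

```python
import os
import time
from pathlib import Path


def select_for_archive(root, min_age_days=365, extensions=None, now=None):
    """Return files under `root` that a simple policy would migrate.

    Hypothetical policy: a file qualifies when its last-modified time is
    older than `min_age_days` and, if `extensions` is given, its suffix
    is in that set (for example {".pdf", ".log"}).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day

    selected = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if extensions and path.suffix.lower() not in extensions:
            continue
        if path.stat().st_mtime < cutoff:
            selected.append(path)
    return sorted(selected)
```

A real policy engine would likely weigh more properties (owner, last access, size, retention class), but the principle is the same: the file system already exposes the markers the policy needs.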

Unlike unstructured data, structured data does not provide effective metadata metrics for use in automatic content identification policies.  Simply put, the applications creating the structured content or the databases storing that content do not support effective markers for policy creation.  That being said, there are other ways to “bell the cat.”

FileTek has deployed successful active archive solutions for customers with structured data. Here is a brief description of how the process works. Transactional databases typically account for approximately 90% of structured content storage requirements. The majority of that structured data is the business event or the actual transaction (for example, a phone call, ATM transaction, customer purchase, or service). Because this data represents critical records needed for compliance, business requirements, and detailed marketing analysis, FileTek focused its attention on the best way to archive it.

When reviewing customer environments and specifications, FileTek discovered the following:

·       Customer transaction databases had grown to an unmanageable size.

·       The cost of maintaining these databases was outpacing IT budgets.

·       Customers already planned to archive an overwhelming majority of transactional records at some point in time.

·       Customers were also having problems with the backup process because cycles were running too long, and restore was a nightmare.

FileTek’s active archive solution turned out to be quite simple: it only required shifting perspective from what content to archive to when to archive that content. To that end, FileTek implemented a continuous archive process that replicated all transactional records as they were created and written to primary storage, and then loaded a copy of those records into the active archive. Rules within the active archive created an additional copy of the data in a remote disaster recovery site, thereby automatically backing up transactional records in real time with online access and no burdensome restore requirements. To provide tiering and manage the costs of primary storage for transactional records, FileTek then employed a simple deletion policy on the primary storage based on record age (the retention threshold can vary by industry). This process reduced the primary database to a manageable size, which in turn improved performance and decreased budget pressures.
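The flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not FileTek's implementation: the Store class, the write_transaction and purge_primary functions, and the in-memory dictionaries standing in for the primary database, the active archive, and the disaster recovery site are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Store:
    """Hypothetical stand-in for a database, archive, or DR site."""
    name: str
    records: dict = field(default_factory=dict)


def write_transaction(record_id, payload, created_at, primary, archive, dr_site):
    """Write a record to primary storage and replicate it immediately."""
    record = {"payload": payload, "created_at": created_at}
    primary.records[record_id] = record
    # Continuous archive: every record is copied as it is created,
    # so there is no "what to archive" decision to make.
    archive.records[record_id] = dict(record)
    # An archive rule keeps a second copy at the remote DR site.
    dr_site.records[record_id] = dict(record)


def purge_primary(primary, archive, now, max_age):
    """Delete aged records from primary, but only if the archive holds them."""
    removed = []
    for record_id, record in list(primary.records.items()):
        expired = now - record["created_at"] > max_age
        if expired and record_id in archive.records:
            del primary.records[record_id]
            removed.append(record_id)
    return removed
```

For example, calling purge_primary(primary, archive, datetime.now(), timedelta(days=90)) would trim primary storage to the last 90 days while every record stays online in the archive. The 90-day figure is only a placeholder for whatever retention window an industry requires.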

With this approach, FileTek customers benefited from another important active archive feature – data assurance. StorHouse achieves data assurance by automatically monitoring data for corruption due to bit rot and other storage media issues, replacing corrupted data with clean data, and moving content to a stable storage location.
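One common way to implement this kind of monitoring is to compare stored content against a known checksum and repair it from a clean replica. The sketch below illustrates that general technique only; it makes no claim about how StorHouse works internally. The function names, the use of SHA-256, and the dictionaries standing in for storage locations are assumptions made for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a blob as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_and_repair(primary: dict, replica: dict, expected: dict) -> list:
    """Replace corrupted or missing blobs in `primary` with clean replica copies.

    `expected` maps each key to the digest recorded when the blob was
    first archived. Returns the keys that were repaired.
    """
    repaired = []
    for key, digest in expected.items():
        data = primary.get(key)
        if data is not None and checksum(data) == digest:
            continue  # primary copy is intact
        candidate = replica.get(key)
        # Only trust the replica if it still matches the recorded digest.
        if candidate is not None and checksum(candidate) == digest:
            primary[key] = candidate
            repaired.append(key)
    return repaired
```

Run on a schedule, a check like this catches silent corruption such as bit rot before anyone requests the data, which is the point of data assurance.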