Managing Data Retention Policies at Scale

Li, Jun; Singhal, Sharad; Swaminathan, Ram; Karp, Alan H.
HP Laboratories


Keyword(s): large-scale policy management; compliance and regulatory; data retention; encryption key store; cloud service

Abstract: Compliance with regulatory policies on data remains a key hurdle to cloud computing. Policies such as EU privacy, HIPAA, and PCI-DSS place requirements on data availability, integrity, migration, retention, and access, among many others. This paper proposes a policy management service that offers scalable management of data retention policies attached to data objects stored in a cloud environment. The management service includes a highly available and secure encryption key store that manages the encryption keys of data objects. Deleting a data object's encryption key at the retention time specified for that object leaves the object and all of its copies, whether stored online or offline, unreadable, which effectively deletes them. To achieve scalability, our service uses Hadoop MapReduce to run management tasks in parallel, such as data encryption and decryption, key distribution, and retention policy enforcement. A prototype deployed on a 16-machine Linux cluster currently supports 56 MB/sec for encryption, 76 MB/sec for decryption, 31,000 retention policy reads/sec, and 15,000 retention policy writes/sec.
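
The key-deletion idea in the abstract can be illustrated with a minimal single-process sketch. It is not the prototype's implementation: the KeyStore class, its method names, and the object identifier are hypothetical, and the XOR keystream built from SHA-256 is only a standard-library stand-in for a real cipher such as AES.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (SHA-256 in counter mode).

    Illustration only: a stand-in for a real cipher such as AES.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class KeyStore:
    """Hypothetical in-memory key store: object id -> (key, retention deadline)."""

    def __init__(self):
        self._entries = {}

    def create_key(self, object_id: str, expires_at: float) -> bytes:
        # One fresh random key per data object.
        key = secrets.token_bytes(32)
        self._entries[object_id] = (key, expires_at)
        return key

    def get_key(self, object_id: str, now: float) -> bytes:
        # Raises KeyError if the key was deleted or the deadline has passed.
        key, expires_at = self._entries[object_id]
        if now >= expires_at:
            raise KeyError(object_id)
        return key

    def enforce_retention(self, now: float) -> list:
        # Delete every key whose retention deadline has been reached.
        expired = [oid for oid, (_, exp) in self._entries.items() if now >= exp]
        for oid in expired:
            del self._entries[oid]
        return expired


store = KeyStore()
key = store.create_key("report-1", expires_at=100.0)
ciphertext = keystream_xor(key, b"patient record")
del key  # only the key store holds the key from here on

# Before the deadline, any copy of the ciphertext can still be decrypted.
recovered = keystream_xor(store.get_key("report-1", now=50.0), ciphertext)

# At the deadline the key is removed; copies of the ciphertext stay unreadable.
deleted = store.enforce_retention(now=100.0)
```

In this sketch only the key store needs to be cleaned up at the deadline; copies of the ciphertext held elsewhere do not have to be located or erased, because they cannot be decrypted without the key.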

Additional Publication Information: To be published in IFIP/IEEE International Symposium on Integrated Network Management 2011, Dublin, Ireland, May 23-27, 2011.

External Posting Date: December 21, 2010. Approved for External Publication
Internal Posting Date: December 21, 2010

