Alexandria Digital Research Library

Collocated data deduplication for virtual machine backup in the cloud

Author:
Zhang, Wei
Degree Grantor:
University of California, Santa Barbara. Computer Science
Degree Supervisor:
Tao Yang
Place of Publication:
[Santa Barbara, Calif.]
Publisher:
University of California, Santa Barbara
Creation Date:
2014
Issued Date:
2014
Topics:
Computer Science
Keywords:
Backup and archival storage
Converged architecture
Cloud computing
Distributed storage
Virtual machine snapshot
Deduplication
Genres:
Dissertations, Academic and Online resources
Dissertation:
Ph.D.--University of California, Santa Barbara, 2014
Description:

Cloud platforms that host a large number of virtual machines (VMs) have a high storage demand because VM snapshots are backed up frequently. Content-signature-based deduplication is necessary to eliminate the many redundant blocks. While dedicated backup storage systems can be used to reduce data redundancy, such an architecture is expensive and generates heavy network traffic in a large cluster. This thesis research focuses on a low-cost backup and deduplication service collocated with other cloud services to reduce infrastructure and network cost.

Previous research on cluster-based data deduplication has concentrated on various inline solutions. The first part of the thesis work is a highly parallel batched solution with synchronized backup that scales to a large number of virtual machines. The key idea is to separate duplicate detection from the actual storage backup, and to partition the global index and the detection requests among machines using fingerprint values. Each machine then conducts duplicate detection independently, one partition at a time, with minimal memory consumption. Another optimization is to allocate and control buffer space for exchanging detection requests and duplicate summaries among machines. The proposed solution requires very little memory and disk space, while backup efficiency in terms of overall throughput and time is not compromised. Our evaluation validates this and shows a satisfactory backup throughput in a large cloud setting.
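
The partitioning idea in the paragraph above can be illustrated with a minimal sketch. It is not taken from the dissertation: the use of SHA-1 as the fingerprint, the modulo rule for choosing the owning machine, the in-memory `set` standing in for an index partition, and the names `fingerprint`, `owner`, `route_requests`, and `detect_duplicates` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

NUM_MACHINES = 4  # assumed cluster size, for illustration only


def fingerprint(block: bytes) -> bytes:
    """Content signature of a block (SHA-1 is an assumption here)."""
    return hashlib.sha1(block).digest()


def owner(fp: bytes, num_machines: int = NUM_MACHINES) -> int:
    """Map a fingerprint to the machine that holds its index partition."""
    return int.from_bytes(fp[:8], "big") % num_machines


def route_requests(blocks, num_machines: int = NUM_MACHINES):
    """Group detection requests by owning machine.

    `blocks` is an iterable of (block_id, data) pairs. Identical content
    always yields the same fingerprint, so it is always routed to the
    same machine.
    """
    buckets = defaultdict(list)
    for block_id, data in blocks:
        fp = fingerprint(data)
        buckets[owner(fp, num_machines)].append((block_id, fp))
    return buckets


def detect_duplicates(index_partition: set, requests):
    """Run duplicate detection for one partition, independently of the others.

    Returns (duplicate_ids, new_ids); new fingerprints are added to the
    partition so later requests in the same batch are detected as well.
    """
    duplicates, new = [], []
    for block_id, fp in requests:
        if fp in index_partition:
            duplicates.append(block_id)
        else:
            index_partition.add(fp)
            new.append(block_id)
    return duplicates, new
```

Because each machine sees only the fingerprints that map to its own partition, the detection step needs no coordination between machines. Only the routed requests and the resulting duplicate summaries cross the network, which is where the buffer control mentioned above would apply.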

The second part of the thesis work is a VM-centric collocated backup service with inline deduplication. What distinguishes it from previous work is its emphasis on fault resilience and low resource usage. We propose a multi-level selective deduplication scheme that integrates similarity-guided and popularity-guided duplicate elimination under a stringent resource requirement. This scheme uses popular common data to facilitate fingerprint comparison, localizes deduplication as much as possible within each VM, and associates underlying file blocks with a single VM in most cases. The main advantage of this scheme is that it strikes a balance between intra-VM and inter-VM deduplication, increasing parallelism and improving reliability. Our analysis shows that this VM-centric scheme can provide better fault tolerance while using a small amount of computing and storage resources. We have conducted a comparative evaluation of this scheme to assess its competitiveness in terms of deduplication efficiency and backup throughput.
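
One way to picture a multi-level selective scheme is the minimal sketch below. It is an illustration under stated assumptions, not the dissertation's algorithm: the first level is assumed to compare each block with the block at the same position in the same VM's previous snapshot, the second level is assumed to consult a small set of fingerprints of popular data shared across VMs, and the names `fp` and `dedup_snapshot` are hypothetical.

```python
import hashlib


def fp(block: bytes) -> str:
    """Content signature of a block (SHA-1 is an assumption here)."""
    return hashlib.sha1(block).hexdigest()


def dedup_snapshot(blocks, prev_snapshot_fps, popular_index):
    """Classify each block of a new snapshot of one VM.

    blocks            -- list of block contents in the new snapshot
    prev_snapshot_fps -- fingerprints of the same VM's previous snapshot,
                         in block order (assumed layout)
    popular_index     -- set of fingerprints of popular cross-VM data

    Returns a list of (decision, fingerprint) pairs where decision is
    "local"   : unchanged since this VM's previous snapshot,
    "popular" : matches shared popular data,
    "new"     : stored as a new block owned by this VM.
    """
    plan = []
    for i, data in enumerate(blocks):
        f = fp(data)
        if i < len(prev_snapshot_fps) and prev_snapshot_fps[i] == f:
            plan.append(("local", f))      # level 1: stay inside the VM
        elif f in popular_index:
            plan.append(("popular", f))    # level 2: small shared index
        else:
            plan.append(("new", f))        # no global lookup is attempted
    return plan
```

In this sketch a block that is neither local nor popular is stored again even if some other VM already holds a copy. That trades a little deduplication efficiency for blocks that belong to one VM, which is the kind of balance between intra-VM and inter-VM deduplication the abstract describes.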

Physical Description:
1 online resource (128 pages)
Format:
Text
Collection(s):
UCSB electronic theses and dissertations
ARK:
ark:/48907/f30k26q9
ISBN:
9781321350364
Catalog System Number:
990045117850203776
Rights:
In Copyright
Copyright Holder:
Wei Zhang
File Description
Access: Public access
Zhang_ucsb_0035D_12307.pdf pdf (Portable Document Format)