Deduplication Overview
All data types from Windows, Linux, and UNIX operating systems, as well as other platforms, can be deduplicated when the data is copied to secondary storage.
Deduplication offers the following benefits:
· Optimizes use of storage media by eliminating duplicate blocks of data.
· Reduces network traffic by sending only unique data during backup operations.
In short, deduplication reduces the amount of data that must be sent over the network and stored on media.
How Deduplication Works
The following is the general workflow for deduplication (a code sketch of this write path follows the list):
· Generating signatures for data blocks
A block of data is read from the source, and a unique signature for the block is generated by using a hash algorithm. Data blocks can be compressed (the default), encrypted (optional), or both. Compression, signature generation, and encryption are performed in that order on the source or destination host.
· Comparing signatures
The new signature is compared against a database of existing signatures for previously backed-up data blocks on the destination storage. The database that contains the signatures is called the Deduplication Database (DDB).
o If the signature exists, the DDB records that an existing data block is reused on the destination storage. The associated MediaAgent writes the index information to the destination storage, and the duplicate data block is discarded.
o If the signature does not exist, the new signature is added to the DDB. The associated MediaAgent writes both the index information and the data block to the destination storage.
· Using MediaAgent roles
During the deduplication process, two different MediaAgent roles are used. These roles can be hosted by the same MediaAgent or by different MediaAgents:
o Data Mover Role: The MediaAgent has write access to the disk libraries where the data blocks are stored.
o Deduplication Database Role: The MediaAgent has access to the DDB that stores the data block signatures.
An object (a file, message, document, and so on) written to the destination storage may contain one or many data blocks. These blocks can be distributed across the destination storage, and their locations are tracked by the MediaAgent index. This index allows the blocks to be reassembled so that the object can be restored or copied to other locations. The DDB is not used during the restore process.
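The write path described above can be sketched in a few lines of Python. This is a minimal illustration, not Simpana code: the dict standing in for the DDB, the SHA-512 signature, and all names are assumptions made for the example.

import hashlib
import zlib

ddb = {}      # signature -> location of the stored block (stands in for the DDB)
index = []    # per-object list of block locations, used to reassemble on restore
store = []    # stands in for the destination disk library

def write_block(raw_block):
    compressed = zlib.compress(raw_block)            # compression first (the default)
    signature = hashlib.sha512(compressed).digest()  # then signature generation
    if signature in ddb:
        # Signature already in the DDB: reference the existing block,
        # record the index information, and discard the duplicate.
        location = ddb[signature]
    else:
        # New signature: add it to the DDB and write the block itself.
        store.append(compressed)
        location = len(store) - 1
        ddb[signature] = location
    index.append(location)   # the index tracks where each block of the object lives

def restore_object():
    # Restore walks the index only; the DDB is not used here.
    return b"".join(zlib.decompress(store[loc]) for loc in index)

Writing the same block twice adds two entries to the index but only one to the store, which is the entire point of the scheme.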
Strategies for Deduplication Implementation
Source-Side (Client-Side) Deduplication
(Recommended.) Use source-side deduplication when the MediaAgent and the clients are in a delayed or low-bandwidth network environment, such as a WAN. You can also use source-side deduplication for Remote Office backup solutions.
MediaAgent-Side (Storage-Side) Deduplication
Use MediaAgent-side deduplication when the MediaAgent and the clients are in a fast network environment, such as a LAN, and when you want to avoid CPU load on the client computers.
When the signature generation option is enabled on the MediaAgent, MediaAgent-side deduplication reduces CPU usage on the client computers by moving the signature processing to the MediaAgent.
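The practical difference between the two modes is where the signature is computed and what travels over the network. Below is a hedged sketch of that trade-off; ddb_query and send_block are hypothetical callbacks, not Simpana APIs.

import hashlib
import zlib

def backup_block_source_side(raw_block, ddb_query, send_block):
    # Source-side: the client compresses and hashes locally, then asks the
    # DDB whether the signature is known before sending anything large.
    compressed = zlib.compress(raw_block)
    signature = hashlib.sha512(compressed).digest()
    if not ddb_query(signature):            # only the small signature crosses the WAN
        send_block(signature, compressed)   # the full block is sent only when new

def backup_block_mediaagent_side(raw_block, send_block):
    # MediaAgent-side: the client ships every block as-is; compression,
    # hashing, and deduplication happen on the MediaAgent, sparing client CPU.
    send_block(raw_block)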
Global Deduplication
Global deduplication provides greater flexibility in defining retention policies when protecting the data.
Use global deduplication storage policies in the following scenarios (see the sketch after this list):
· To consolidate Remote Office backup data in one location.
· When you must manage data types, such as file system data and virtual machine data, with different storage policies but in the same disk library.
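Conceptually, a global deduplication policy lets several storage policies share one DDB and one disk library while each keeps its own retention settings. The sketch below only illustrates that relationship; the class and field names are invented for the example.

shared_ddb = {}    # signature -> block location, shared across all policies
shared_store = []  # one disk library backing every policy

class StoragePolicy:
    def __init__(self, name, retention_days):
        self.name = name
        self.retention_days = retention_days  # retention stays per-policy
        self.ddb = shared_ddb                 # deduplication spans policies
        self.store = shared_store

fs_policy = StoragePolicy("FileSystemData", retention_days=30)
vm_policy = StoragePolicy("VirtualMachineData", retention_days=90)
# A block backed up under fs_policy and later under vm_policy is stored once.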
Deduplication to Tape (Silo Storage)
Deduplication to Tape copies deduplicated data to tape in its deduplicated format.
Deduplication to Tape extends the primary disk storage by managing disk space and periodically moving the deduplicated data to secondary storage.
Restore requests for deduplicated data on tape are handled automatically: only the necessary data is copied back to the disk library, and the data is then restored from there.
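That restore behavior can be pictured as a two-step staging operation. restore_from_silo and the tape and disk objects below are hypothetical stand-ins, not product calls.

def restore_from_silo(tape, disk_library, needed_blocks):
    # Step 1: copy only the blocks the restore needs from tape back to disk.
    for block_id in needed_blocks:
        if block_id not in disk_library:
            disk_library[block_id] = tape[block_id]
    # Step 2: the restore then proceeds from the disk library as usual.
    return [disk_library[block_id] for block_id in needed_blocks]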
DASH Copy
An Auxiliary Copy job can use DASH (Deduplication Accelerated Streaming Hash) Copy, an option for a deduplication-enabled storage policy copy, to send only unique data to that copy. DASH Copy uses network bandwidth efficiently and minimizes the use of storage resources.
Because DASH Copy transmits only unique data blocks, it can reduce the volume and duration of an Auxiliary Copy job by up to 90%.
Use DASH Copy when remote secondary copies are reachable only over low-bandwidth connections.
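The mechanism resembles source-side deduplication applied between two copies: the source side offers signatures, and a block crosses the network only when the destination DDB does not already hold it. A rough sketch with invented names (dest_ddb, fetch_block, and dest_store are not product APIs):

def dash_copy(source_signatures, dest_ddb, fetch_block, dest_store):
    # Walk the signatures of the source copy; transfer only unseen blocks.
    for signature in source_signatures:
        if signature in dest_ddb:
            continue                    # block already exists on the secondary copy
        block = fetch_block(signature)  # only unique blocks cross the network
        dest_store.append(block)
        dest_ddb[signature] = len(dest_store) - 1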
DASH Full (Accelerated Synthetic Full Backups)
DASH Full is a Synthetic Full operation that updates the DDB and index files to reference existing data, rather than physically copying the data as a normal Synthetic Full backup does.
Use DASH Full backup operations to increase performance and reduce network usage for full backups.
DASH Full is also used with Simpana OnePass to manage the retention of archived data.
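A synthetic full assembled this way never reads the client data again; it stitches together references to blocks that already exist on the destination. A simplified sketch, assuming each prior backup left an index mapping object names to block locations (all structures here are illustrative):

def dash_full(previous_indexes):
    # Build the new full backup purely from existing block references;
    # no data blocks are read or copied, only DDB/index metadata is updated.
    synthetic_index = {}
    for backup_index in previous_indexes:    # oldest full, then later incrementals
        synthetic_index.update(backup_index) # later versions of an object win
    return synthetic_index

full_index = {"fileA": 0, "fileB": 1}
incr_index = {"fileB": 2, "fileC": 3}
print(dash_full([full_index, incr_index]))  # {'fileA': 0, 'fileB': 2, 'fileC': 3}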