Detailed Introduction About Deduplication in Commvault Backup | Detailed Overview On Deduplication in Commvault Backup

Deduplication Overview

Deduplication provides an efficient method to transmit and store data by identifying and eliminating duplicate blocks of data during backups.

All data types from Windows, Linux, UNIX operating systems and multiple platforms can be deduplicated when data is copied to secondary storage. 

Deduplication offers the following benefits:
·        Optimizes use of storage media by eliminating duplicate blocks of data.
·        Reduces network traffic by sending only unique data during backup operations.

Deduplication reduces the amount of data that is transmitted and stored.

How Deduplication Works

The following is the general workflow for deduplication:

·        Generating signatures for data blocks

A block of data is read from the source and a unique signature for the block of data is generated by using a hash algorithm.
Data blocks can be compressed (default), encrypted (optional), or both. Data block compression, signature generation, and encryption are performed in that order on the source or destination host.

·        Comparing signatures

The new signature is compared against a database of existing signatures for previously backed up data blocks on the destination storage. The database that contains the signatures is called the Deduplication Database (DDB).

o   If the signature exists, the DDB records that an existing data block is used again on the destination storage. The associated MediaAgent writes the index information to the DDB on the destination storage, and the duplicate data block is discarded.

o   If the signature does not exist, the new signature is added to the DDB. The associated MediaAgent writes both the index information and the data block to the destination storage.

Signature comparison is done on a MediaAgent. For improved performance, you can use a locally cached set of signatures on the source host for the comparison. If a signature does not exist in the local cache set, it is sent to the MediaAgent for comparison.

·        Using MediaAgent roles

During the deduplication process, two different MediaAgent roles are used. These roles can be hosted by the same MediaAgent or by different MediaAgents.

o   Data Mover Role: The MediaAgent has write access to disk libraries where the data blocks are stored.

o   Deduplication Database Role: The MediaAgent has access to the DDB that stores the data block signatures.

An object (file, message, document, and so on) written to the destination storage may contain one or many data blocks. These blocks might be distributed across the destination storage, and their locations are tracked by the MediaAgent index. This index allows the blocks to be reassembled so that the object can be restored or copied to other locations. The DDB is not used during the restore process.
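
The workflow above can be sketched in a few lines of Python. This is a toy model rather than Commvault's implementation: the DedupStore class, the 4-byte block size, and the use of SHA-256 and zlib are all assumptions made for illustration, and the optional encryption step is left out.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow (assumption)


class DedupStore:
    """Toy destination storage with a deduplication database (DDB)."""

    def __init__(self):
        self.ddb = {}     # signature -> reference count
        self.blocks = {}  # signature -> compressed block (unique data only)
        self.index = {}   # object name -> ordered list of signatures

    def backup(self, name, data):
        """Store an object and return how many new blocks were written."""
        written = 0
        signatures = []
        for i in range(0, len(data), BLOCK_SIZE):
            # Compress first, then sign, matching the order given in the text.
            compressed = zlib.compress(data[i:i + BLOCK_SIZE])
            sig = hashlib.sha256(compressed).hexdigest()
            if sig in self.ddb:
                self.ddb[sig] += 1        # existing block is used again
            else:
                self.ddb[sig] = 1         # new signature is added to the DDB
                self.blocks[sig] = compressed
                written += 1
            signatures.append(sig)
        self.index[name] = signatures
        return written

    def restore(self, name):
        """Reassemble an object from the index; the DDB is not consulted."""
        return b"".join(zlib.decompress(self.blocks[s]) for s in self.index[name])


store = DedupStore()
first = store.backup("a.txt", b"AAAABBBBAAAA")
second = store.backup("b.txt", b"BBBBCCCC")
print(first, second, len(store.blocks))
```

In this sketch the first backup writes two unique blocks, the second writes only one, and restore() rebuilds each object from the index without touching the DDB.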

Strategies for Deduplication Implementation

Source-Side (Client-Side) Deduplication

(Recommended) Use source-side deduplication when the MediaAgent and the clients are in a high-latency or low-bandwidth network environment, such as a WAN. You can also use source-side deduplication for Remote Office backup solutions.

MediaAgent-Side (Storage-Side) Deduplication

Use MediaAgent-side deduplication when the MediaAgent and the clients are in a fast network environment, such as a LAN, and you do not want deduplication to consume CPU on client computers.
When the signature generation option is enabled on the MediaAgent, MediaAgent-side deduplication reduces the CPU usage on the client computers by moving the processing to the MediaAgent.
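
The trade-off between the two strategies above can be illustrated with a small simulation. It is a sketch under stated assumptions: the MediaAgent class, its has_signature and store methods, and the source_side_backup function are hypothetical names, not part of any Commvault API.

```python
import hashlib


class MediaAgent:
    """Toy MediaAgent that owns the authoritative set of signatures."""

    def __init__(self):
        self.signatures = set()
        self.lookups = 0

    def has_signature(self, sig):
        self.lookups += 1
        return sig in self.signatures

    def store(self, sig, block):
        self.signatures.add(sig)


def source_side_backup(blocks, agent, local_cache):
    """Hash on the client and return the payload bytes sent over the network."""
    sent = 0
    for block in blocks:
        sig = hashlib.sha256(block).hexdigest()
        if sig in local_cache:
            continue                      # known duplicate: nothing to ask or send
        if not agent.has_signature(sig):  # cache miss: ask the MediaAgent
            agent.store(sig, block)
            sent += len(block)
        local_cache.add(sig)
    return sent


agent = MediaAgent()
cache = set()
blocks = [b"alpha", b"beta", b"alpha", b"alpha"]
sent_first = source_side_backup(blocks, agent, cache)
sent_second = source_side_backup(blocks, agent, cache)
print(sent_first, sent_second, agent.lookups)
```

With MediaAgent-side deduplication all 19 bytes of block data would cross the network before being hashed. In this sketch the client sends 9 bytes on the first run and none on the second, at the cost of computing the hashes on the client.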

Global Deduplication

Global deduplication provides greater flexibility in defining retention policies when protecting the data.
Use global deduplication storage policies in the following scenarios:
·        To consolidate Remote Office backup data in one location.
·        When you must manage data types, such as file system data and virtual machine data, by different storage policies but in the same disk library.
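
As a rough illustration of the second scenario, the sketch below lets two storage policies with different retention settings write into one shared pool of unique blocks. GlobalDedupPool, StoragePolicy, and every field name are hypothetical and chosen only for this example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class GlobalDedupPool:
    """One shared set of unique blocks for every policy that points at it."""
    blocks: dict = field(default_factory=dict)  # signature -> data

    def write(self, block):
        sig = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(sig, block)      # store only if not already present
        return sig


@dataclass
class StoragePolicy:
    name: str
    retention_days: int
    pool: GlobalDedupPool
    jobs: list = field(default_factory=list)    # each job is a list of signatures

    def backup(self, blocks):
        self.jobs.append([self.pool.write(b) for b in blocks])


pool = GlobalDedupPool()
file_system = StoragePolicy("file-system", retention_days=30, pool=pool)
virtual_machines = StoragePolicy("virtual-machines", retention_days=90, pool=pool)

os_image = [b"kernel", b"libc"]
file_system.backup(os_image + [b"report.docx"])
virtual_machines.backup(os_image + [b"vm-disk"])
unique_blocks = len(pool.blocks)
print(unique_blocks)
```

Six blocks are backed up across the two policies, but only four unique blocks are stored, while each policy keeps its own retention setting.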

Deduplication to Tape (Silo Storage)

Deduplication to Tape can copy deduplicated data to tape in a deduplicated format.
Deduplication to Tape extends the primary disk storage by managing the disk space and periodically moving the deduplicated data to the secondary storage.
Deduplicated data on tape responds automatically to restore requests by copying only necessary data back to the disk library and then restoring the data.

DASH Copy

An Auxiliary Copy job uses DASH (Deduplication Accelerated Streaming Hash) Copy, which is an option for a deduplication-enabled storage policy copy, to send only unique data to that copy. DASH Copy uses network bandwidth efficiently and minimizes the use of storage resources.
DASH Copy transmits only unique data blocks, which reduces the volume and time of an Auxiliary Copy job by up to 90%.
Use DASH Copy when remote secondary copies are reachable only over low-bandwidth connections.
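
The idea of transmitting only unique blocks can be sketched as follows. The dash_copy function and the dictionary standing in for the remote copy are assumptions for illustration. The 90% saving comes from the example data, where 9 of 10 blocks already exist at the destination, and is not a measured result.

```python
import hashlib


def signature(block):
    return hashlib.sha256(block).hexdigest()


def dash_copy(source_blocks, destination):
    """Copy blocks into `destination` (signature -> block), sending only new ones.

    Returns (bytes_sent, bytes_total).
    """
    sent = total = 0
    for block in source_blocks:
        total += len(block)
        sig = signature(block)
        if sig not in destination:   # only the signature is needed to decide
            destination[sig] = block
            sent += len(block)
    return sent, total


# The remote copy already holds nine of the ten blocks on the primary copy.
remote_copy = {signature(b"block-%d" % i): b"block-%d" % i for i in range(9)}
primary = [b"block-%d" % i for i in range(10)]
sent, total = dash_copy(primary, remote_copy)
saving = 1 - sent / total
print(sent, total, round(saving, 2))
```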

DASH Full (Accelerated Synthetic Full Backups)

DASH Full is a Synthetic Full operation that updates the DDB and index files for existing data rather than physically copying data like a normal Synthetic Full backup.
Use DASH Full backup operations to increase performance and reduce network usage for full backups.
DASH Full is used with Simpana OnePass to manage the retention of archived data.
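
A minimal sketch of the idea behind DASH Full is shown below: the new full backup is assembled from index entries, and only reference counts in the DDB change. The dash_full function, the dictionary layout, and the use of None to mark a deleted file are assumptions, not Commvault's actual data structures.

```python
def dash_full(previous_full, incrementals, ddb):
    """Build a synthetic full from index entries only.

    previous_full and each incremental map file name -> list of block signatures.
    A value of None in an incremental marks a deleted file (an assumption).
    ddb maps signature -> reference count and is updated in place.
    """
    merged = dict(previous_full)
    for inc in incrementals:
        for name, sigs in inc.items():
            if sigs is None:
                merged.pop(name, None)
            else:
                merged[name] = sigs
    for sigs in merged.values():
        for sig in sigs:
            ddb[sig] += 1   # reference the existing block; no data is read or copied
    return merged


ddb = {"s1": 1, "s2": 1, "s3": 1, "s4": 1}
full = {"a.txt": ["s1", "s2"], "b.txt": ["s3"]}
incs = [{"a.txt": ["s1", "s4"]}, {"b.txt": None}]
new_full = dash_full(full, incs, ddb)
print(new_full, ddb)
```

The function never touches block data, which is why an operation of this kind can avoid the network and disk traffic of a conventional synthetic full.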

About CommVault Architecture | Detailed Overview of CommCell Management Group

CommCell Management Group Overview


A CommCell management group is the logical grouping of all Simpana software components that protect, move, store, and manage the movement of data and information. A CommCell group contains one CommServe server, one or more MediaAgents, and one or more Clients.

The CommServe server is the central management component of the CommCell group. It coordinates and executes all CommCell group operations, maintaining Microsoft SQL Server databases that contain all configuration, security, and operational history for the CommCell group. There can be only one CommServe server in a CommCell group. The CommServe software can be installed in physical, virtual, and clustered environments. 


The MediaAgent is the data transmission manager in the CommCell group. It provides high-performance data movement and manages the data storage libraries. The CommServe server coordinates MediaAgent tasks. For scalability, there can be more than one MediaAgent in a CommCell. The MediaAgent software can be installed in physical, virtual, and clustered environments. 

A client is a logical grouping of the Simpana software agents that facilitate the protection, management, and movement of data associated with the client. 

An agent is a Simpana software module that is installed on a client computer to protect a specific type of data. Different agent software is available to manage different data types on a client, such as Windows file system data, Oracle databases, etc. Agent software can be installed in physical, virtual, and clustered environments, and may be installed either on the computer or on a proxy server.

CommCell Console

The CommCell Console is the central management user interface for managing the CommCell group - monitoring and controlling active jobs, and viewing events related to all activities. The CommCell Console allows centralized and decentralized organizations to manage all data movement activities through a single, common interface.

CommCell Logical Architecture

A CommCell management group consists of one CommServe server and any number of MediaAgents and Clients. There is also a logical architecture to a CommCell, which can be defined in two main areas:
  • Production data being used by servers and computers in the enterprise, and
  • Protected data which has been backed up, archived, or replicated to storage media.

Client
A computer for which Simpana agents are protecting data.

Agent
A Simpana software component that is installed to protect a specific type of data on a client, e.g., Windows File System, Oracle databases, etc.

Subclient
A logical container that identifies and manages specific production data (drives, folders, databases, mailboxes) to be protected.

Backup Set
One or more logical groupings of subclients, which are the containers of all of the data managed by the agent. For some agents, this might be called an archive set or replication set. For a database agent, the equivalent of a backup set is generally a database instance.

Storage Policy
A logical data management entity with rules that define the lifecycle management of the protected data in a subclient's content.

Logical Management of Production Data

Production data is managed using agents, which interface natively with the file system or application and can be configured based on the specific functionality of data being protected. Data within these agents is grouped into backup sets. Within the backup set, one or more subclients can be used to map to specific data.
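
The hierarchy described above (client, agent, backup set, subclient) can be pictured as nested data structures. The dataclasses below are a sketch for illustration only. The class names follow the terms in the text, but the fields, host name, and policy name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subclient:
    name: str
    content: list          # paths, databases, mailboxes, and so on
    storage_policy: str    # name of the policy that manages this data


@dataclass
class BackupSet:
    name: str
    subclients: list = field(default_factory=list)


@dataclass
class Agent:
    data_type: str
    backup_sets: list = field(default_factory=list)


@dataclass
class Client:
    hostname: str
    agents: list = field(default_factory=list)


client = Client("fileserver01", [
    Agent("Windows File System", [
        BackupSet("defaultBackupSet", [
            Subclient("user-data", ["D:\\Users"], "SP-Files"),
            Subclient("system", ["C:\\"], "SP-Files"),
        ]),
    ]),
])

# Walk the hierarchy to list every path that is mapped to a subclient.
protected_paths = [
    path
    for agent in client.agents
    for backup_set in agent.backup_sets
    for subclient in backup_set.subclients
    for path in subclient.content
]
print(protected_paths)
```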

Logical Management of Protected Data

Simpana’s concept of data management utilizes a data protection strategy based on logical policies. The flexibility gained from policy-driven data protection and management is the ability to group data based on protection and retention needs, rather than by the physical location of the data, which greatly simplifies the organization and management of protected data.
In a CommCell group, subclients define the actual data that will be protected. Subclients can contain an entire server, drive, folders, database, user mailboxes, virtual machines, or even document repositories. The data defined in these subclients is protected through backup, archive, or snapshot operations into Simpana protected storage. Once in protected storage, the data from these subclients can be independently managed regardless of what production server they came from.
A storage policy manages subclient data based on business requirements, even when the subclients' content resides on different servers in the CommCell. It defines a specific set of rules to manage the associated data: which data will be protected (which subclients), where it will reside (the data path and library), how long it will be kept (retention settings), and other media management options such as deduplication, compression, and encryption of the data in protected storage. The first storage policy defines the primary copy of the backed-up data, which can be stored on local libraries for quick access. Additional copies of the backed-up data can be automatically created from existing copies already in the protected storage environment to other libraries and locations for consolidation, auditing, and business continuity. For example:
  • A project may have different types of data that reside on numerous servers and storage devices. Agents for each type of data are installed, and subclients are defined to access the data in all of its locations. All of these subclients can be associated with a single storage policy to manage the business-related data as a single entity.
  • Financial and legal data from different servers or locations can be combined into a storage policy for compliance reasons. Databases can be managed in a storage policy and sent to a disaster recovery location. User files can be kept in an on-site copy for quick file recovery.
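
To make the storage policy idea concrete, the sketch below associates subclients from different servers with one policy that has a primary copy and an additional copy with its own retention. StoragePolicy, PolicyCopy, and all names and values are hypothetical. The sketch models only the rules described above, not Commvault's behavior.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class PolicyCopy:
    name: str
    library: str
    retention_days: int

    def expires_on(self, backup_date):
        return backup_date + timedelta(days=self.retention_days)


@dataclass
class StoragePolicy:
    name: str
    copies: list                                    # first entry is the primary copy
    subclients: list = field(default_factory=list)  # (server, subclient) pairs

    def associate(self, server, subclient):
        self.subclients.append((server, subclient))


policy = StoragePolicy("SP-Finance", [
    PolicyCopy("Primary", "local-disk-library", retention_days=30),
    PolicyCopy("DR", "remote-site-library", retention_days=365),
])
policy.associate("sql01", "ledger-db")    # data from two different servers
policy.associate("files02", "invoices")   # is managed as one entity

backup_date = date(2024, 1, 1)
expiry = {c.name: c.expires_on(backup_date) for c in policy.copies}
print(expiry)
```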

Detailed Introduction to Commvault Simpana Software | Features of Simpana Software

Simpana Software

The Simpana software platform is an enterprise-level, integrated data and information management solution, built from the ground up on a single platform and unified code base. All functions share the same back-end technologies to deliver unparalleled advantages and benefits in protecting, managing, and accessing data.

The Simpana software platform contains modules to protect and archive, analyze, replicate, and search your data, which all share a common set of back-end services and advanced capabilities, seamlessly interacting with one another. The Simpana software platform addresses all aspects of data management in the enterprise, while providing infinite scalability and unmatched control of data and information.


Production data is protected by installing agent software on the physical or virtual hosts which use operating system or application native APIs to properly protect data in a consistent state. Production data is processed by the agent software on client computers and backed up through a data manager, the MediaAgent, to disk, tape, or cloud storage. All data management activity in the environment is tracked by a centralized server, the CommServe, and can be managed by administrators through a central user interface.

Key features of the Simpana software platform

  • Complete data protection solution supporting all major operating systems, applications, and databases on virtual and physical servers, NAS shares, cloud-based infrastructures, and mobile devices.
  • Simplified management through a single console; view, manage, and access all functions and all data and information across the enterprise.
  • Multiple protection methods including backup and archive, snapshot management, replication, and content indexing for eDiscovery.
  • Efficient storage management using deduplication for disk and tape.
  • Integrated with the industry's top storage arrays to automate the creation of indexed, application-aware hardware snapshot copies across multi-vendor storage environments.
  • Complete virtual infrastructure management supporting both VMware and Hyper-V.
  • Advanced security capabilities to limit access to critical data, provide granular management capabilities, and provide single sign on access for Active Directory users.
  • Policy-based data management, transcending limitations of legacy backup products by managing data based on business needs and not the physical location.
  • Cutting-edge end-user experience that empowers users to protect, find, and recover their own data using common tools such as web browsers, Microsoft Outlook, and File Explorer.

Features and Capabilities of Simpana Software

Backup and Recovery

Simpana software provides smooth and efficient backup and restore of data and information in your enterprise from most mainstream operating systems, databases, and applications. The backup and recovery system uses agents to interface with file systems and applications to facilitate the transfer of data from production systems to the protected environment. Data protection is available in three areas:
·         File Systems
·         Applications
·         Databases

OnePass and Archive

Using data archiving, you can retain, store, classify, and access information according to its business and compliance value, with one method of access and preservation across all ESI (Electronically Stored Information).
·         Simpana OnePass is the industry's first converged process for backup, archive, and reporting. It incorporates both the backup and archive processes in a single, low-impact data collection operation, moving data to secondary storage where the data functions as both a backup and archive copy.
·         Database archive agents securely archive inactive data to both an archiving database and backup media, and provide smooth access to the archived data from the production database.
·         Classic archive agents move infrequently used mailbox items from primary to secondary storage to optimize storage space, and provide lower cost long term data retention.

Virtual Machine Integration

Virtualization demands a data management solution that is aware of dynamic workloads, consolidated resources, and cloud-based computing models. Simpana software is built with this in mind to let you virtualize even the most demanding applications, leveraging deep integration into the virtual infrastructure to deliver advanced data management capabilities and automate the protection of VMs.  Simpana software protects all of your VMs quickly and unifies the data protection of physical and virtual environments.

Snapshot Management

IntelliSnap technology integrates the complex lifecycle of snapshot management seamlessly into the Simpana software framework. This integrated approach makes it quicker, easier, and more affordable to harness the power of multiple vendor array-based snapshots, accelerating backup and recovery of applications, systems, virtual machines, and data. IntelliSnap automates the creation of application-aware hardware snapshot copies across a multi-vendor storage environment, and catalogs snapshot data to simplify the recovery of individual files without the need for a collection of scripts and disparate snapshot, backup, and recovery tools.

Edge Endpoint Solutions 

Edge Endpoint Solutions offer data protection, security, access from anywhere, and search capabilities for end users, to protect against data breaches and increase productivity while providing self-service capabilities. End users have immediate access to their files, regardless of where they create them, and can securely share, search, and restore files using their own mobile, desktop and laptop devices, without assistance.

Security and Encryption

Simpana software securely protects data and information - whether it's on premises, at the edge, or in the cloud. Security is baked in to the platform to secure data on desktops or laptops, in the office or on the road, at rest or in flight, utilizing efficient encryption, granular and customizable access controls for content and operations, role-based security, single sign-on, alerting, and audit trails to keep your information secure. Protected data is efficiently stored in the Simpana ContentStore—the virtual repository of all Simpana software-managed information. Simpana security will reduce privacy breaches and exposure events, and reduce costs by efficiently securing stored data.


Deduplication

The deduplication integrated into Simpana software reduces backup times while saving on storage and network resources by identifying and eliminating duplicate blocks of data during backups. All data types from Windows, Linux, and UNIX operating systems are deduplicated before moving the data to secondary storage, reducing the time and bandwidth required to move data by up to 90 percent, reducing the space required for storage, and reducing the time required to restore data. Deploy deduplication where it makes the most sense: at the source, on the target, or both.

Reporting and Insight

Access to actionable information is critical to informed decision-making and operational excellence. Simpana software has robust, built-in reporting analytics, with deep operational reporting integrated with data management operations, eliminating the need for third-party reporting tools. Global, web-based reporting provides a rich understanding of operations with deep views into data, usage and environmental characteristics, business intelligence for infrastructure cost planning, and simplified compliance audits. Live instrumentation and dashboard views provide summary and analytic views of utilization, success rates, and a host of other parameters designed to simplify data management, while historical operations data is available for regular status reporting, trend analysis and best practice comparison to achieve operational excellence. And, yes, there's an app for that! Install the CommVault Monitor app on your smart device to view reports and event details, and even monitor and manage jobs in the CommCell.


Analytics

Discover meaningful insights and unleash your data's full potential with our analytics software. Analyze your data to gain insight into the underlying processes and meaningful patterns to gain a business advantage. Simpana software provides data analytics to view statistical information about your data, web analytics to improve the usability and the content of a website or application, and data connectors to collect the information residing in various data repositories throughout your enterprise.

CommVault Backup Appliance 

The CommVault Appliance A600 combines Simpana software with NetApp’s simple, fast, scalable E-Series Storage System to address key challenges around scalability, flexibility, and manageability. With simple configuration and management designed to meet enterprise data protection requirements, you can go from power-up to backup in less than an hour.

Replication

Replicate data from a source computer to a destination computer in near real-time by logging all file write activity to a replication log in the source computer, transferring it to the destination computer, and replaying it, thus ensuring that the destination remains a nearly real-time replica of the source.
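
The log-transfer-replay cycle described above can be sketched with in-memory dictionaries standing in for the two file systems. ReplicationLog, write, and apply_write are hypothetical names. A real product would also have to handle ordering across files, failures, and partial transfers, which this sketch ignores.

```python
class ReplicationLog:
    """Append-only record of file writes captured on the source."""

    def __init__(self):
        self.entries = []

    def record(self, path, offset, data):
        self.entries.append((path, offset, data))

    def drain(self):
        """Hand over pending entries for transfer and clear the log."""
        pending, self.entries = self.entries, []
        return pending


def apply_write(files, path, offset, data):
    """Write `data` at `offset` into the file stored under `path`."""
    buf = bytearray(files.get(path, b""))
    if len(buf) < offset:
        buf.extend(b"\x00" * (offset - len(buf)))   # pad sparse writes
    buf[offset:offset + len(data)] = data
    files[path] = bytes(buf)


def write(files, log, path, offset, data):
    """Apply a write on the source and log it for replication."""
    apply_write(files, path, offset, data)
    log.record(path, offset, data)


source, destination, log = {}, {}, ReplicationLog()
write(source, log, "/data/a.txt", 0, b"hello world")
write(source, log, "/data/a.txt", 6, b"there")
for entry in log.drain():             # "transfer", then replay on the destination
    apply_write(destination, *entry)
print(destination)
```

Replaying the same writes in the same order is what keeps the destination identical to the source in this sketch.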

Disaster Recovery

The potential impact of application outages and data loss can be staggering. That is why disaster recovery is among the most vital of operations for every enterprise that requires the agility and availability to handle unplanned outages caused by anything from intrusions to unforeseen natural events. Simpana software dramatically simplifies business continuity and disaster recovery operations with an integrated, flexible, and efficient platform that improves business resiliency while reducing costs and the risks of data loss.

Cloud Services

The Software Store is an online store available in the Cloud Services site where you can download the software installer, service packs, hot fixes, reports, limited distribution tools, and workflows. In addition, several Simpana software tools are hosted on our cloud site.

Detailed Introduction to Commvault Backup | About Commvault Backup Technology

Commvault Backup Technology


Commvault is a publicly traded data protection and information management software company headquartered in Tinton Falls, New Jersey. It was formed in 1988 as a development group in Bell Labs, and later became a business unit of AT&T Network Systems. It was incorporated in 1996.
Commvault software assists organizations with data backup and recovery, cloud and infrastructure management, and retention and compliance.


Commvault software is an enterprise-level data platform that contains modules to back up, restore, archive, replicate, and search data. It is built from the ground up on a single platform and unified code base.
Data is protected by installing agent software on the physical or virtual hosts, which use operating system or application native APIs to protect data in a consistent state. Production data is processed by the agent software on client computers and backed up through a data manager, the MediaAgent, to disk, tape, or cloud storage. All data management activity in the environment is tracked by a centralized server, the CommServe, and can be managed by administrators through a central user interface. End users can access protected data using web browsers and mobile devices.

Key features of the software platform include:

  • Simplified management through a single console: view, manage, and access all functions and all data and information across the enterprise.
  • Multiple protection methods including backup and archive, snapshot management, replication, and content indexing for eDiscovery.
  • Efficient storage management using deduplication for disk and tape.
  • Integrated with the industry's top storage arrays to automate the creation of indexed, application-aware hardware snapshot copies across multi-vendor storage environments.
  • Complete virtual infrastructure management that supports many hypervisors, including VMware and Hyper-V.
  • Advanced security capabilities to limit access to critical data, to provide granular management capabilities, and to provide single sign-on access for Active Directory users.
  • Policy-based data management, which transcends the limitations of legacy backup products by managing data based on business needs and not on physical location.
  • Self-service for end users, which allows them to protect, find, and recover their own data using common tools such as web browsers, mobile devices, Microsoft Outlook, and File Explorer.

To understand the Commvault backup tool, it helps to know the following first:

  • Simpana Software
  • Backup and Recovery
  • OnePass™ and Archive
  • Virtual Machine Integration
  • Snapshot Management
  • Edge Endpoint Solutions
  • Security and Encryption
  • Reporting and Insight
  • CommVault Backup Appliance
  • Disaster Recovery
  • Cloud Services

About Microsoft Azure Data Centers and Its Services

Azure Data Centers and Services

Azure services are hosted in physical Microsoft-managed data centers throughout the world. The data centers are located in multiple geographic areas, with a pair of regional data centers in each geographic region.

Azure Services: Compute, Storage, and Identity

Microsoft Azure provides cloud services for accomplishing various tasks and functions across the IT spectrum and those services can be organized in several broad categories. There are services for different usage scenarios and a wide range of services that can be used as building blocks to create custom cloud solutions.

Compute and Networking Services

Azure Virtual Machines - Create Windows® and Linux virtual machines from pre-defined templates, or deploy your own custom server images in the cloud.

Azure RemoteApp - Provision Windows applications on Azure and run them from any device.

Azure Cloud Services - Define multi-tier PaaS cloud services that you can deploy and manage on Azure.

Azure Virtual Networks - Provision networks to connect your virtual machines, PaaS cloud services, and on-premises infrastructure.

Azure ExpressRoute - Create a dedicated high-speed connection from your on-premises data center to Azure.

Traffic Manager - Implement load-balancing for high scalability and availability.

Storage and Backup Services

Azure Storage - Store data in files, binary large objects (BLOBs), tables, and queues.

Azure Import/Export Service - Transfer large volumes of data using physical media.

Azure Backup - Use Azure as a backup destination for your on-premises servers.

Azure Site Recovery - Manage complete site failover for on-premises and Azure private cloud infrastructures.

Identity and Access Management Services

Azure Active Directory - Integrate your corporate directory with cloud services for a single sign-on (SSO) solution.

Azure Multi-Factor Authentication - Implement additional security measures in your applications to verify user identity.

Azure Services: Web, Data, and Media

Web and Mobile Services

Azure Websites - Create scalable websites and services without the need to manage the underlying web server configuration.

Mobile Services - Implement a hosted back-end service for mobile applications that run on multiple mobile platforms.

API Management - Publish your service APIs securely.

Notification Hubs - Build highly-scalable push-notification solutions.

Event Hubs - Build solutions that consume and process high volumes of events.

Data and Analytics Services

SQL Database - Implement relational databases for your applications without the need to provision and manage a database server.

HDInsight® - Use Apache Hadoop to perform big data processing and analysis.

Azure Redis Cache - Implement high-performance caching solutions for your applications.

Azure Machine Learning - Apply statistical models to your data and perform predictive analytics.

DocumentDB - Implement a NoSQL data store for your applications.

Azure Search - Provide a fully managed search service.

Media and Content Delivery Services

Azure Media Services - Deliver multimedia content such as video and audio.

Azure CDN - Distribute content to users throughout the world.

Azure BizTalk Services - Build integrated business orchestration solutions that integrate enterprise applications with cloud services.

Azure Service Bus - Connect applications across on-premises and cloud environments.

Grouping and Colocating Services

Grouping Related Services

When provisioning Azure services, you can group related services that exist in multiple regions to more easily manage those services. Resource groups are logical groups and can therefore span multiple regions.

Colocating Services by Using Regions

Although resource groups provide a logical grouping of services, they do not reflect the geographical location of the data centers in which those services are deployed. You can specify the region in which you want to host those services. This is known as colocating the services, and it is a best practice to colocate interdependent Azure services in the same region. In some cases, Azure enforces colocation because a service requires a related resource in the same region.
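
The distinction between logical grouping and physical colocation can be sketched with a small check that flags interdependent resources placed in different regions. This is plain Python and does not use the Azure SDK. The Resource class, the colocation_violations function, and the resource names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    region: str
    depends_on: list = field(default_factory=list)   # names of other resources


def colocation_violations(resource_group):
    """Return (resource, dependency) name pairs that sit in different regions."""
    by_name = {r.name: r for r in resource_group}
    return [
        (r.name, dep)
        for r in resource_group
        for dep in r.depends_on
        if by_name[dep].region != r.region
    ]


# One logical group whose members span two regions.
group = [
    Resource("web-vm", "West Europe", depends_on=["web-vnet", "app-db"]),
    Resource("web-vnet", "West Europe"),
    Resource("app-db", "North Europe"),
]
violations = colocation_violations(group)
print(violations)
```

Here the group spans two regions, which is allowed for a logical grouping, and the check reports only the dependency that crosses a region boundary.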