How trespassing works using ALUA on a VNX Storage Array | Failover Modes

VNX LUN Trespassing


Brief description: 


LUNs on a storage system are allocated to a server. A storage admin creates a LUN on a RAID Group or Storage Pool and assigns it to a server. The platform team discovers this LUN, formats it, mounts it or assigns a drive letter, and starts to use it. One important aspect is LUN ownership: which storage processor (SP) will process the I/O for that specific LUN?

A newly created LUN is accessed through its default owner SP. Ownership can be changed from one SP to the other; this process is known as trespassing.

Failover 


A procedure by which a system automatically transfers control to a duplicate system when it detects a fault or failure.

The first four failover modes (0 through 3) are listed below; Failover Mode 4 (ALUA) is described in the next section.

Failover Mode 0 – LUN Based Trespass Mode

This failover mode is the default and works in conjunction with the Auto-trespass feature. Auto-trespass is a mode of operation that is set on a per-LUN basis. If Auto-trespass is enabled on the LUN, the non-owning SP will report that the LUN exists and is available for access, and the LUN will trespass to the SP to which the I/O request is sent. Every time the LUN is trespassed, a Unit Attention message is recorded. If Auto-trespass is disabled, the non-owning SP will report that the LUN exists but is not available for access.

Failover Mode 1 – Passive Not Ready Mode

In this mode of operation the non-owning SP will report that all non-owned LUNs exist and are available for access.  Any I/O request that is made to the non-owning SP will be rejected. 

Failover Mode 2 – DMP Mode

In this mode of operation the non-owning SP will report that all non-owned LUNs exist and are available for access. This is similar to Failover Mode 0 with Auto-trespass Enabled. Any I/O requests made to the non-owning SP will cause the LUN to be trespassed to the SP that is receiving the request.

Failover Mode 3 – Passive Always Ready Mode

In this mode of operation the non-owning SP will report that all non-owned LUNs exist and are available for access.  Any I/O requests sent to the Non-owning SP will be rejected.  This is similar to Failover Mode 1. However, any Test Unit Ready command sent from the server will return with a success message, even to the non-owning SP. 
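
The four modes above differ only in how the non-owning SP reports and handles a LUN. As a rough summary, the sketch below encodes those rules as a small Python lookup. The names NonOwnerBehavior and non_owner_behavior are hypothetical, and the code models only what the descriptions above state, not the array firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonOwnerBehavior:
    """How the non-owning SP treats a LUN, per the mode descriptions above."""
    reports_available: bool     # LUN is reported as available for access
    io_result: str              # "trespass" or "reject"
    tur_always_succeeds: bool   # Test Unit Ready stated to succeed on any SP


def non_owner_behavior(mode: int, auto_trespass: bool = False) -> NonOwnerBehavior:
    """Return the non-owning SP's behavior for failover modes 0-3."""
    if mode == 0:
        if auto_trespass:
            # I/O sent to the non-owning SP pulls the LUN over to it.
            return NonOwnerBehavior(True, "trespass", False)
        # Assumption: the text only says the LUN is "not available for
        # access"; treating I/O as rejected is this sketch's reading.
        return NonOwnerBehavior(False, "reject", False)
    if mode == 1:
        # Passive Not Ready: reported as available, but I/O is rejected.
        return NonOwnerBehavior(True, "reject", False)
    if mode == 2:
        # DMP mode: behaves like mode 0 with Auto-trespass enabled.
        return NonOwnerBehavior(True, "trespass", False)
    if mode == 3:
        # Passive Always Ready: like mode 1, but Test Unit Ready succeeds.
        return NonOwnerBehavior(True, "reject", True)
    raise ValueError(f"mode {mode} is not one of the failover modes 0-3")
```

For example, non_owner_behavior(2).io_result and non_owner_behavior(0, auto_trespass=True).io_result are both "trespass", which reflects the similarity between the two modes noted above.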


How does trespassing work using ALUA (Failover Mode 4) on a VNX/CLARiiON storage system?

Resolution:

Since FLARE 26, Asymmetric Active/Active has provided a new way for CLARiiON arrays to present LUNs to hosts, eliminating the need for hosts to deal with the LUN ownership model. Prior to FLARE 26, all CLARiiON arrays used the standard active/passive presentation, in which one SP "owns" the LUN and all I/O to that LUN is sent only to that SP. If all paths to that SP failed, ownership of the LUN was 'trespassed' to the other SP and the host-based path management software adjusted the I/O path accordingly.
Asymmetric Active/Active introduces a new initiator Failover Mode (Failover mode 4) where initiators are permitted to send I/O to a LUN regardless of which SP actually owns the LUN.

Manual trespass:

When a manual trespass is issued (using Navisphere Manager or the CLI) to a LUN on an SP that is accessed by a host with Failover Mode 1, subsequent I/O for that LUN is rejected on the SP on which the manual trespass was issued. The failover software redirects I/O to the SP that now owns the LUN.

A manual trespass operation changes the ownership of a given LUN from one SP to the other. If the LUN is accessed by an ALUA host (Failover Mode set to 4) and I/O is sent to the SP that does not currently own the LUN, the I/O is redirected. In that situation, the array tracks how many I/Os the LUN processes on each SP and changes ownership of the LUN once the difference reaches a threshold of about 64,000 I/Os.

Path, HBA, switch failure:

If a host is configured with Failover Mode 1 and all the paths to the SP that owns a LUN fail, the LUN is trespassed to the other SP by the host’s failover software.

With Failover Mode 4, in the case of a path, HBA, or switch failure, when I/O routes to the non-owning SP, the LUN may not trespass immediately (depending on the failover software on the host). If the host's failover software does not trespass the LUN, FLARE will trespass the LUN to the SP that receives the most I/O requests to that LUN. This is accomplished by the array keeping track of how many I/Os a LUN processes on each SP. If the non-optimal SP processes at least 64,000 more I/Os than the optimal SP, the array will change the ownership to the non-optimal SP, making it optimal.
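
The counting rule above can be illustrated with a short Python sketch. This is a simplified model, not FLARE's implementation: the class AluaLun, its method names, and the decision to reset the counters after a trespass are assumptions made for illustration; only the 64,000 figure comes from the text.

```python
IMPLICIT_TRESPASS_THRESHOLD = 64_000  # figure quoted in the text above


class AluaLun:
    """Toy model of per-SP I/O counting for a LUN under Failover Mode 4."""

    def __init__(self, owner="SPA"):
        self.owner = owner
        self.io_count = {"SPA": 0, "SPB": 0}

    def peer(self):
        """Return the SP that does not currently own the LUN."""
        return "SPB" if self.owner == "SPA" else "SPA"

    def submit_io(self, sp, count=1):
        """Record I/Os received on `sp`; return True if ownership changed."""
        self.io_count[sp] += count
        non_owner = self.peer()
        surplus = self.io_count[non_owner] - self.io_count[self.owner]
        if surplus >= IMPLICIT_TRESPASS_THRESHOLD:
            # The non-optimal SP becomes the owner (and thus optimal).
            self.owner = non_owner
            # Assumption: counters start over after an ownership change.
            self.io_count = {"SPA": 0, "SPB": 0}
            return True
        return False
```

In this model, a LUN owned by SPA that receives 63,999 I/Os on SPB stays on SPA; one more I/O on SPB brings the difference to 64,000 and moves ownership to SPB.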

SP failure:

In case of an SP failure for a host configured as Failover Mode 1, the failover software trespasses the LUN to the surviving SP.

With Failover Mode 4, if an I/O arrives from an ALUA initiator on the surviving SP (non-optimal), FLARE initiates an internal trespass operation. This operation changes ownership of the target LUN to the surviving SP since its peer SP is dead. Hence, the host (failover software) must have access to the secondary SP so that it can issue an I/O under these circumstances.  

Single backend failure:

Before FLARE Release 26, if the failover software was misconfigured (for example, a single attach configuration), a single back-end failure (for example, an LCC or BCC failure) would generate an I/O error since the failover software would not be able to try the alternate path to the other SP with a stable backend.

With FLARE Release 26, regardless of the Failover Mode for a given host, when the SP that owns the LUN cannot access that LUN due to a back-end failure, I/O is redirected through the other SP by the lower redirector. In this situation, the LUN is trespassed by FLARE to the SP that can access the LUN. After the failure is corrected, the LUN is trespassed back to the SP that previously owned the LUN. See the “Enabler for masking back-end failures” section of the white paper cited in the note below for more information.
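
The trespass and trespass-back sequence described above can be sketched as a small state model in Python. The class BackendMaskedLun and its methods are hypothetical names, and the sketch assumes the peer SP can always reach the LUN; it is an illustration of the described behavior, not FLARE's logic.

```python
class BackendMaskedLun:
    """Toy model of LUN ownership while a back-end failure is masked."""

    def __init__(self, owner="SPA"):
        self.owner = owner
        self.previous_owner = None  # set only while a failure is masked

    def backend_failed(self, sp):
        """`sp` lost back-end access to the LUN."""
        if sp == self.owner and self.previous_owner is None:
            # Remember who owned the LUN, then trespass to the peer SP.
            # Assumption: the peer SP still has back-end access.
            self.previous_owner = self.owner
            self.owner = "SPB" if sp == "SPA" else "SPA"

    def backend_restored(self, sp):
        """`sp` regained back-end access to the LUN."""
        if sp == self.previous_owner:
            # Trespass back to the SP that previously owned the LUN.
            self.owner = self.previous_owner
            self.previous_owner = None
```

Calling backend_failed("SPA") on a LUN owned by SPA moves it to SPB, and a later backend_restored("SPA") returns it to SPA.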

Note: Information in this solution is taken from the White Paper "EMC CLARiiON. Asymmetric Active/Active Feature"

For more information, refer to Primus "emc202744".

