What is the most appropriate way to test a virtual machine enabled for VMware fault tolerance?

A. Cause a failure on the ESXi host running the primary virtual machine

B. Cause a failure on an application within the primary virtual machine

C. Disable network connectivity to the primary virtual machine

D. Cause a failure on the primary virtual machine

Explanation:
Source: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020058

The following are proper testing scenarios with their expected outcomes:

Note: The following tests assume two hosts, Host A and Host B, with the primary fault tolerant virtual machine running on Host A and the secondary virtual machine running on Host B.

Select the Test Failover function from the Fault Tolerance menu on the virtual machine

This tests the Fault Tolerance functionality in a fully supported and non-invasive way. In this scenario, the virtual machine fails over from Host A to Host B, and a secondary virtual machine is started back up again. VMware HA failure does not occur in this case.

Host A complete failure

This scenario can be accomplished by pulling the host power cable, rebooting the host, or powering off the host from a remote KVM (such as iLO, DRAC, or RSA). The secondary virtual machine on Host B takes over immediately and continues to process information for the virtual machine. VMware HA failover occurs.

Virtual machine process on Host A fails

This scenario can be accomplished by logging into Host A and terminating the active process for the virtual machine. The secondary virtual machine takes over and no VMware HA failure occurs. VMware does not recommend testing in this way. For more information on terminating a virtual machine, see "Powering off an unresponsive virtual machine on an ESX host" (1004340).

13 Comments on “What is the most appropriate way to test a virtual machine enabled for VMware fault tolerance?”

  1. Read A says:

    I agree with answer A for this scenario, but Test King chose answer D, which I believe is not the right one.
    Could someone give a solid explanation of which one is right?

  2. Ryan M says:

    D is correct.
    Fault tolerance doesn’t aim to resolve ESXi host issues; it provides CPU lockstep between VMs in the event a VM experiences problems.

    1. Daniel says:

      @Ryan
      Right, so it MUST be A, because if you create a fault IN the VM, the fault gets replicated to the shadow VM. The same goes for B and C.

        1. Gert T says:

          @KEITH
          You perform this kill command on the ESXi host and NOT within the VM that is protected with FT, so you are causing a failure on the ESXi host, hence A is the correct answer.

          FT protects against host failure by keeping the VM running on another ESXi host. If any failure occurs “on the primary VM”, it would also be “played” onto the shadow VM, like Daniel said.

  3. John says:

    From the following VMware KB:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020058

    Virtual machine process on Host A fails

    This scenario can be accomplished by terminating the active process for the virtual machine by logging into Host A. The secondary virtual machine takes over and no VMware HA failure occurs. VMware does not recommend testing in this way.

    So, to repeat, VMware does NOT recommend testing in this way, so how can the answer be D?

  4. claudette says:

    It says in KB 1020058:
    Currently, Fault Tolerance failures are only triggered when there is no communication between the primary and secondary virtual machines.

    So why not just (C) cut the network?

    1. Don says:

      You can’t cut the network (C) for the VM because, again, that’s just simulating a failure on the VM. You would have to pull the physical plugs on the host.

      Remember, FT is meant to provide failover functionality for applications that are not cluster aware, which means the failover occurs at a hardware level. All testing must be done with that in mind. You need to cause a failure with the physical hardware, be it network wiring (host isolation), host failure, etc.

      FT is neat, but it’s brutal how restrictive it is with VM requirements (single CPU, etc.).

  5. Cyrus says:

    You can cut the network, just not from the VM. Pulling the network cables out of the host is technically a host failure, and thus answer A covers this particular scenario. At first I thought C was the answer, but it can’t be, as it is VM centric. FT, as numerous people have pointed out, is centered on a failure on a host, not the failure OF a host.

    Keep in mind as well that the practice is not to put the primary and secondary FT VMs on the same host.

    Pulling the power out should trigger an HA response, not an FT response; this scenario is in the training labs (HA specifically).

