ConVirt Enterprise provides a High Availability (HA) feature for virtual infrastructure that ensures seamless recovery from VM failures with minimal impact on the overall virtual infrastructure. When a virtual machine in the virtual infrastructure goes down, ConVirt detects the failure and recovers the virtual machine, restarting it in place or migrating it to another available server in the server pool.
ConVirt monitors the infrastructure and handles recovery at both Virtual Machine and Server level:
- Virtual Machine Failover on the same machine: If a VM on a server goes down, ConVirt first tries to start it again on the same server.
- Server Failover: If a physical server in a server pool goes down, all virtual machines on that server are migrated to other available servers in the pool.
- It is assumed that all servers in the server pool have shared storage that the virtual machines use.
- If the servers are part of a traditional cluster, it is assumed that all servers in the cluster are also part of the server pool and that fencing is properly configured.
ConVirt offers several configuration options that can be used to customize the failover behavior of the virtual infrastructure.
ConVirt High Availability configuration allows you to:
- Enable High Availability
- Select between Virtual Machine Failover and Server Failover modes.
- Customize the migration path for ConVirt-managed servers.
- Configure dedicated standby servers for failover of selected virtual machines.
- Prioritize recovery of VMs in case of multiple virtual machine failures.
- Configure fencing devices for each server.
- Customize wait interval and retry count for VM restart operation.
To configure your HA options, select a server pool in the navigator tree and select "Configure High Availability" from the right-click menu to open the HA configuration wizard. The dialog lists the following options in the left navigation tree; you can also use the arrows to step through the wizard:
- Virtual Machine Priority
- Advanced Options
Selecting an option shows the corresponding configuration page on the right-hand side. The following sections describe each configuration option along with screenshots of the relevant configuration page.
Enable High Availability and configure migration path
- Enable high availability: Once enabled, all virtual machines within this server pool will be highly available.
- Virtual Machine Failover on the same server: When this option is selected, ConVirt detects virtual machine failures and restarts the failed virtual machines on the same server.
- Virtual Machine and Server Failover: When this option is selected, both virtual machine and server failures are handled. Here is an outline of the behavior when this option is selected:
- Whenever a virtual machine goes down, ConVirt will restart it on the same server.
- When a server goes down, ConVirt will migrate all virtual machines to a standby server. When no standby server is designated, other servers within the server pool are used to host the virtual machines.
- For Server Failover, we strongly encourage you to configure fencing, in order to eliminate the risk of the same virtual machine running on two physical servers. When you configure fencing, ConVirt will ensure that the unreachable server is brought down or disconnected from its storage, prior to starting virtual machines on another server. You will be able to configure fencing on the next page.
- If you do not have fencing devices in your environment, we recommend selecting the Virtual Machine Failover option. You may still choose to continue with Server Failover with the associated risk.
- Configure Migration Path: In the event of a physical server failure, ConVirt migrates all virtual machines on that server to another physical server based on the selected option:
- Migrate to one or more physical servers in the server pool
- Migrate to one of the dedicated standby servers
The list shows all available physical servers with their CPU, memory and platform details. You can mark servers as standby servers using the check boxes in the "standby" column.
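The two migration path options can be illustrated with a minimal sketch. The `Server` class, its fields, and the `candidate_targets` function are hypothetical names chosen for illustration and are not part of ConVirt; the fallback to the rest of the pool when no standby server is available follows the behavior outlined above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    """Hypothetical view of one physical server in a server pool."""
    name: str
    standby: bool = False  # marked via the "standby" check box
    up: bool = True


def candidate_targets(servers: List[Server], failed: str,
                      use_standby: bool) -> List[Server]:
    """Return the servers that may receive VMs from the failed server.

    use_standby=True models "migrate to one of the dedicated standby
    servers"; use_standby=False models "migrate to one or more physical
    servers in the server pool".
    """
    alive = [s for s in servers if s.up and s.name != failed]
    if use_standby:
        standby = [s for s in alive if s.standby]
        if standby:
            return standby
    # No standby server designated (or none reachable): use the pool.
    return alive
```

How ConVirt ranks the candidates against each other is not described here, so the sketch only returns the eligible set.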
Specify Virtual Machine Priority
ConVirt lets you specify a priority level for each virtual machine so that, when resources are scarce (for example, after multiple failures), virtual machines running critical functions are made available first. The list on this page includes all virtual machines in this server pool; set the priority for each using the drop-down box in the "priority" column. (Click the cell to activate the drop-down.)
There are 4 priority levels recognized by ConVirt: Low, Medium, High and Critical.
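As an illustration of how priority levels could drive recovery order, here is a small sketch. The `Priority` enum mirrors the four levels named above; the numeric values, the `recovery_order` function, and the alphabetical tie-break are assumptions made for the example, not ConVirt's documented behavior.

```python
from enum import IntEnum
from typing import Dict, List


class Priority(IntEnum):
    """The four priority levels; numeric values are illustrative."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def recovery_order(vms: Dict[str, Priority]) -> List[str]:
    """Return VM names sorted so higher-priority VMs are recovered first.

    Ties are broken alphabetically (an assumption for determinism).
    """
    return sorted(vms, key=lambda name: (-vms[name], name))


failed_vms = {
    "web": Priority.MEDIUM,
    "db": Priority.CRITICAL,
    "batch": Priority.LOW,
    "cache": Priority.MEDIUM,
}
order = recovery_order(failed_vms)  # ['db', 'cache', 'web', 'batch']
```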
Specify Fencing devices
ConVirt requires each physical server to have at least one fencing device configured. In the event of a server failure, ConVirt makes sure that the server is indeed isolated or powered down so that its virtual machines can be started on a different server. Without fencing, there is a chance that the same virtual machine gets started on two servers, causing corruption.
Specifying the fencing configuration is a two-step process.
- Declare the fencing device (usually connection and credential information).
- This is done by defining the fencing device at the data center level. You need FULL permission on the Data Center to perform this operation. Once defined, the device is available for use in HA configuration at the server pool.
- Specify server-specific fencing device parameters (usually the port and similar settings).
- This is done on the fencing page. Click the "Edit" button in the last column to open the "Fencing Devices and Parameters" dialog. This dialog lists all available fencing devices and allows you to specify a fencing device along with its server-specific parameters.
Fence Device Configuration
Server specific Fence Device Configuration
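The two-step fencing configuration can be pictured as a shared device definition plus per-server parameters. The sketch below is only a data-model illustration: the class names, field names, and the example address are hypothetical and do not reflect ConVirt's internal schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FencingDevice:
    """Step 1: device declared once at the data center level."""
    name: str
    address: str    # connection information (hypothetical field)
    username: str   # credential information (hypothetical field)


@dataclass
class ServerFencing:
    """Step 2: server-specific parameters for a declared device."""
    device: FencingDevice
    params: Dict[str, str] = field(default_factory=dict)

    def effective_params(self) -> Dict[str, str]:
        """Combine shared device settings with server-specific ones."""
        merged = {"address": self.device.address,
                  "username": self.device.username}
        merged.update(self.params)  # e.g. the port this server is wired to
        return merged


pdu = FencingDevice(name="pdu-1", address="192.0.2.50", username="admin")
server_a = ServerFencing(device=pdu, params={"port": "3"})
server_b = ServerFencing(device=pdu, params={"port": "4"})
```

Keeping the device definition separate means several servers can share one device while differing only in their own parameters, which matches the split between the data center level and the server pool level described above.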
Advanced Options
This page lets you customize the values of the following two attributes:
- Wait Interval: Interval between successive attempts to start a VM during a failover operation.
- Retry Count: Number of attempts to start a VM on its current server before starting it on another server.
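The interplay of the two attributes can be sketched as a simple retry loop. The function name, the `start_vm` callable, and the injectable `sleep` parameter are assumptions for illustration; the unit of the wait interval is not stated above and is treated as seconds here.

```python
import time
from typing import Callable


def restart_with_retries(start_vm: Callable[[], bool],
                         retry_count: int,
                         wait_interval: float,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to start a VM up to retry_count times on the same server.

    Returns True as soon as one attempt succeeds. Returns False when all
    attempts fail, at which point the caller would start the VM on
    another server.
    """
    for attempt in range(1, retry_count + 1):
        if start_vm():
            return True
        if attempt < retry_count:
            sleep(wait_interval)  # wait before the next attempt
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller would normally rely on the default.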
1.1 How it works
The ConVirt Management Server (CMS) continuously monitors servers and virtual machines. Servers are monitored using ping, while virtual machines are monitored using APIs specific to the virtual machine platform.
Virtual Machine failover
When a virtual machine is detected to be down, ConVirt tries to restart it. It retries a few times (up to the configured Retry Count) and then attempts to start the virtual machine on a different server within the server pool.
Server failover
When ConVirt detects that a server is down:
- It waits and then confirms that the server is indeed unreachable.
- It then tries to ping the server from one of the other servers within the server pool.
- If the server is unreachable from the peer as well:
- If fencing devices are configured, ConVirt uses the fencing configuration and runs the fencing script from the peer server. When the fencing script succeeds, the server is confirmed to be down.
- Otherwise, ConVirt assumes that the server is down.
- When the server is confirmed to be down, ConVirt starts the failover:
- If specific servers are designated as standby servers, ConVirt picks one of them and starts migrating virtual machines to it.
- If no standby server is specified, ConVirt tries to find a suitable server for each virtual machine on the failed server.
- When the failed server is detected to be up again, the virtual machines may be brought back to it, depending on the configuration.
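The confirmation steps of server failover can be summarized as a decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the checks are passed in as callables rather than real ping or fencing calls, and treating a failed fencing script as "not confirmed" is an assumption, since the text above only describes the success case.

```python
from typing import Callable, Optional


def confirm_server_down(cms_can_reach: Callable[[], bool],
                        peer_can_reach: Callable[[], bool],
                        fence: Optional[Callable[[], bool]] = None) -> bool:
    """Decide whether a server should be treated as down.

    cms_can_reach:  re-check from the management server after waiting.
    peer_can_reach: ping from another server in the same pool.
    fence:          runs the fencing script from the peer and returns
                    True on success; None when no device is configured.
    """
    if cms_can_reach():
        return False   # server answered again: no failover
    if peer_can_reach():
        return False   # a peer can still reach it
    if fence is not None:
        # With fencing, the server counts as down only once the
        # fencing script has succeeded (failure handling is assumed).
        return fence()
    # Without fencing devices, the server is assumed to be down.
    return True
```

Only when this function returns True would the migration to a standby server, or to other suitable servers in the pool, begin.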