Since release 13.2, VOS has offered the ability to configure a fault-tolerant IP interface by configuring active/standby Ethernet adapters. The active adapter is given one Ethernet MAC address and the standby is given another. If the active adapter fails, the standby is reset to use the active MAC address and becomes the active adapter. When the previously active adapter comes back into service, it takes on the role of the standby adapter, including its MAC address. Remote hosts communicating with the module through the IP interface see, at worst, a delay of a second or so during the transition.
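To make the role and MAC address swap concrete, here is a toy Python model of the behavior described above. It is only an illustration, not VOS code: the class names, method names, and MAC addresses are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical MAC addresses, chosen only for illustration.
ACTIVE_MAC = "02:00:00:00:00:01"
STANDBY_MAC = "02:00:00:00:00:02"


@dataclass
class Adapter:
    name: str
    mac: str
    in_service: bool = True


class AdapterPair:
    """Toy model of an active/standby adapter partnership."""

    def __init__(self, active: Adapter, standby: Adapter) -> None:
        self.active = active
        self.standby = standby

    def fail_active(self) -> None:
        """The active adapter fails; the standby takes over its MAC and role."""
        failed = self.active
        failed.in_service = False
        self.standby.mac = ACTIVE_MAC
        self.active, self.standby = self.standby, failed

    def restore_standby(self) -> None:
        """The failed adapter returns to service as the standby."""
        self.standby.in_service = True
        self.standby.mac = STANDBY_MAC


pair = AdapterPair(Adapter("adapter_a", ACTIVE_MAC), Adapter("adapter_b", STANDBY_MAC))
pair.fail_active()
print(pair.active.name, pair.active.mac)
pair.restore_standby()
print(pair.standby.name, pair.standby.mac)
```

The point to notice is that the active MAC address never changes from the remote hosts' point of view; it simply shows up on a different adapter, and therefore on a different switch port, after the fail over.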

The problem is that configuring VOS is not enough; the network switches that the standby and active adapters plug into must also be configured properly. This is usually not a problem when the active/standby partnership is first configured, but over time the switch configuration can mutate in ways that do not harm current communications but do prevent a seamless fail over when VOS changes the state of the standby adapter to active.

There are three ways to discover that these switch configuration mutations have occurred.

The first is when the active adapter fails and VOS switches the state of the standby adapter to active. At this point remote hosts can no longer communicate with the module via that interface and you have a failed adapter. I have seen this situation many times – it is not pretty.

The second way is to review the switch configuration. This involves getting the network administrator to review a working configuration looking for something that might cause a problem – good luck.

The third is to test. You can use the dlmux_admin command to force a fail over, or on some hardware you can physically pull the adapter from the module or use a VOS command to disable the adapter, but I prefer just disconnecting the active adapter from the network. If you already have an active connection to the module established, you should be able to tell within seconds whether the fail over was seamless. By doing this at a time of your choosing, you can pick a time when it will be least disruptive if the fail over is not seamless. Also, since the adapter is still working, the recovery is simple: plug the cable back in, unplug the newly active adapter, wait for another fail over, and then plug the now-standby adapter back in.
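If you want a number to go with the test, a small script run from a remote host can time the interruption. The Python sketch below is one possible approach, not a Stratus tool. It opens a fresh TCP connection on every probe, so it supplements watching an established session rather than replacing it. The function names, the host name, and the port in the usage note are assumptions for illustration; use any TCP service that your module actually listens on.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Return the longest outage, in seconds, in time-ordered (timestamp, ok) samples.

    An outage runs from the first failed probe of a run to the next successful
    probe. An outage still in progress at the end is measured to the last sample.
    """
    longest = 0.0
    start = None
    last_ts = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            longest = max(longest, ts - start)
            start = None
        last_ts = ts
    if start is not None and last_ts is not None:
        longest = max(longest, last_ts - start)
    return longest


def monitor(host, port, duration=60.0, interval=0.5):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        ts = time.monotonic()
        ok = probe(host, port)
        samples.append((ts, ok))
        if not ok:
            print(time.strftime("%H:%M:%S"), "probe failed")
        time.sleep(interval)
    return samples
```

Start something like `samples = monitor("module.example.com", 23)` just before you disconnect the cable (both the host name and the port are placeholders), then print `longest_outage(samples)` when it finishes. A result of a second or two is consistent with a seamless fail over; a result that runs to the end of the monitoring window means the interface never came back.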

If the test shows that the fail over is not seamless, you have the luxury of working with the network administrator on a demonstrated problem without anyone demanding to know how long until it is fixed. If the test was seamless, you’re all set. As the title of this blog suggests, I recommend testing periodically, say the first Monday of the month.
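If you want a reminder script or a change-calendar tool to work out the date for you, the first Monday of a month is easy to compute. This is a minimal Python sketch; the function names are my own.

```python
import datetime


def first_monday(year, month):
    """Return the date of the first Monday in the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the number of days to wait.
    offset = (0 - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


def is_test_day(today):
    """Return True if `today` is the first Monday of its month."""
    return today == first_monday(today.year, today.month)
```

A daily job could call `is_test_day(datetime.date.today())` and send a reminder when it returns True.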

© 2024 Stratus Technologies.