The SDLMUX software combines two network adapters into one IP interface, providing a fail over capability if the link on the active adapter goes down or the adapter fails. Once set up, it requires virtually no administration, but there are some things you should be aware of.
First, SDLMUX does not provide any load balancing; all traffic is transmitted out of and received on the active adapter.
Second, on the ftServer VSeries hardware the active adapter has a MAC address of the form 00:00:A8:4v:wx:yz. The 00:00:A8 is the Stratus organizationally unique identifier (OUI), assigned by the IEEE. The “v” is an index based on the order in which the SDLMUX devices were initialized, starting at 0. The value “wx:yz” is based on the system serial number. The standby address differs from the active address only in the upper nibble of the fourth byte: instead of a 4 it is a 6. Likewise, the MAC addresses of two active (or two standby) adapters on the same module differ only in the lower nibble of the fourth byte. The easiest way to get a list of all the MAC addresses is with the analyze_system request “dump_sdlmux”, matching on “MAC” (see figure 1).
as: match 'MAC' ; dump_sdlmux
MAC address = 0000A8405A8B
MAC address = 0000A8605A8B
MAC address = 0000A8415A8B
MAC address = 0000A8615A8B
MAC address = 0000A8425A8B
MAC address = 0000A8625A8B
MAC address = 0000A8435A8B
MAC address = 0000A8635A8B
as:
Figure 1
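The address layout described above can be picked apart mechanically. The following is a minimal Python sketch (Python is my choice for illustration, not something VOS provides) that splits an address from figure 1 into its role, index, and serial-based parts. The names decode_sdlmux_mac and partner_mac are hypothetical, and the sketch assumes that only the upper-nibble values 4 and 6 described above occur.

```python
import re
from typing import NamedTuple

STRATUS_OUI = "0000A8"  # Stratus OUI, as given in the text


class SdlmuxMac(NamedTuple):
    role: str         # "active" (nibble 4) or "standby" (nibble 6)
    index: int        # the "v" nibble: SDLMUX device initialization order
    serial_part: str  # the "wx:yz" digits, based on the system serial number


def _hex_digits(mac: str) -> str:
    """Strip separators so 00:00:A8:40:5A:8B and 0000A8405A8B look the same."""
    return re.sub(r"[^0-9A-Fa-f]", "", mac).upper()


def decode_sdlmux_mac(mac: str) -> SdlmuxMac:
    """Split an address of the form 00:00:A8:4v:wx:yz into its parts."""
    digits = _hex_digits(mac)
    if len(digits) != 12 or not digits.startswith(STRATUS_OUI):
        raise ValueError(f"not a Stratus MAC address: {mac!r}")
    roles = {"4": "active", "6": "standby"}
    if digits[6] not in roles:
        raise ValueError(f"unexpected upper nibble {digits[6]!r} in {mac!r}")
    return SdlmuxMac(roles[digits[6]], int(digits[7], 16), digits[8:])


def partner_mac(mac: str) -> str:
    """Return the standby address for an active one, or the reverse."""
    digits = _hex_digits(mac)
    decode_sdlmux_mac(digits)  # raises if the format is not as expected
    flipped = "6" if digits[6] == "4" else "4"
    return digits[:6] + flipped + digits[7:]
```

For example, decode_sdlmux_mac("0000A8425A8B") reports an active adapter with index 2, and partner_mac("0000A8425A8B") gives 0000A8625A8B, the standby address listed with it in figure 1.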
Just the MAC addresses aren’t very helpful; by matching on “MAC” or “#” you get the MAC addresses and the SDLMUX and network adapter device names. Note that this list will not contain any network adapters that are not part of an SDLMUX partnership.
as: match 'MAC' -or '#' ; dump_sdlmux
sdlmux device = #sdlmuxA.m16.10-5-0.11-5-0
MAC address = 0000A8405A8B
Interface device = %phx_vos#enetA.m16.10-5-0
MAC address = 0000A8605A8B

sdlmux device = #sdlmuxA.m16.10-5-1.11-5-1
MAC address = 0000A8415A8B
Interface device = %phx_vos#enetA.m16.10-5-1
MAC address = 0000A8615A8B

sdlmux device = #sdlmux.m16.11-2
MAC address = 0000A8425A8B
Interface device = %phx_vos#enet.m16.11.11-2
MAC address = 0000A8625A8B

sdlmux device = #sdlmux.m16.11-3
MAC address = 0000A8435A8B
Interface device = %phx_vos#enet.m16.10.11-3
MAC address = 0000A8635A8B
Interface device = %phx_vos#enet.m16.11.11-3
as:
Figure 2
Third, SDLMUX network adapters send Ethernet 802.2 LLC frames to their partners to ensure that the network path is functioning correctly. Five sets of these test frames go out at three-second intervals, followed by a 33-second interval. Trace 1 shows three cycles of this pattern. Trace 1 also shows the contents of the frame from the active adapter (frame 1) and from the standby adapter (frame 2). I’ve highlighted the frame following the 33-second gap just to make the cycles easier to read. These are not Ethernet type II frames or IP packets, so the switches connected to the network adapters, and any switches along the path between the two adapters, must be configured so that they do not block these 802.2 LLC frames.
No. delta Time Source            Destination       Protocol Info
 1  0.000000   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
    0000 00 00 a8 62 5a 8b 00 00 a8 42 5a 8b 00 1b ac ac ...bZ....BZ.....
    0010 03 31 32 39 2e 31 2e 30 00 00 00 00 00 00 00 00 .129.1.0........
    0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0030 00 00 00 00 00 00 00 00 00 00 00 00 b1 60 74 48 .............`tH
 2  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
    0000 00 00 a8 42 5a 8b 00 00 a8 62 5a 8b 00 1b ac ac ...BZ....bZ.....
    0010 03 31 32 39 2e 31 2e 30 00 00 00 00 00 00 00 00 .129.1.0........
    0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0030 00 00 00 00 00 00 00 00 00 00 00 00 45 23 24 c0 ............E#$.
 3  2.999945   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
 4  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
 5  2.999943   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
 6  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
 7  2.999882   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
 8  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
 9  2.999946   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command

11  32.99900   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
12  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
13  2.999945   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
14  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
15  2.999938   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
16  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
17  2.999942   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
18  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
19  2.999943   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
20  0.000008   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
21  32.99900   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
22  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
23  2.999944   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
24  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
25  2.999947   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
26  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
27  2.999938   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
28  0.000007   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
29  2.999946   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
30  0.000010   StratusC_62:5a:8b StratusC_42:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
31  32.99900   StratusC_42:5a:8b StratusC_62:5a:8b LLC U, func=UI; DSAP 0xac Individual, SSAP 0xac Command
Trace 1
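To make it clearer why these frames are 802.2 LLC rather than Ethernet type II, here is a minimal Python sketch that decodes the headers of frame 1 from the trace. The function name parse_llc_frame is hypothetical and the sketch only interprets the fields visible in the hex dump; it does not claim to know what the "129.1.0" text in the payload means.

```python
# First 32 bytes of frame 1 in Trace 1.
FRAME_1_HEX = (
    "00 00 a8 62 5a 8b 00 00 a8 42 5a 8b 00 1b ac ac "
    "03 31 32 39 2e 31 2e 30 00 00 00 00 00 00 00 00"
)


def parse_llc_frame(frame: bytes) -> dict:
    """Decode the Ethernet and 802.2 LLC headers of a raw frame."""
    if len(frame) < 17:
        raise ValueError("frame too short for Ethernet + LLC headers")
    length_or_type = int.from_bytes(frame[12:14], "big")
    # Values up to 1500 are an 802.3 length field, so an LLC header follows.
    # Ethernet type II frames carry a type of 0x0600 or more here instead.
    if length_or_type > 1500:
        raise ValueError(f"not an 802.3 length field: 0x{length_or_type:04x}")
    dsap, ssap, control = frame[14], frame[15], frame[16]
    # The length counts the 3 LLC header bytes plus the data that follows.
    payload = frame[17:14 + length_or_type]
    return {
        "destination": frame[0:6].hex(":"),
        "source": frame[6:12].hex(":"),
        "length": length_or_type,
        "dsap": dsap,
        "ssap": ssap,
        "is_ui": control == 0x03,  # unnumbered information (func=UI)
        "text": payload.rstrip(b"\x00").decode("ascii", "replace"),
    }
```

Calling parse_llc_frame(bytes.fromhex(FRAME_1_HEX)) shows the active address 00:00:a8:42:5a:8b sending to the standby address 00:00:a8:62:5a:8b, a length field of 27 where an Ethernet type II frame would carry a type, and DSAP and SSAP both 0xac. A switch filter that only passes known Ethernet types would not match such a frame.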
Note that you cannot use packet_monitor to see these test frames. The frames are sent below the point where packet_monitor taps into the stack, and SDLMUX removes incoming test frames from the stack before packet_monitor can read them.
Fourth, if each adapter is connected to a different switch and the link between those switches fails, or for some other reason the test frames do not get through, you will see something that looks like figure 3 in the syserr_log. Starting in releases 16.2.1ak and 17.0.0ah, if there is a test frame failure SDLMUX triggers an ARP request to the last host that was successfully ARP’ed over the suspect interface. If it gets an answer, it knows that the active adapter is working, so it resets the standby adapter to try to get it to work. If a network issue is blocking the test frames, resetting the adapter doesn’t help and the adapter eventually goes MTBF; see time stamp 08:08:10 in figure 3. The dlmux_admin command will report the broken adapter as DOWN (figure 4), and a trace will show no test frames, since at this point there is nothing to test. If an ARP reply is not received, or the system is running a release that does not send the ARP request, SDLMUX will fail over the adapters and break the new standby adapter. If during the next cycle the test frames fail again and no ARP reply is received, the fail over and break will be done again. Eventually one adapter will go MTBF.
08:05:02 WARNING(64): SDLMUX: the devices in group #sdlmux.m16.11-2
08:05:02 WARNING(65): SDLMUX: are not exchanging test packets with each other
08:05:02 WARNING(66): SDLMUX: but the active adapter is able to communicate wi
+th other hosts!
08:05:02 WARNING(67): SDLMUX: This indicates some sort of network or cabling i
+ssue!
08:05:02 WARNING(68): SDLMUX: breaking adapter %phx_vos#enet.m16.10.11-2: XID
+communication issue
08:05:02 PCI 10/11/2 enet.m16.10.11-2 Break Requested
08:05:02 WARNING(69): SDLMUX: device name %phx_vos#enet.m16.10.11-2 is broken
08:05:02 PCI 10/11/2 enet.m16.10.11-2 Adding
08:05:04 PCI 10/11/2 enet.m16.10.11-2 Online
08:05:04 WARNING(70): genet in (10/11/2) Link is Up.
08:05:04 WARNING(71): SDLMUX: device %phx_vos#enet.m16.10.11-2 back to servic
+e
08:05:49 WARNING(72): SDLMUX: breaking adapter %phx_vos#enet.m16.10.11-2: XID
+communication issue
08:05:49 PCI 10/11/2 enet.m16.10.11-2 Break Requested
08:05:49 WARNING(73): SDLMUX: device name %phx_vos#enet.m16.10.11-2 is broken
08:05:49 PCI 10/11/2 enet.m16.10.11-2 Adding
08:05:52 PCI 10/11/2 enet.m16.10.11-2 Online
08:05:52 WARNING(74): genet in (10/11/2) Link is Up.
08:05:52 WARNING(75): SDLMUX: device %phx_vos#enet.m16.10.11-2 back to servic
+e
08:07:22 WARNING(76): SDLMUX: the devices in group #sdlmux.m16.11-2
08:07:22 WARNING(77): SDLMUX: are not exchanging test packets with each other
08:07:22 WARNING(78): SDLMUX: but the active adapter is able to communicate wi
+th other hosts!
08:07:22 WARNING(79): SDLMUX: This indicates some sort of network or cabling i
+ssue!
08:07:22 WARNING(80): SDLMUX: breaking adapter %phx_vos#enet.m16.10.11-2: XID
+communication issue
08:07:22 PCI 10/11/2 enet.m16.10.11-2 Break Requested
08:07:22 WARNING(81): SDLMUX: device name %phx_vos#enet.m16.10.11-2 is broken
08:07:22 PCI 10/11/2 enet.m16.10.11-2 Adding
08:07:25 PCI 10/11/2 enet.m16.10.11-2 Online
08:07:25 WARNING(82): genet in (10/11/2) Link is Up.
08:07:25 WARNING(83): SDLMUX: device %phx_vos#enet.m16.10.11-2 back to servic
+e
08:08:10 WARNING(84): SDLMUX: breaking adapter %phx_vos#enet.m16.10.11-2: XID
+communication issue
08:08:10 PCI 10/11/2 enet.m16.10.11-2 Break Requested
08:08:10 PCI 10/11/2 MTBF Failure
08:08:10 WARNING(85): SDLMUX: device name %phx_vos#enet.m16.10.11-2 is broken
Figure 3
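The reaction to a failed test-frame cycle described above boils down to a small decision table. The following Python sketch is only my model of that description, not the actual SDLMUX implementation; the names Action and react_to_test_cycle are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "test frames exchanged, nothing to do"
    RESET_STANDBY = "break and re-add the standby adapter"
    FAIL_OVER = "fail over, then break the new standby adapter"


def react_to_test_cycle(test_frames_ok: bool,
                        release_sends_arp: bool,
                        arp_reply_received: bool) -> Action:
    """Model of the behavior described in the text for one test cycle.

    release_sends_arp is True for releases 16.2.1ak / 17.0.0ah and later,
    which probe the last successfully ARP'ed host after a test failure.
    """
    if test_frames_ok:
        return Action.NONE
    if release_sends_arp and arp_reply_received:
        # The active adapter can reach other hosts, so the standby side
        # (or the path between the adapters) is the suspect.
        return Action.RESET_STANDBY
    # No ARP probe on this release, or no reply: the active adapter may be
    # at fault, so swap roles and break what is now the standby adapter.
    return Action.FAIL_OVER
```

In figure 3 the ARP probe succeeded each time ("the active adapter is able to communicate with other hosts"), which corresponds to RESET_STANDBY being chosen repeatedly until the standby adapter went MTBF.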
dlmux_admin #sdlmux.m16.11-2 sdlmux_status
Group Name: #sdlmux.m16.11-2
Device Name: %phx_vos#enet.m16.11.11-2
Adapter State: ACTIVE UP
Partner: %phx_vos#enet.m16.10.11-2
Partner State: DOWN
Figure 4
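If you capture this status output to a file, a script can check it for a DOWN partner. Here is a minimal Python sketch under the assumption that the output uses exactly the field labels shown in figure 4; the function names are hypothetical, and "DOWN" is the only unhealthy state value I can vouch for from the figure.

```python
import re

# Field labels as they appear in figure 4. "Partner State" must be tried
# before "Partner" so the longer label wins.
LABELS = ("Group Name", "Device Name", "Adapter State", "Partner State", "Partner")
_LABEL_RE = "|".join(LABELS)
_FIELD = re.compile(rf"({_LABEL_RE}):\s*(.*?)\s*(?=(?:{_LABEL_RE}):|$)", re.DOTALL)


def parse_sdlmux_status(text: str) -> dict:
    """Return the labeled fields of the status output as a dictionary."""
    return {label: value for label, value in _FIELD.findall(text)}


def partner_is_down(text: str) -> bool:
    """True when the status output reports the partner adapter as DOWN."""
    return parse_sdlmux_status(text).get("Partner State") == "DOWN"


SAMPLE = """dlmux_admin #sdlmux.m16.11-2 sdlmux_status
Group Name: #sdlmux.m16.11-2
Device Name: %phx_vos#enet.m16.11.11-2
Adapter State: ACTIVE UP
Partner: %phx_vos#enet.m16.10.11-2
Partner State: DOWN
"""
```

With the figure 4 text in SAMPLE, partner_is_down(SAMPLE) is true and parse_sdlmux_status(SAMPLE)["Partner"] names the adapter that needs attention. The pattern keys on the labels rather than on line breaks, so it should tolerate the fields being laid out differently on the line.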
To bring the adapter back into service you need to add it back with the board_admin command (figure 5). Of course, if you do that without correcting the underlying problem, the adapter will just MTBF again.
board_admin 10/11/2 add
board_admin
device_id: 10/11/2
action: add
Do you want to continue? (yes, no) yes
Command completed.
Figure 5
Fifth, since the active adapter always has the same MAC address, the effect of a fail over is to move the MAC address from one switch port to another. Any security settings on the switch ports must allow this change. Also, switch ports can be configured to negotiate various settings with other switch ports. These negotiations can be triggered when the switch notices a change in topology, like a new MAC address or a link being restored, and until they complete a switch may not pass regular frames. The switch ports connected to the SDLMUX adapters should be configured not to perform these negotiations. In addition, the switch ports connected to the SDLMUX adapters should be configured either not to run the spanning tree protocol or to skip the listening and learning steps (Cisco calls this portfast). During the listening and learning steps the switch will not pass regular data frames. In extreme cases the delay caused by these settings can be so long that SDLMUX triggers another fail over.
Finally, the link status of the adapters needs to be monitored to be sure that both links are up. The failure of a link will not cause the system to call home and, because it’s fault tolerant, there is no loss of system connectivity, so a failed link can easily go unnoticed. I covered this in a previous blog post; see Monitoring network adapter status.