Outstanding

Controller IPv6 traffic stops

  • Description: All IPv6 traffic to/from the controller itself ceases. This does not impact user traffic.
  • Detection:
    • The MD is reachable over IPv4, but not IPv6.
    • The MD is unable to ping its IPv6 gateway.
    • If the MD has established sessions (e.g., tunnels) to IPv6 addresses, those may continue to work.
    • The IPv6 neighbor table is stuck: entries are neither added nor removed dynamically.
    • AKiPS availability
    • AKiPS status
  • Workaround:
    • IPv6 dependencies have been (mostly?) removed. User impact should be minimal to none at this point. See Enhancements/IPv6 for details.
    • Bounce the link to the impacted controller. This can be done from either the controller or the router side. Additionally, it seems we can take down a single link in the port channel and bring it back up. Usually the link needs to stay down for a few seconds.
    • Although bouncing the link does (temporarily) restore IPv6, the failover mechanisms appear to be broken, so the workaround is user-impacting. Current practice is to leave IPv6 broken.
    • Add a static neighbor entry:
      (isb-mm-1) [00:1a:1e:03:03:08] #show configuration committed | include neigh
      ipv6 neighbor 2607:b400:64:4000::1 vlan 299 00:31:46:17:df:f0
      
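      This allows traffic to flow through the gateway, but does not allow traffic from the gateway itself. In the output below, a ping to a remote address routed via the gateway succeeds, while a ping to the gateway's own address gets no reply.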
      (col-md-2) *#show ipv6 route
      
      Thu Jan 18 12:52:18.483 2024
      
      Codes: C - connected, O - OSPF, R - RIP, S - static
             M - mgmt, U - route usable, * - candidate default
      
      Gateway of last resort is 2607:b400:64:4000::1 to network ::/128 at cost 1
      S*    ::/0 [0/1] via 2607:b400:64:4000::1*
      C    2607:b400:64:4000::/64 is directly connected, VLAN299
      C    2607:b400:a00:1::/64 is directly connected, VLAN801
      (col-md-2) *#ping ipv6 2001:468:c80:210f:0:165:9b7d:7dcb
      
      Press 'q' to abort.
      Sending 5, 92-byte ICMPv6 Echos to 2001:468:c80:210f:0:165:9b7d:7dcb, timeout is 2 seconds:
      !!!!!
      Success rate is 100 percent (5/5), round-trip min/avg/max = 0.468/0.5676/0.656 ms
      
      (col-md-2) *#ping ipv6 2607:b400:64:4000::1
      
      Press 'q' to abort.
      Sending 5, 92-byte ICMPv6 Echos to 2607:b400:64:4000::1, timeout is 2 seconds:
      .....
      Success rate is 0 percent (0/5)
      
  • Impact:
    • All communication between the controllers and CPPM is over IPv6. Thus, no clients can authenticate on the VirginiaTech SSID.
    • Other system services on the controller, including NTP and DNS, also run over IPv6.
    • The impact of an out-of-date neighbor table, which is not yet known (see Unknowns / next steps).
  • Unknowns / next steps:
    • What is the impact of an incorrect neighbor table on an MD? E.g., what is the impact on a client that is not in the table? Does this impact Air Group? Does it affect the efficiency or speed with which the MD can switch packets? Does this prevent the MD from short-circuiting or optimizing client ND?
    • Is the MD participating in neighbor discovery at all? Is it sending/receiving NS? Is it sending NA?
  • TAC cases:
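
The detection steps above can be partly automated. The following is a minimal Python sketch, assuming the ping output has already been captured as text from the controller CLI (how it is captured is not covered here) and that the summary line always has the "Success rate is N percent (r/s)" form shown in the transcript above. The function names and state labels are hypothetical and exist only for illustration.

```python
import re

# Summary line printed by the controller's ping command, e.g.
# "Success rate is 100 percent (5/5), round-trip min/avg/max = ..."
_SUMMARY = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")


def ping_counts(output):
    """Return (received, sent) from captured ping output, or None if no summary line is found."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    return int(match.group(2)), int(match.group(3))


def classify(gateway_output, remote_output):
    """Label the controller's IPv6 state from two captured pings (labels are hypothetical).

    gateway_output: ping to the IPv6 gateway's own address
    remote_output:  ping to an address that is reached via the gateway
    """
    gateway = ping_counts(gateway_output)
    remote = ping_counts(remote_output)
    if gateway is None or remote is None:
        return "unknown"
    gateway_ok = gateway[0] > 0
    remote_ok = remote[0] > 0
    if gateway_ok and remote_ok:
        return "healthy"
    if remote_ok:
        # Routed traffic works but the gateway itself does not answer,
        # which is what the static neighbor entry produces.
        return "routed-only"
    if gateway_ok:
        return "gateway-only"
    return "down"
```

Fed the two pings from the col-md-2 transcript above, classify returns "routed-only", which matches the behavior seen with the static neighbor entry in place.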

Config out of sync

  • Description: The configs on the MC device node and on the corresponding MD differ.

  • Symptoms:

    From the MM:

    (isb-mm-1) *[00:1a:1e:02:d8:90] #show configuration effective | include debugging
    logging user-debug 9c:b6:d0:da:1e:8f level debugging
    logging arm-user-debug 9c:b6:d0:da:1e:8f level debugging
    (isb-mm-1) *[00:1a:1e:02:d8:90] #
    

    From the MD:

    (col-md-1) *#show running-config | include debugging
    Building Configuration...
    logging security process dot1x-proc level debugging
    logging level debugging arm-user-debug 9c:b6:d0:da:1e:8f
    logging level debugging user-debug 9c:b6:d0:da:1e:8f
    
  • TAC case: 5360416723

  • Notes

    • ccm-debug full-config-sync did not resolve the issue
    • The problem went away on its own, probably as a result of subsequent commits.
    • A script that compares the configs retrieved from the API is currently being written.
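
The comparison itself could look something like the following. This is a minimal Python sketch and not the script mentioned in the notes. It assumes both configs are already available as plain text (retrieval from the API is not shown), that prompt lines start with "(", and that two lines with the same words in a different order describe the same setting, as the user-debug lines in the symptoms above appear to. All function names are hypothetical.

```python
def config_lines(text):
    """Return config lines, dropping blanks, CLI prompts and the 'Building Configuration...' banner."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: prompt lines start with "(" and real config lines never do.
        if not line or line.startswith("(") or line.startswith("Building Configuration"):
            continue
        lines.append(line)
    return lines


def token_key(line):
    """Order-insensitive key, so the same words in a different order compare equal."""
    return tuple(sorted(line.split()))


def config_diff(mm_text, md_text):
    """Return (lines only on the MM, lines only on the MD), ignoring word order within a line."""
    mm = {token_key(line): line for line in config_lines(mm_text)}
    md = {token_key(line): line for line in config_lines(md_text)}
    only_mm = [mm[key] for key in mm if key not in md]
    only_md = [md[key] for key in md if key not in mm]
    return only_mm, only_md
```

Run against the MM and MD output in the symptoms above, this reports nothing as MM-only and the "logging security process dot1x-proc level debugging" line as MD-only. The word-order heuristic is deliberately loose and may hide real differences, so treat its result as a starting point for manual review.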