Workaround

Delegated commands to v6 controllers fail

  • Symptoms:
    • aaa user delete commands issued from the MC never receive a response from the v6 controllers.
    • Running a second command requires waiting for the first one to time out (default 300 s).
  • Recreate the problem:
    • Have at least one MD connect to the MC over IPv6, and note which MDs these are. To do this, configure them with the masteripv6 or conductoripv6 command instead of the masterip or conductorip command.
      (isb-mm-1) [mynode] #cd vtc-md-1
      (isb-mm-1) [00:1a:1e:04:b1:10] #show configuration committed | include conductor
      conductoripv6 2607:b400:2:2000:0:173:32:36 ipsec-factory-cert conductor-mac-1 20:4c:03:8f:53:1a conductor-mac-2 20:4c:03:0e:e0:44 interface-f vlan-f 100
      
      This can be verified with the show switches debug command, noting which IP version appears in the "IP Address" column.
      (isb-mm-1) [mynode] #show switches debug
      
      All Switches
      ------------
      IP Address                     MAC                Name      Nodepath         Type       Model           Version         Status  Uptime       CrashInfo  Config Sync Time (sec)  License  Release Type
      ----------                     ---                ----      --------         ----       -----           -------         ------  ------       ---------  ----------------------  -------  ------------
      128.173.32.34                  20:4c:03:8f:53:1a  isb-mm-1  /mm/mynode       conductor  ArubaMM-HW-10K  8.10.0.9_88493  up      51d 20h 50m  no         0                       N/A      LSR
      128.173.32.35                  20:4c:03:0e:e0:44  isb-mm-2  /mm              standby    ArubaMM-HW-10K  8.10.0.9_88493  up      51d 20h 40m  no         0                       N/A      LSR
      172.16.1.11                    00:1a:1e:02:d8:90  col-md-1  /md/vt/swva/col  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 13m   no         0                       N/A      LSR
      172.16.1.12                    00:1a:1e:03:03:08  col-md-2  /md/vt/swva/col  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 13m   yes        0                       N/A      LSR
      172.16.1.13                    00:1a:1e:02:d8:f0  col-md-3  /md/vt/swva/col  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 13m   no         0                       N/A      LSR
      172.16.1.14                    00:1a:1e:03:02:78  col-md-4  /md/vt/swva/col  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 13m   yes        0                       N/A      LSR
      172.16.1.141                   00:1a:1e:03:01:98  bur-md-1  /md/vt/swva/bur  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 19m   no         0                       N/A      LSR
      172.16.1.142                   00:1a:1e:02:d8:b0  bur-md-2  /md/vt/swva/bur  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 19m   yes        0                       N/A      LSR
      172.16.1.143                   00:1a:1e:02:d9:70  bur-md-3  /md/vt/swva/bur  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 18m   no         0                       N/A      LSR
      172.16.1.144                   00:1a:1e:03:00:a8  bur-md-4  /md/vt/swva/bur  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 19m   no         0                       N/A      LSR
      172.17.1.11                    00:1a:1e:03:00:d8  res-md-1  /md/vt/swva/res  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 9m    yes        0                       N/A      LSR
      172.17.1.12                    00:1a:1e:03:01:90  res-md-2  /md/vt/swva/res  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 8m    yes        0                       N/A      LSR
      172.17.1.13                    00:1a:1e:03:11:10  res-md-3  /md/vt/swva/res  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 9m    yes        0                       N/A      LSR
      172.17.1.14                    00:1a:1e:03:0f:f8  res-md-4  /md/vt/swva/res  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 9m    yes        0                       N/A      LSR
      2607:b400:62:1400:0:16:247:11  00:1a:1e:04:b1:10  vtc-md-1  /md/vt/swva/vtc  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 3m    no         0                       N/A      LSR
      2607:b400:62:1400:0:16:247:12  00:1a:1e:04:b1:18  vtc-md-2  /md/vt/swva/vtc  MD         Aruba7240XM     8.10.0.9_88493  up      49d 7h 3m    no         0                       N/A      LSR
      172.16.236.151                 00:1a:1e:00:14:30  nvc-md-1  /md/vt/nova/nvc  MD         Aruba7220       8.10.0.9_88493  up      49d 7h 5m    no         0                       N/A      LSR
      172.16.236.152                 00:1a:1e:00:99:70  nvc-md-2  /md/vt/nova/nvc  MD         Aruba7220       8.10.0.9_88493  up      49d 7h 7m    no         0                       N/A      LSR
      
      Total Switches:18
      
    • From the MC, run an aaa user delete ... command, then check the status:
      (isb-mm-1) [mynode] #aaa user delete mac 00:11:22:33:44:55
      Users will be deleted at MDs. Please check show CLI for the status
      (isb-mm-1) [mynode] #aaa user delete mac 11:22:33:44:55:66
      The previous CLI is still in progess, please try later!
      (isb-mm-1) [mynode] #show aaa user-delete-result
      
      Summary of user delete CLI requests !
      Current user delete request timeout value: 300 seconds
      
      aaa user delete mac 00:11:22:33:44:55  , Overall Status- Response pending , Total users deleted- 0
      MD IP : 172.16.1.11, Status- Complete , Count- 0
      MD IP : 172.16.1.12, Status- Complete , Count- 0
      MD IP : 172.16.1.13, Status- Complete , Count- 0
      MD IP : 172.16.1.14, Status- Complete , Count- 0
      MD IP : 172.16.1.141, Status- Complete , Count- 0
      MD IP : 172.16.1.142, Status- Complete , Count- 0
      MD IP : 172.16.1.143, Status- Complete , Count- 0
      MD IP : 172.16.1.144, Status- Complete , Count- 0
      MD IP : 172.17.1.11, Status- Complete , Count- 0
      MD IP : 172.17.1.12, Status- Complete , Count- 0
      MD IP : 172.17.1.13, Status- Complete , Count- 0
      MD IP : 172.17.1.14, Status- Complete , Count- 0
      MD IP : 0.0.0.0, Status- Response pending , Count- 0
      MD IP : 0.0.0.0, Status- Response pending , Count- 0
      MD IP : 172.16.236.151, Status- Complete , Count- 0
      MD IP : 172.16.236.152, Status- Complete , Count- 0
      
      Note that the two MDs with IP 0.0.0.0 have a response pending. These are the two VTC MDs, which connect to the MC over IPv6.
    • After 300 seconds from when the delete command was run:
      (isb-mm-1) [mynode] #show aaa user-delete-result
      
      Summary of user delete CLI requests !
      Current user delete request timeout value: 300 seconds
      
      aaa user delete mac 00:11:22:33:44:55  , Overall Status- Complete , Total users deleted- 0
      MD IP : 172.16.1.11, Status- Complete , Count- 0
      MD IP : 172.16.1.12, Status- Complete , Count- 0
      MD IP : 172.16.1.13, Status- Complete , Count- 0
      MD IP : 172.16.1.14, Status- Complete , Count- 0
      MD IP : 172.16.1.141, Status- Complete , Count- 0
      MD IP : 172.16.1.142, Status- Complete , Count- 0
      MD IP : 172.16.1.143, Status- Complete , Count- 0
      MD IP : 172.16.1.144, Status- Complete , Count- 0
      MD IP : 172.17.1.11, Status- Complete , Count- 0
      MD IP : 172.17.1.12, Status- Complete , Count- 0
      MD IP : 172.17.1.13, Status- Complete , Count- 0
      MD IP : 172.17.1.14, Status- Complete , Count- 0
      MD IP : 0.0.0.0, Status- Timed out , Count- 0
      MD IP : 0.0.0.0, Status- Timed out , Count- 0
      MD IP : 172.16.236.151, Status- Complete , Count- 0
      MD IP : 172.16.236.152, Status- Complete , Count- 0
      
      Note that the requests to the two 0.0.0.0 MDs timed out, and only then did the overall status change to Complete.
  • Workaround:
    • Run the command from the appropriate MD.
  • TAC case:
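
To apply the workaround, it helps to know which MDs connect over IPv6. The following is a minimal sketch, assuming the output of show switches debug has been saved as text in the column layout shown above. The function name ipv6_connected_mds and the SAMPLE variable are illustrative, not part of any Aruba tooling.

```python
import ipaddress

# A few rows copied from the "show switches debug" output above.
SAMPLE = """\
IP Address                     MAC                Name      Nodepath         Type       Model           Version         Status
----------                     ---                ----      --------         ----       -----           -------         ------
128.173.32.34                  20:4c:03:8f:53:1a  isb-mm-1  /mm/mynode       conductor  ArubaMM-HW-10K  8.10.0.9_88493  up
172.16.1.11                    00:1a:1e:02:d8:90  col-md-1  /md/vt/swva/col  MD         Aruba7240XM     8.10.0.9_88493  up
2607:b400:62:1400:0:16:247:11  00:1a:1e:04:b1:10  vtc-md-1  /md/vt/swva/vtc  MD         Aruba7240XM     8.10.0.9_88493  up
2607:b400:62:1400:0:16:247:12  00:1a:1e:04:b1:18  vtc-md-2  /md/vt/swva/vtc  MD         Aruba7240XM     8.10.0.9_88493  up

Total Switches:18
"""


def ipv6_connected_mds(show_switches_output):
    """Return (name, address) for each MD row whose first column is an IPv6 address."""
    found = []
    for line in show_switches_output.splitlines():
        fields = line.split()
        # Data rows start with: IP Address, MAC, Name, Nodepath, Type.
        if len(fields) < 5 or fields[4] != "MD":
            continue
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a data row after all
        if address.version == 6:
            found.append((fields[2], fields[0]))
    return found


if __name__ == "__main__":
    for name, address in ipv6_connected_mds(SAMPLE):
        print(f"{name} connects over IPv6 ({address}); run aaa user delete on it directly")
```

MDs returned by this function are the ones that will show up as 0.0.0.0 in show aaa user-delete-result, so they are the ones where the delete has to be run locally.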

API timeouts

  • Description: API calls are sometimes delayed for a long time, even for commands that should return almost instantly.
  • Symptoms:
    • API calls time out.
    • API login process can return a 401.
    • TCP ACK to the API call is sent immediately, but the API response is still delayed.
  • Root cause:
    • The arci-cli-helper process is single-threaded. Yes, really.
    • This process appears to be the shim between the HTTP interface of the API and the system.
    • This is less a "bug" and more of a "critical design failure".
  • Recreate the problem:
    • Make an API call for a command that takes a long time (e.g., show bss-table).
    • While that is still waiting on a response, make an API call for a command that should be nearly instant (e.g., show version).
    • Note that the second call will not get a response until the first one finishes.
  • TAC case:
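
The reproduction steps above can be scripted. This is a rough sketch, not a tested client for the controller API: time_overlapping_calls takes any callable that issues one API command, so the actual HTTP login and request code is left to the caller. The fake_serialized_call stand-in and its durations are made up; it only models a backend that handles one request at a time.

```python
import threading
import time


def time_overlapping_calls(call, slow_command, fast_command, stagger=0.05):
    """Start slow_command, then fast_command shortly after.

    Returns a dict mapping each command to its elapsed time in seconds.
    """
    elapsed = {}

    def run(command):
        start = time.monotonic()
        call(command)
        elapsed[command] = time.monotonic() - start

    slow = threading.Thread(target=run, args=(slow_command,))
    fast = threading.Thread(target=run, args=(fast_command,))
    slow.start()
    time.sleep(stagger)  # make sure the slow request is in flight first
    fast.start()
    slow.join()
    fast.join()
    return elapsed


# Hypothetical stand-in for the real API: the lock models a backend
# that can only work on one request at a time.
_backend = threading.Lock()
_FAKE_DURATIONS = {"show bss-table": 0.5, "show version": 0.01}


def fake_serialized_call(command):
    with _backend:
        time.sleep(_FAKE_DURATIONS[command])


if __name__ == "__main__":
    timings = time_overlapping_calls(
        fake_serialized_call, "show bss-table", "show version"
    )
    for command, seconds in timings.items():
        print(f"{command}: {seconds:.2f} s")
```

Against a backend that handled requests in parallel, the fast command would finish in roughly its own duration. When it instead takes about as long as the slow command, the calls are being serialized.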

aaa rfc-3576-server profiles are dumb

  • Description: An rfc-3576 message's sender is not recognized as a configured server.
  • Symptoms:
    • The RFC 3576 statistics show zero counters for every configured server, while every received packet is counted as coming from an unknown client:
      
      RADIUS RFC 3576 Statistics
      --------------------------
      Server                                 Disconnect Req  Disconnect Acc Disconnect Rej  No Secret  No Sess ID  Bad Auth  Invalid Req  Pkts Dropped Unknown service  CoA Req  CoA Acc  CoA Rej  No perm
      ------                                 --------------  -------------- --------------  ---------  ----------  --------  -----------  ------------ ---------------  -------  -------  -------  -------
      172.28.48.84                           0               0              0               0          0           0         0            0            0                0        0        0        0
      172.28.49.84                           0               0              0               0          0           0         0            0            0                0        0        0        0
      2607:b400:62:9200:0:8f:ee32:b3f3       0               0              0               0          0           0         0            0            0                0        0        0        0
      2607:b400:62:9200:0:95:1b5d:6dfa       0               0              0               0          0           0         0            0            0                0        0        0        0
      2607:b400:92:8400:0000:0044:7dcf:5796  0               0              0               0          0           0         0            0            0                0        0        0        0
      2607:b400:92:8400:0000:0046:275b:4605  0               0              0               0          0           0         0            0            0                0        0        0        0
      2607:b400:92:8500:0000:0041:89db:6313  0               0              0               0          0           0         0            0            0                0        0        0        0
      2607:b400:92:8500:0000:004d:be0b:1156  0               0              0               0          0           0         0            0            0                0        0        0        0
      
      Packets received from unknown clients : 1653
      Packets received with unknown request : 0
      Total RFC3576 packets Received        : 1653
  • Workaround:
    • IPv6 addresses must be written with leading zeros omitted from every group (an all-zero group becomes a single 0), but without compressing zero groups into a double colon (::).
    • Different formats of the same address are treated as different profiles.
    • Incorrect: 2607:b400:0092:8400:0000:0044:7dcf:5796
    • Incorrect: 2607:b400:92:8400::44:7dcf:5796
    • Correct: 2607:b400:92:8400:0:44:7dcf:5796
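
The formatting rule above can be applied mechanically. Below is a small sketch using Python's standard ipaddress module; the function name rfc3576_profile_address is made up for illustration. It expands the address to all eight groups first, so any :: is removed, and then strips the leading zeros from each group.

```python
import ipaddress


def rfc3576_profile_address(address):
    """Rewrite an IPv6 address with no leading zeros and no '::' compression."""
    # .exploded always yields eight 4-digit groups, so '::' never survives.
    groups = ipaddress.IPv6Address(address).exploded.split(":")
    # Re-render each group as hex without padding: '0092' -> '92', '0000' -> '0'.
    return ":".join(format(int(group, 16), "x") for group in groups)


if __name__ == "__main__":
    for candidate in (
        "2607:b400:0092:8400:0000:0044:7dcf:5796",
        "2607:b400:92:8400::44:7dcf:5796",
        "2607:b400:92:8400:0:44:7dcf:5796",
    ):
        print(candidate, "->", rfc3576_profile_address(candidate))
```

All three spellings from the list above come out as 2607:b400:92:8400:0:44:7dcf:5796. Note that neither the compressed nor the exploded form from the ipaddress module matches the required format on its own, which is why the groups are re-rendered by hand.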

ERR_IKESA_EXPIRED

  • Description: Tunnel between MM and MD is broken.
  • Symptoms:
    • So far, this has only happened to col-md-r2:
      • controller MAC: 00:0b:86:b4:d3:a7
      • system serial: CR0001355
    • The problem has persisted after multiple factory resets.
    • Cluster VRRP address is down for the impacted MD.
  • Temporary workaround:
    • To restore the tunnel, on the MM run:
      process restart isakmpd
      
  • Long-term workaround:
    • We moved the RAPs to lcc-col and decommissioned col-md-r2.
    • Motivation was consolidation of controllers, not "fixing" this bug.
  • TAC cases:
  • JIRA tasks: