Symptoms:
aaa user delete commands from the MC never get a response from the MDs
that connect over IPv6.
Running a second command requires waiting for the timeout (default 300 s).
Recreate the problem:
Have at least one MD connect to the MC over IPv6, and note which MDs these
are.
To do this, configure them with the masteripv6 or conductoripv6 command
instead of the masterip or conductorip command.
(isb-mm-1) [mynode] #cd vtc-md-1
(isb-mm-1) [00:1a:1e:04:b1:10] #show configuration committed | include conductor
conductoripv6 2607:b400:2:2000:0:173:32:36 ipsec-factory-cert conductor-mac-1 20:4c:03:8f:53:1a conductor-mac-2 20:4c:03:0e:e0:44 interface-f vlan-f 100
This can be verified with the show switches debug command, noting which IP
version is used in the "IP Address" column.
(isb-mm-1) [mynode] #show switches debug
All Switches
------------
IP Address MAC Name Nodepath Type Model Version Status Uptime CrashInfo Config Sync Time (sec) License Release Type
---------- --- ---- -------- ---- ----- ------- ------ ------ --------- ---------------------- ------- ------------
128.173.32.34 20:4c:03:8f:53:1a isb-mm-1 /mm/mynode conductor ArubaMM-HW-10K 8.10.0.9_88493 up 51d 20h 50m no 0 N/A LSR
128.173.32.35 20:4c:03:0e:e0:44 isb-mm-2 /mm standby ArubaMM-HW-10K 8.10.0.9_88493 up 51d 20h 40m no 0 N/A LSR
172.16.1.11 00:1a:1e:02:d8:90 col-md-1 /md/vt/swva/col MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 13m no 0 N/A LSR
172.16.1.12 00:1a:1e:03:03:08 col-md-2 /md/vt/swva/col MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 13m yes 0 N/A LSR
172.16.1.13 00:1a:1e:02:d8:f0 col-md-3 /md/vt/swva/col MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 13m no 0 N/A LSR
172.16.1.14 00:1a:1e:03:02:78 col-md-4 /md/vt/swva/col MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 13m yes 0 N/A LSR
172.16.1.141 00:1a:1e:03:01:98 bur-md-1 /md/vt/swva/bur MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 19m no 0 N/A LSR
172.16.1.142 00:1a:1e:02:d8:b0 bur-md-2 /md/vt/swva/bur MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 19m yes 0 N/A LSR
172.16.1.143 00:1a:1e:02:d9:70 bur-md-3 /md/vt/swva/bur MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 18m no 0 N/A LSR
172.16.1.144 00:1a:1e:03:00:a8 bur-md-4 /md/vt/swva/bur MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 19m no 0 N/A LSR
172.17.1.11 00:1a:1e:03:00:d8 res-md-1 /md/vt/swva/res MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 9m yes 0 N/A LSR
172.17.1.12 00:1a:1e:03:01:90 res-md-2 /md/vt/swva/res MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 8m yes 0 N/A LSR
172.17.1.13 00:1a:1e:03:11:10 res-md-3 /md/vt/swva/res MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 9m yes 0 N/A LSR
172.17.1.14 00:1a:1e:03:0f:f8 res-md-4 /md/vt/swva/res MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 9m yes 0 N/A LSR
2607:b400:62:1400:0:16:247:11 00:1a:1e:04:b1:10 vtc-md-1 /md/vt/swva/vtc MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 3m no 0 N/A LSR
2607:b400:62:1400:0:16:247:12 00:1a:1e:04:b1:18 vtc-md-2 /md/vt/swva/vtc MD Aruba7240XM 8.10.0.9_88493 up 49d 7h 3m no 0 N/A LSR
172.16.236.151 00:1a:1e:00:14:30 nvc-md-1 /md/vt/nova/nvc MD Aruba7220 8.10.0.9_88493 up 49d 7h 5m no 0 N/A LSR
172.16.236.152 00:1a:1e:00:99:70 nvc-md-2 /md/vt/nova/nvc MD Aruba7220 8.10.0.9_88493 up 49d 7h 7m no 0 N/A LSR
Total Switches:18
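With many controllers, picking out the IPv6 rows by eye is error-prone. The
following is a minimal sketch, assuming the output of show switches debug has
been captured as text and that the address and name are the first and third
whitespace-separated columns, as in the listing above. The function name
ipv6_connected_switches is hypothetical and not part of any Aruba tooling.

```python
import ipaddress


def ipv6_connected_switches(show_switches_output):
    """Return (name, address) pairs for rows whose first column is IPv6.

    Assumes the column order shown above: IP Address, MAC, Name, ...
    """
    found = []
    for line in show_switches_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # title, blank, or "Total Switches" line
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # header or separator row, not an address
        if addr.version == 6:
            # Keep the address text exactly as the controller printed it.
            found.append((fields[2], fields[0]))
    return found
```

Run against the listing above, this should report only vtc-md-1 and vtc-md-2.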
From the MC, run an aaa user delete ... command, then check the status:
(isb-mm-1) [mynode] #aaa user delete mac 00:11:22:33:44:55
Users will be deleted at MDs. Please check show CLI for the status
(isb-mm-1) [mynode] #aaa user delete mac 11:22:33:44:55:66
The previous CLI is still in progess, please try later!
(isb-mm-1) [mynode] #show aaa user-delete-result
Summary of user delete CLI requests !
Current user delete request timeout value: 300 seconds
aaa user delete mac 00:11:22:33:44:55 , Overall Status- Response pending , Total users deleted- 0
MD IP : 172.16.1.11, Status- Complete , Count- 0
MD IP : 172.16.1.12, Status- Complete , Count- 0
MD IP : 172.16.1.13, Status- Complete , Count- 0
MD IP : 172.16.1.14, Status- Complete , Count- 0
MD IP : 172.16.1.141, Status- Complete , Count- 0
MD IP : 172.16.1.142, Status- Complete , Count- 0
MD IP : 172.16.1.143, Status- Complete , Count- 0
MD IP : 172.16.1.144, Status- Complete , Count- 0
MD IP : 172.17.1.11, Status- Complete , Count- 0
MD IP : 172.17.1.12, Status- Complete , Count- 0
MD IP : 172.17.1.13, Status- Complete , Count- 0
MD IP : 172.17.1.14, Status- Complete , Count- 0
MD IP : 0.0.0.0, Status- Response pending , Count- 0
MD IP : 0.0.0.0, Status- Response pending , Count- 0
MD IP : 172.16.236.151, Status- Complete , Count- 0
MD IP : 172.16.236.152, Status- Complete , Count- 0
Note that the two MDs with IP 0.0.0.0 have a response pending.
These are the two VTC MDs that connect to the MC over IPv6.
After 300 seconds from when the delete command was run:
(isb-mm-1) [mynode] #show aaa user-delete-result
Summary of user delete CLI requests !
Current user delete request timeout value: 300 seconds
aaa user delete mac 00:11:22:33:44:55 , Overall Status- Complete , Total users deleted- 0
MD IP : 172.16.1.11, Status- Complete , Count- 0
MD IP : 172.16.1.12, Status- Complete , Count- 0
MD IP : 172.16.1.13, Status- Complete , Count- 0
MD IP : 172.16.1.14, Status- Complete , Count- 0
MD IP : 172.16.1.141, Status- Complete , Count- 0
MD IP : 172.16.1.142, Status- Complete , Count- 0
MD IP : 172.16.1.143, Status- Complete , Count- 0
MD IP : 172.16.1.144, Status- Complete , Count- 0
MD IP : 172.17.1.11, Status- Complete , Count- 0
MD IP : 172.17.1.12, Status- Complete , Count- 0
MD IP : 172.17.1.13, Status- Complete , Count- 0
MD IP : 172.17.1.14, Status- Complete , Count- 0
MD IP : 0.0.0.0, Status- Timed out , Count- 0
MD IP : 0.0.0.0, Status- Timed out , Count- 0
MD IP : 172.16.236.151, Status- Complete , Count- 0
MD IP : 172.16.236.152, Status- Complete , Count- 0
Note that the requests to the two IPv6-connected MDs timed out.
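To check the result without reading every row, the per-MD lines can be
filtered for any status other than Complete. This is a rough sketch that
assumes the "MD IP : ..., Status- ... , Count- ..." line layout shown in the
output above; the function name unfinished_mds is hypothetical.

```python
import re

# Matches lines such as:
#   MD IP : 0.0.0.0, Status- Timed out , Count- 0
MD_LINE = re.compile(
    r"MD IP\s*:\s*(?P<ip>[^,\s]+)\s*,\s*"
    r"Status-\s*(?P<status>.+?)\s*,\s*"
    r"Count-\s*(?P<count>\d+)"
)


def unfinished_mds(result_output):
    """Return (ip, status) for every MD row whose status is not Complete."""
    rows = []
    for line in result_output.splitlines():
        match = MD_LINE.search(line)
        if match and match.group("status") != "Complete":
            rows.append((match.group("ip"), match.group("status")))
    return rows
```

For the outputs above, this should list the two 0.0.0.0 rows, first as
"Response pending" and later as "Timed out".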
Workaround:
Run the command from the appropriate MD.
TAC case:
Description:
API calls sometimes take a really long time.
Symptoms:
API calls time out.
API login process can return a 401.
TCP ACK to the API call is sent immediately, but the API response is still
delayed.
Root cause:
The arci-cli-helper process is single threaded. Yes, really.
This process appears to be the shim between the HTTP interface of the API
and the system.
This is less a "bug" and more of a "critical design failure".
Recreate the problem:
Make an API call for a command that takes a long time (e.g., show
bss-table).
While that is still waiting on a response, make an API call for a command
that should be nearly instant (e.g., show version).
Note that the second call will not get a response until the first one
finishes.
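The steps above can be timed with a small harness. This is only a sketch:
slow_call and fast_call are hypothetical callables that you would write to
issue the two API requests (for example, show bss-table and show version)
with whatever HTTP client and session handling you already use. Nothing here
is specific to the controller API, and measure_blocking and head_start are
names invented for this illustration.

```python
import threading
import time


def measure_blocking(slow_call, fast_call, head_start=0.05):
    """Start slow_call in a thread, then time fast_call issued shortly after.

    Returns a dict with the elapsed seconds for each call.
    """
    results = {}

    def run_slow():
        start = time.monotonic()
        slow_call()
        results["slow"] = time.monotonic() - start

    worker = threading.Thread(target=run_slow)
    worker.start()
    time.sleep(head_start)  # let the slow request get in first

    start = time.monotonic()
    fast_call()
    results["fast"] = time.monotonic() - start

    worker.join()
    return results
```

If the fast call's elapsed time is close to the slow call's remaining run
time rather than near zero, the two requests were most likely serialized.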
TAC case:
Description: An rfc-3576 message's sender is not recognized as a
configured server.
RADIUS RFC 3576 Statistics
--------------------------
Server Disconnect Req Disconnect Acc Disconnect Rej No Secret No Sess ID Bad Auth Invalid Req Pkts Dropped Unknown service CoA Req CoA Acc CoA Rej No perm
------ -------------- -------------- -------------- --------- ---------- -------- ----------- ------------ --------------- ------- ------- ------- -------
172.28.48.84 0 0 0 0 0 0 0 0 0 0 0 0 0
172.28.49.84 0 0 0 0 0 0 0 0 0 0 0 0 0
2607:b400:62:9200:0:8f:ee32:b3f3 0 0 0 0 0 0 0 0 0 0 0 0 0
2607:b400:62:9200:0:95:1b5d:6dfa 0 0 0 0 0 0 0 0 0 0 0 0 0
2607:b400:92:8400:0000:0044:7dcf:5796 0 0 0 0 0 0 0 0 0 0 0 0 0
2607:b400:92:8400:0000:0046:275b:4605 0 0 0 0 0 0 0 0 0 0 0 0 0
2607:b400:92:8500:0000:0041:89db:6313 0 0 0 0 0 0 0 0 0 0 0 0 0
2607:b400:92:8500:0000:004d:be0b:1156 0 0 0 0 0 0 0 0 0 0 0 0 0
Packets received from unknown clients : 1653
Packets received with unknown request : 0
Total RFC3576 packets Received : 1653
Workaround:
IPv6 addresses must be written with the leading zeros of each group
omitted, but without compressing zero groups into a double colon (::).
Different formats of the same address are recognized as different profiles.
Incorrect: 2607:b400:0092:8400:0000:0044:7dcf:5796
Incorrect: 2607:b400:92:8400::44:7dcf:5796
Correct: 2607:b400:92:8400:0:44:7dcf:5796
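To avoid hand-formatting mistakes, an address can be rewritten into the
accepted form before it goes into the configuration. The following is a
minimal sketch using Python's standard ipaddress module; the function name
controller_ipv6_format is hypothetical, and the target format is simply the
one described in the workaround above.

```python
import ipaddress


def controller_ipv6_format(address):
    """Return the address with eight groups, no leading zeros, and no '::'."""
    # .exploded always yields eight 4-digit groups, so '::' is never present.
    exploded = ipaddress.IPv6Address(address).exploded
    # Strip leading zeros from each group; an all-zero group becomes "0".
    return ":".join(format(int(group, 16), "x") for group in exploded.split(":"))
```

Both of the incorrect spellings listed above should come out as the correct
one when passed through this function.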
Description: Tunnel between MM and MD is broken.
Symptoms:
So far, this has only happened to col-md-r2:
controller MAC: 00:0b:86:b4:d3:a7
system serial: CR0001355
The problem has persisted after multiple factory resets.
Cluster VRRP address is down for the impacted MD.
Temporary workaround:
To restore the tunnel, on the MM run:
process restart isakmpd
Long-term workaround:
We moved the RAPs to lcc-col and decommissioned col-md-r2.
Motivation was consolidation of controllers, not "fixing" this bug.
TAC cases:
JIRA tasks: