Resolved

Connectivity failures (Aruba Support Advisory ARUBA-SA-20210901-PLVL04)

  • Description: Clients have association failures.

    This case evolved into the Linux client issue: Linux clients would occasionally stop passing traffic. The device stayed associated but could not even ping its UAC. It was observed mostly on Intel AX200 and AX210 cards, but also on Intel's AC cards and the MediaTek MT7921K. The problem looked like a driver or kernel issue, but its disappearance correlates more closely with the upgrade to ArubaOS 8.10.

  • Symptoms:

    • Clients experience association failures during high bursts of client roaming events.
    • High CPU utilization by the Station Management process (stm) in the MDs.
    • show papi kernel-socket-stats | include 8345,8222,8419,Drops
      • Drops on port 8419 (STM Low Priority) climb rapidly, in increments of 100 or more within seconds, AND port 8345 (STM) shows sustained large CurRxQLen and Drops values.
    • show cpuload current
      • stm process stays over 100%
  • TAC cases:

  • Notable versions:

    • 8.7.1.4: observed
    • 8.7.1.5: observed
    • 8.7.1.6: Sanjay claims a fix, but the problem was still observed
    • 8.10.0.6: presumed fixed
  • Debug: logs requested by Rodger. First, make sure user debug is enabled:

    logging user-debug <client-mac> level debug
    
    • Currently enabled for waldrep's laptop (46:96:f1:03:32:98).

    Then collect the following:

    no paging
    show cli-timestamp
    show clock
    show ap association client-mac <client-mac>
    show station-table | include <client-mac>
    show auth-tracebuf mac <client-mac>
    show ap client trail-info <client-mac>
    show datapath session table | include <ip address of client>
    show log user-debug 50 | include <client-mac>
    show log security 50 | include <client-mac>
    show log system 50 | include <Affected_AP_Name>
    tar log tech-support
    

    Collect the following at the time of the issue, along with the tech-support logs:

    clock cli-timestamp
    show dot1x watermark history
    show papi kernel-socket-stats
    show ap debug client-mgmt-counters
    show ap debug sta-msg-stats
    show ap debug cluster-counters
    show ap debug gsm-counters
    show ap debug client-deauth-reason-counters
    show cpuload current
    show datapath bwm table
    show datapath utilization
    show datapath papi counters
    show datapath debug opcode
    show datapath network ingress
    show datapath maintenance counters
    show datapath debug dma counters
    show datapath message-queue counters
    show auth-tracebuf
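
A minimal Python sketch for checking the stm symptom from two or more samples of the socket counters (for example, two runs of show papi kernel-socket-stats a few seconds apart). Parsing the CLI output is deliberately left out: the functions take counter values you have already extracted. The SocketSample type, the function names, the 100-drop threshold, and the 5-second window are all assumptions loosely based on the symptom description above, not documented Aruba limits.

```python
from dataclasses import dataclass


# Hypothetical container for one sample of a PAPI socket's counters,
# filled in by hand or by your own parser (CLI parsing not shown).
@dataclass
class SocketSample:
    port: int
    timestamp: float  # seconds, e.g. time.time() when the sample was taken
    drops: int
    cur_rx_q_len: int


def drops_spiking(
    first: SocketSample,
    second: SocketSample,
    min_increase: int = 100,    # assumed reading of "100+ increments"
    max_window_s: float = 5.0,  # assumed reading of "within seconds"
) -> bool:
    """True if Drops rose by at least min_increase between two samples
    of the same port taken no more than max_window_s apart."""
    if first.port != second.port:
        raise ValueError("samples must come from the same port")
    elapsed = second.timestamp - first.timestamp
    if elapsed <= 0 or elapsed > max_window_s:
        return False
    return second.drops - first.drops >= min_increase


def stm_queue_backed_up(samples: list[SocketSample], min_rx_q_len: int) -> bool:
    """True if every sample shows CurRxQLen at or above min_rx_q_len.
    What counts as "large" is site-specific, so there is no default."""
    return bool(samples) and all(s.cur_rx_q_len >= min_rx_q_len for s in samples)


if __name__ == "__main__":
    a = SocketSample(port=8419, timestamp=0.0, drops=1200, cur_rx_q_len=0)
    b = SocketSample(port=8419, timestamp=3.0, drops=1350, cur_rx_q_len=0)
    print(drops_spiking(a, b))  # True: +150 drops in 3 seconds
```

Both checks are meant to be combined: a Drops spike on 8419 together with a persistently backed-up 8345 queue matches the symptom, while either one alone may be noise.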
    

Kernel panics

  • Description: MD crashes with a kernel panic.
  • Symptoms:
    • MD reboots
    • Kernel panic
    • TAC asked for kernel core dumps. This option has been enabled for a while, but it does not seem to produce what TAC is asking for.
    • Observed Intent:cause:registers values:
      • 12:86:b0:2
      • 12:86:b0:4
      • 12:86:e0:2
      • 12:86:e0:4
      • 12:86:e0:8
      • 78:86:50:2 (logs lost)
  • Bug IDs:
    • AOS-216744
  • TAC cases:
  • JIRA tasks:
  • Notable versions:
    • 8.5.0.11:
      • Observed:
        • 12:86:e0:2
    • 8.7.1.3:
      • TAC asserts fixed:
        • 12:86:e0:2
    • 8.7.1.4:
      • Observed:
        • 12:86:e0:2
    • 8.7.1.5:
      • TAC asserts fixed:
        • 12:86:e0:2
        • 12:86:e0:4
        • 12:86:b0:4
      • Observed:
        • 12:86:b0:2
        • 12:86:b0:4
        • 12:86:e0:8
    • 8.7.1.5_81619:
      • Observed:
        • 12:86:b0:4
    • 8.7.1.6:
      • TAC asserts fixed:
        • 12:86:b0:2

res-md-1 refuses clients

  • Description: Any client trying to use res-md-1 as a UAC cannot associate.
  • Symptoms:
    • show lc-cluster load distribution client shows 0 active and 0 standby clients for res-md-1.
    • Started after res-md-1 crashed.
    • Persisted across a reboot and a code upgrade.
  • TAC cases:
  • Notable versions:
    • 8.7.1.4: crash that initiated the problem
    • 8.7.1.5: observed

Holy amon logs, Batman!

  • Description: amon_sender_proc and amon_recvr_proc log a debug trace that cannot be disabled. Collectively, the controllers sent over 20,000 log messages per second. The problem showed up only on some boots.
  • Bug IDs:
    • AOS-210452
  • TAC cases:
  • Notable versions:
    • 8.7.0.0: bug introduced
    • 8.7.1.4: fixed
  • JIRA task:

No state attribute in RADIUS request

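  • Description:
    • RADIUS request packets do not contain the State attribute, and clients face connectivity issues as a result. RFC 2865 requires the NAS to echo the State attribute from an Access-Challenge unmodified in the following Access-Request; without it, multi-round exchanges such as EAP can fail.
  • Bug IDs: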
    • AOS-207701
    • AOS-218006
  • Notable versions:
    • 8.4.0.0: introduced
    • 8.7.1.3: fixed
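
For reference, a minimal Python sketch of the State echo that RFC 2865 expects, working only on the attribute section of a packet. It assumes you already have the raw attributes of the Access-Challenge; the packet header, Request Authenticator, and Message-Authenticator handling are deliberately omitted, and the helper names are hypothetical, not anything from ArubaOS.

```python
import struct

# RFC 2865 attribute types used here
USER_NAME = 1
STATE = 24


def encode_attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as Type (1 octet), Length (1 octet), Value."""
    if not 1 <= len(value) <= 253:
        raise ValueError("attribute value must be 1..253 octets")
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def decode_attributes(data: bytes) -> list[tuple[int, bytes]]:
    """Split a raw attribute section into (type, value) pairs."""
    attrs = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated attribute header")
        attr_type, length = data[i], data[i + 1]
        if length < 2 or i + length > len(data):
            raise ValueError("bad attribute length")
        attrs.append((attr_type, data[i + 2 : i + length]))
        i += length
    return attrs


def build_followup_attributes(
    challenge_attrs: bytes, new_attrs: list[tuple[int, bytes]]
) -> bytes:
    """Attributes for the Access-Request that answers an Access-Challenge:
    the caller's new attributes plus the challenge's State, copied unmodified."""
    out = b"".join(encode_attribute(t, v) for t, v in new_attrs)
    for attr_type, value in decode_attributes(challenge_attrs):
        if attr_type == STATE:
            out += encode_attribute(STATE, value)
    return out
```

The failure described in this section corresponds to the last loop being skipped: the follow-up request goes out without State, so the server cannot tie it to the earlier challenge.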

Too many pending changes

  • Description:
    • If the output of show configuration unsaved-nodes would have exceeded 1024 characters, the command displayed nothing.
    • The same problem affected the API output.
  • Bug IDs:
    • AOS-210404
  • Notable versions:
    • 8.5.0.10: observed broken
    • 8.5.0.12: fixed
    • 8.7.0.3: fixed