Post-Mortem

Motivation

The primary goal of the out-of-band (OOB) management network is to keep devices remotely manageable in a disaster, when the rest of the network is not functional, so that service can be restored.

The secondary goal/benefit of the OOB management network is security. Isolating management to the OOB network significantly reduces the attack surface of the equipment.

Counter motivation

The first goal is irrelevant, because the Wi-Fi network is an overlay. If the equipment is not reachable, it is because the underlay is not working, and we'll be fixing that first. Two notes here:

  • Administrators of the Wi-Fi network need some kind of network connectivity that isn't the VT Wi-Fi network, which is trivial: a wired adapter, home ISP, or mobile hotspot will do.
  • To address the case of a device with an unusable network configuration (e.g., the out-of-box config), they still need some kind of non-network access (i.e., serial), though that access can itself be reached through network resources. Indeed, serial access through the OOB network is already part of our standard setup.

The second goal strongly implies (though doesn't strictly require) that the management of a device is isolated to that device. This is not the case with the Wi-Fi infrastructure: all configuration is done on the MC, pushed to the MDs, and in turn pushed to the APs.

More critically, the management plane needs a clear separation from the production and support networks. An overlay design does not lend itself to this, and sure enough, that separation does not exist in the wireless controllers: they do not have multiple routing tables, which makes it extremely difficult, if not impossible, to separate the different network planes.

In particular:

  • user traffic is carried to the MD inside a tunnel
  • MDs in a cluster build tunnels and have host-specific routes to each other
  • MDs build tunnels and have host-specific routes to the MCs

This means any wireless user can reach the management* of the MC and of any MD in the cluster they are connected to. This could be stopped with a client ACL, but it must:

  • be applied to every role
  • enumerate every address (including IPv6 link-local!) on every controller

This is obviously error-prone and a fair bit of work (see the sketch below), all to accomplish a secondary goal. And we still end up with a design that gives only weak assurance of that goal (e.g., have we found every path into the management plane? Probably not).

* Can reach the L4 management interface, that is. Obviously, L7 still needs auth(z).
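
For a sense of the enumeration burden, such a client ACL would look roughly like the sketch below. The ACL name is invented, the addresses are placeholders in the style used elsewhere in this document, and the session-ACL rule syntax is approximate, not verified config:

ip access-list session block-controller-mgmt
    ! one rule per address on every controller,
    ! including IPv6 link-local addresses
    ipv6 user host <MD-1 global v6> any deny
    ipv6 user host <MD-1 link-local v6> any deny
    user host <MD-1 legacy v4> any deny
    ! ...repeat for every MD and for the MC...

And it still has to be applied to every role; a controller added, renumbered, or missed silently reopens the path.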

Out-Of-Band Management

Logical Diagram

[Figure: logical diagram of the wireless management connections]

Data paths

  • MDs join clusters with the in-band management address
    lc-cluster group-profile "lcc-foo"
        controller-v6 <blue> priority 255 mcast-vlan 0 vrrp-ip-v6 <blue> vrrp-vlan <blue> group <#>
    
  • APs connect to the cluster on in-band management
  • In-band mgmt and user networks are trunked over the same port channel.
  • The MD controller IP is on in-band mgmt
    masteripv6 ... interface-f vlan-f <blue>
    ...
    controller-ipv6 vlan <blue> address <blue>
    
  • mgmt auth (i.e., netadmin) for MDs happens on OOB mgmt (see the sketch after this list)
  • user auth (e.g., eduroam) happens on in-band mgmt
  • MC-MD management happens inside the IPsec tunnel that gets built over the in-band management.
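
For concreteness, the NISNETR-399 items in the TODO below amount to repointing the netadmin auth-server definitions at OOB addresses. A minimal sketch, assuming asr-conehead-netadmin is a RADIUS server definition (the host value is a placeholder):

aaa authentication-server radius "asr-conehead-netadmin"
    host <conehead OOB v6 address>

Reachability to that host then rides the OOB static route rather than the in-band default.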

Questions

  • How do we prevent mgmt login from non-OOB mgmt networks? If we can't do this, we haven't actually done anything.
    • Force management to ports 22 and 4343, and only allow these on the OOB side (see the sketch after this list)
      • AP-MD and MD-MC management is done through a tunnel, and thus is not stopped by these ACLs. This is good for the purposes of getting things to work, but kinda violates the principles we are after to begin with.
      • Captive portals use ports 80 and 443, and we can force HTTPS management exclusively to 4343. This lets us express an L7 distinction at L4. Again, this functions, but eww.
  • How many captive portal users are legacy only? Do we need this legacy address?
  • Can we do no legacy addresses?
    • No. At the least, we need legacy addresses for RAPs.
  • Can we add members to a cluster by an IP that is not the controller IP?
    • Yes
  • Do we want to keep a legacy address on in-band mgmt to give us time to migrate APs? (And to have fewer changes at once.)
    • Yes. Let's make fewer changes at once.
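
Roughly what the port restriction above might look like, using the control-plane firewall. The OOB /64 matches the vlan 301 addressing in the config section below; the firewall cp rule syntax here is from memory and should be verified against the running ArubaOS release:

firewall cp
    ! mgmt ports reachable only from the OOB prefix
    ipv6 permit 2607:b400:e1:4000::/64 proto 6 ports 22 22
    ipv6 permit 2607:b400:e1:4000::/64 proto 6 ports 4343 4343
    ipv6 deny any proto 6 ports 22 22
    ipv6 deny any proto 6 ports 4343 4343

Ports 80 and 443 are untouched, which is what leaves captive portal working while HTTPS management lives only on 4343.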

TODO

conehead/grub

MM

Nothing?

MD

  • Wire up MDs on OOB
  • Address MDs on OOB
  • Apply static route to OOB network
  • Apply ACLs to limit port 4343 and 22 to only be allowed on the OOB side [NISNETR-398]
  • Change asr-conehead-netadmin to use the OOB v6 address on conehead [NISNETR-399]
  • Change asr-grub-netadmin to use the OOB v6 address on grub [NISNETR-399]
  • Figure out initial setup
  • Remove remaining legacy addresses

Config changes

The MM is configured exactly the same as before. The MDs have additional configuration (using col-md-5.dev as an example):

interface gigabitethernet 0/0/0
    no shutdown
!
vlan 301
    description oob-mgmt
!
interface port-channel 1
    gigabitethernet 0/0/0
    switchport access vlan 301
    switchport mode access
    trusted
    trusted vlan 1-4094
!
interface vlan 301
    operstate up
    ipv6 address 2607:b400:e1:4000:0:0:0:15/64
!
ipv6 route 2607:b400:e1:0:0:0:0:0/48 2607:b400:e1:4000:0:0:132:1
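
A quick sanity check on the MD after applying the above (standard show commands; output format varies by release):

show interface vlan 301    ! OOB SVI up with the /64 above
show ipv6 route            ! static /48 via 2607:b400:e1:4000:0:0:132:1

If the static route is missing, the NISNETR-399 auth changes would likely be the first thing to break.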

Old ideas

These are things we are currently deciding against. They are noted here in case they turn out to be a good idea or lead to other useful ideas.

MC-MD connection:

  • Static routes over OOB
  • IPsec tunnel between MC and FW