Central on Prem

As with other things, the domain is mobility.nis.vt.edu. For example, the hostname central has the FQDN central.mobility.nis.vt.edu.

Hostname        Interface  IPv4
central         ens1f0     198.82.169.222/24
central-node-1  ens1f0     198.82.169.223/24
central-node-2  ens1f0     198.82.169.224/24
central-node-3  ens1f0     198.82.169.225/24
central-node-4  ens1f0     198.82.169.226/24
central-node-5  ens1f0     198.82.169.227/24

Additional VIP hostnames:

  • central-central
  • apigw-central
  • ccs-user-api-central
  • sso-central

POD IP Range: 10.0.0.0/16
Service IP Range: 10.1.0.0/16
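A quick way to confirm the hostname table above still matches DNS — a sketch assuming `getent` is available and the host can reach campus DNS:

```shell
# check_forward HOST EXPECTED_IP: compare a forward DNS lookup to the table.
check_forward() {
  resolved=$(getent ahostsv4 "$1" | awk 'NR==1 { print $1 }')
  if [ "$resolved" = "$2" ]; then
    echo "OK   $1 -> $2"
  else
    echo "FAIL $1 -> '$resolved' (expected $2)"
  fi
}

# Hostname/IP pairs copied from the table above.
while read -r host ip; do
  check_forward "$host" "$ip"
done <<'EOF'
central.mobility.nis.vt.edu 198.82.169.222
central-node-1.mobility.nis.vt.edu 198.82.169.223
central-node-2.mobility.nis.vt.edu 198.82.169.224
central-node-3.mobility.nis.vt.edu 198.82.169.225
central-node-4.mobility.nis.vt.edu 198.82.169.226
central-node-5.mobility.nis.vt.edu 198.82.169.227
EOF
```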

iLO Configuration

Access credentials

  • Local credentials only
  • See password repository for details

Network

iLO Dedicated Network Port > IPv4:

  • Not posting IPs here because iLO is notoriously insecure; they are documented in the NEO password repo.
  • DNS: 172.19.128.3
  • IPv6 is currently not configured.

iLO Dedicated Network Port > SNTP:

  • Disable DHCPv4/6 Supplied Time Settings
  • Disable Propagate NTP Time to Host
  • Primary Time Server: 172.19.131.253
  • Secondary Time Server: conehead or grub
  • Time Zone: Bogota, Lima, Quito, Eastern Time (US & Canada) (GMT-05:00:00)

NOTE: Changing SNTP values will likely require an iLO reset.

Monitoring

SNMP

Management > SNMP Settings:

  • System location: ISB 118
  • System contact: nis-wifi-g@vt.edu
  • System role: Central on Prem
  • System Role Detail: Node 1, Node 2, ...
  • Disable SNMPv1
  • SNMPv3 Users:
    • Security Name: nisnmp
    • See password repo for credentials
    • User Engine ID: blank
  • SNMP Alert Destinations:
    • akips.nis.ipv4.vt.edu
    • Trap Community: blank
    • SNMP Protocol: SNMPv3 Inform
    • SNMPv3 User: nisnmp
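To verify that an iLO answers SNMPv3 queries after configuration, something along these lines can be run from a management host with net-snmp installed. The auth/priv protocols here are assumptions — use whatever was configured for the nisnmp user, with the passphrases from the password repo:

```shell
snmpget -v3 -l authPriv \
  -u nisnmp \
  -a SHA -A '<auth passphrase>' \
  -x AES -X '<priv passphrase>' \
  <ilo-address> sysName.0
```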

Syslog

Management > Remote Syslog:

  • Enable iLO Remote Syslog
  • Remote Syslog Port: 514
  • Remote Syslog Server: akips.nis.ipv4.vt.edu

Disable iLO Federation

iLO Federation > Setup:

  • Delete the default group
  • Disable multicast options:
    • iLO Federation Management
    • Multicast Discovery

IPv6

IPv6 is not supported at all; there is no way to configure an IPv6 address. Not only that, but when configuring the network settings, we see:

Created symlink /etc/systemd/system/basic.target.wants/disable-ipv6.service → /etc/systemd/system/disable-ipv6.service.
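The install output only shows the symlink being created, not the unit's contents. As an illustration of what a unit like this typically does (this is a plausible sketch, not the actual COP file), it would flip the IPv6 sysctls early in boot:

```ini
# /etc/systemd/system/disable-ipv6.service (illustrative sketch only)
[Unit]
Description=Disable IPv6 via sysctl
DefaultDependencies=no
Before=network-pre.target

[Service]
Type=oneshot
ExecStart=/sbin/sysctl -w net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1

[Install]
WantedBy=basic.target
```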

SMTP

Allowlist for mailrelay.smtp.vt.edu:

198.82.169.222,central.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.223,central-node-1.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.224,central-node-2.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.225,central-node-3.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.226,central-node-4.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
198.82.169.227,central-node-5.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"
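All six rows follow one pattern, so the file can be regenerated rather than hand-edited if nodes are added or renumbered. A sketch (the output filename `allowlist.csv` is our choice):

```shell
# Regenerate the mailrelay allowlist rows from the node numbering scheme:
# node N lives at 198.82.169.(222 + N), with "central" itself at .222.
{
  printf '%s,%s,"Central on Prem neo-central@vt.edu","NIS"\n' \
    198.82.169.222 central.mobility.nis.vt.edu
  for n in 1 2 3 4 5; do
    printf '198.82.169.%s,central-node-%s.mobility.nis.vt.edu,"Central on Prem neo-central@vt.edu","NIS"\n' \
      "$((222 + n))" "$n"
  done
} > allowlist.csv
```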

Parts for redundancy

iLO Administrator and firmware password

The iLO "Administrator" account uses a password derived from the baseboard serial number. This is done by the COP installation media. The same password is used for access to the firmware interface.

NOTE: This means that the serial numbers of the nodes are sensitive information! They are stored in the NEO password vault.

The script itself derives the password with the following commands (and some unnecessary file and variable creation...):

dmidecode -t baseboard \
  | grep Serial \
  | grep -o '[^ ]\+$' \
  | md5sum \
  | grep -Eo '^[^ ]+' \
  | cut -c1-8

We can simplify this to:

dmidecode -s baseboard-serial-number | md5sum | head -c 8
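Wrapped as a function taking the serial as an argument, the derivation can also be checked against a vault entry without root on the node (the function name is our own; the serial argument stands in for the dmidecode call):

```shell
# derive_ilo_pw SERIAL: reproduce the installer's password derivation.
# Note the trailing newline: both `dmidecode -s` and the original grep
# pipeline feed the serial to md5sum with a newline appended.
derive_ilo_pw() {
  printf '%s\n' "$1" | md5sum | head -c 8
}

derive_ilo_pw "EXAMPLE-SERIAL"   # first 8 hex chars of the MD5 hash
```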

Managing the RAID from a live environment

HPE has a variation of secure boot enabled, so we cannot just boot whatever we want. However, secure boot is only looking for something signed by Canonical, so just grab an Ubuntu image and be off. Other distros signed with common keys may or may not work, but COP is built on Ubuntu 18.04, so that is the least likely to cause issues.

Unlike the COP ISO, the Ubuntu image can be dd'd to a USB drive to create bootable media. iLO can also be used to mount virtual media to boot from.

Add HPE repositories

The ssacli utility allows us to reconfigure the RAID setup. The best way to get it is by adding the Management Component Pack repository from the HPE Software Delivery Repository (SDR).

/etc/apt/sources.list.d/mcp.list:

# HPE Management Component Pack
deb https://downloads.linux.hpe.com/SDR/repo/mcp bionic/current non-free

Now, install the keys:

curl https://downloads.linux.hpe.com/SDR/hpPublicKey2048.pub | sudo apt-key add -
curl https://downloads.linux.hpe.com/SDR/hpPublicKey2048_key1.pub | sudo apt-key add -
curl https://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub | sudo apt-key add -

Then update the repositories:

sudo apt update
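Note that apt-key is deprecated on newer Ubuntu releases. If the live environment is 20.04 or later, a keyring-based setup along these lines should work instead (the keyring path is our choice, not an HPE convention):

```shell
curl -fsSL https://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub \
  | gpg --dearmor \
  | sudo tee /usr/share/keyrings/hpe-mcp.gpg > /dev/null

echo 'deb [signed-by=/usr/share/keyrings/hpe-mcp.gpg] https://downloads.linux.hpe.com/SDR/repo/mcp bionic/current non-free' \
  | sudo tee /etc/apt/sources.list.d/mcp.list
```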

Convert array to RAID 10

This will take a long time. If building a new system, create a new array instead of migrating an existing one.

# ssacli
=> ctrl slot=0 ld 1 add drives=allunassigned
=> ctrl slot=0 ld 1 show status

   logicaldrive 1 (3.49 TB, RAID 0): Transforming, 0.83%

=> ctrl slot=0 ld 1 show status

   logicaldrive 1 (3.49 TB, RAID 0): Transforming, 0.83%

=> ctrl slot=0 ld 1 modify raid=1+0
=> ctrl slot=0 ld 1 show status

   logicaldrive 1 (3.49 TB, RAID 1+0): Transforming, 0.07%

=> ctrl slot=0 ld 1 show status

   logicaldrive 1 (3.49 TB, RAID 1+0): OK

=>
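The transformation can take many hours. Rather than re-running the status command by hand, a simple polling loop (run as root on the node, assuming ssacli is installed) is:

```shell
# Poll transformation progress until the logical drive reports OK.
while true; do
  status=$(ssacli ctrl slot=0 ld 1 show status)
  echo "$(date -Is) ${status}"
  case "$status" in
    *OK*) break ;;
  esac
  sleep 300   # check every 5 minutes
done
```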

Build a new RAID 10 array

This is a destructive process, but much faster than migrating an array. It is necessary to install COP from an ISO afterwards.

# ssacli
=> ctrl slot=0 ld 1 delete
[confirm]
=> ctrl slot=0 create type=ld drives=allunassigned raid=1+0
=>

Drive replacement (RAID 0)

A failed drive in a RAID 0 array is catastrophic, so COP must be re-installed from the ISO afterwards.

  • Physically replace the bad drive with a good one
  • Reboot the system
  • Press F9 during boot to enter System Utilities, a BIOS-like environment. You may need to press F1 to continue past the warning message (telling you a drive has failed and been replaced).
  • Select "System Configuration"
  • Select "Embedded RAID 1: HPE Smart Array P408i-a SR Gen 10"
  • Select "Array Configuration"
  • Select "Manage Arrays"
  • Select "Array A"
  • Select "List Logical Drives"
  • Select "Logical Drive 1 (...)"
  • Select "Re-Enable Logical Drive"
  • Confirm that you want to Re-Enable the Logical Drive. We are not expecting the data to be recoverable.
  • Exit the menus until you can exit the system utilities. Re-enabling the array does not count as a change, so there is no need to save.