Process of discovering a controller
AP state machine (sequence of states that the AP undergoes following bootup)
- AP boots a small IOS image, obtains an IP address via DHCP, and can then communicate over the network.
- AP tries to discover the WLC.
- Discovery methods, in the order they are tried:
- Local subnet broadcast.
- AP sends a CAPWAP Discovery Request, either as a unicast or as a broadcast on the local subnet.
- Controller returns a CAPWAP Discovery Response.
- Prior knowledge of the WLC (a primary, a secondary, and a tertiary). These are primed addresses, stored in NVRAM so that the AP remembers them after a reboot. If the AP previously joined a controller, it should also have stored up to 8 out of a list of 32 WLC addresses that it received from that controller. It tries to contact as many of them as possible to build a list of candidates.
- DHCP server can send DHCP option 43 that suggests a list of WLCs.
- DNS. AP tries to resolve the name CISCO-CAPWAP-CONTROLLER.localdomain via DNS.
- If no controller is found, reset and try again, starting with the local subnet broadcast.
- How an AP selects a WLC:
- Try primed addresses
- Try the master controller.
- Try the least-loaded controller. During discovery, each WLC also reports its load (the ratio of the number of APs joined to its total capacity). An oversubscribed WLC cannot accept any more APs. APs can be assigned a priority value of low, medium, high, or critical; the default is low. If a WLC is fully loaded, it rejects APs with low priority to make room for higher-priority ones.
- AP builds a CAPWAP tunnel to the WLC. They authenticate each other by exchanging digital certificates. The tunnel is a secure Datagram Transport Layer Security (DTLS) channel for AP-WLC control messages.
- AP sends a CAPWAP Join Request; WLC returns a CAPWAP Join Response.
- WLC tells the AP which image it is supposed to run. If the AP's image differs, the AP downloads the image from the WLC, which can take some time; otherwise no download is needed. If the AP gets rehomed, it takes the image version of the new WLC, so it is best to have all WLCs running the same version. The AP image version depends on the WLC that the AP joins and cannot be specified separately.
- AP downloads its config from the WLC and updates existing values.
- WLC places the AP in the RUN state. The AP now provides a BSS and begins accepting clients.
- Reset: if the AP is reset, it tears down the tunnel and existing client associations, reboots, and starts again from the first state.
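The selection order described above (primed addresses, then the master controller, then the least-loaded controller) can be sketched as follows. This is only an illustration: the Candidate record and the select_wlc function are hypothetical names, not part of any Cisco software, and the sketch assumes that primed and master controllers are accepted without a capacity check.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """A WLC that answered a CAPWAP Discovery Request (hypothetical model)."""
    name: str
    joined_aps: int
    capacity: int
    is_master: bool = False

    @property
    def load(self) -> float:
        # Load as described in the notes: APs joined / total capacity.
        return self.joined_aps / self.capacity

    @property
    def has_room(self) -> bool:
        return self.joined_aps < self.capacity


def select_wlc(candidates: Sequence[Candidate],
               primed: Sequence[str] = ()) -> Optional[Candidate]:
    """Pick a controller: primed addresses, then master, then least loaded."""
    by_name = {c.name: c for c in candidates}

    # 1. Primed addresses, in primary/secondary/tertiary order.
    for name in primed:
        if name in by_name:
            return by_name[name]

    # 2. A controller flagged as master.
    for c in candidates:
        if c.is_master:
            return c

    # 3. Least-loaded controller that still has room for another AP.
    with_room = [c for c in candidates if c.has_room]
    return min(with_room, key=lambda c: c.load) if with_room else None
```

For example, with candidates at 90%, 20%, and 50% load and no primed or master controller, select_wlc returns the 20% one; if a primed name matches a discovered controller, that controller wins regardless of load.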
Designing High Availability
If a WLC fails, the AP tries to join the least-loaded WLC in its list. The best approach is to use the primed option (primary, secondary, tertiary). The AP builds a CAPWAP tunnel to more than one WLC but joins only one, so that no time is wasted in case of a failure.
How an AP detects a controller failure:
- By default, the AP sends a keepalive (heartbeat) to the WLC every 30 seconds. If one is missed, more are sent at shorter intervals, depending on the software version: five at 1-second intervals in v7, four at 3-second intervals in 7.2. If none is answered, the AP moves to the next WLC. The keepalive timer can be adjusted from 1 to 30 seconds. Once joined, an AP remains with its WLC until that WLC fails. The AP fallback feature lets the AP fall back to the original WLC after it has come back online.
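As a rough worked example of the timers above, the time an AP needs to notice a dead WLC can be estimated as one heartbeat interval plus the retries. This is only an upper-bound sketch: it assumes the WLC fails just after answering a heartbeat and that each retry waits its full interval, and the function name is hypothetical.

```python
def worst_case_detection_seconds(heartbeat, retries, retry_interval):
    """Estimate the longest time before the AP gives up on its WLC.

    Assumes the WLC dies right after answering a heartbeat, so a full
    heartbeat interval passes before the first miss, followed by every retry.
    """
    return heartbeat + retries * retry_interval


print(worst_case_detection_seconds(30, 5, 1))  # v7 defaults  -> 35
print(worst_case_detection_seconds(30, 4, 3))  # 7.2 defaults -> 42
print(worst_case_detection_seconds(1, 4, 3))   # 7.2, heartbeat tuned to 1 s -> 13
```

Under these assumptions, lowering the heartbeat timer is what shortens detection most, since the retry phase is small by comparison.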
Redundant WLCs should be configured similarly.
N+1 Redundancy (N:1)
- N controllers are backed up by 1 WLC.
- Can withstand failure of only 1 WLC.
- APs are configured with a primary and a secondary WLC only.
- The backup controller must sit idle and empty of APs until another controller fails.
- The backup must have the same capacity as the active WLC it supports.
- The backup controller must be configured identically to every active controller it has to support.
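The capacity rule for N+1 can be checked with a few lines of code. This is a minimal sketch with hypothetical names and made-up AP counts: it reports which active controllers carry more APs than the idle backup could absorb if that one controller failed.

```python
from typing import List, Mapping


def backup_shortfalls(active_ap_counts: Mapping[str, int],
                      backup_capacity: int) -> List[str]:
    """Return the active WLCs whose AP count exceeds the backup's capacity.

    An empty list means the idle backup can take over for any single
    failed controller, which is all that N+1 promises.
    """
    return [name for name, aps in active_ap_counts.items()
            if aps > backup_capacity]


# Hypothetical design: two active WLCs backed by one 500-AP backup.
print(backup_shortfalls({"wlc1": 400, "wlc2": 250}, 500))  # []
print(backup_shortfalls({"wlc1": 400, "wlc2": 600}, 500))  # ['wlc2']
```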
N+N Redundancy (N:N or 1+1)
- Controllers are grouped in pairs.
- You can divide the active role across two separate devices.
- AP and client loads are distributed across separate hardware.
- N+N redundancy can support failures of more than one controller, but only if the active controllers are configured in pairs.
- APs are configured with a primary and a secondary WLC.
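For a pair to survive the loss of either member, each controller must be able to carry the combined AP load of both. The check below is a sketch under that assumption; the function name and the numbers are illustrative only.

```python
def pair_survives_failure(load_a: int, capacity_a: int,
                          load_b: int, capacity_b: int) -> bool:
    """True if either controller in the pair can carry both AP loads alone."""
    combined = load_a + load_b
    return combined <= capacity_a and combined <= capacity_b


# Two 500-AP controllers at 50% each: either one can absorb the other.
print(pair_survives_failure(250, 500, 250, 500))  # True
# At 60% each, the survivor would be oversubscribed.
print(pair_survives_failure(300, 500, 300, 500))  # False
```

With two equally sized controllers this works out to keeping each one at no more than half of its capacity.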
N+N+1 Redundancy
- Has the advantages of both N+N and N+1 redundancy.
- APs are configured with a primary, a secondary, and a tertiary WLC.
- If the other active controller also happens to fail, the backup controller is available to carry the load.
- The tertiary should be left with 0% AP load so that it can carry the load for the rest.
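To see how the primary, secondary, and tertiary settings play out, the sketch below walks each AP's preference list and counts where the APs land for a given set of failed controllers. All names are hypothetical, and real APs also weigh controller load and capacity, which this sketch ignores.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence, Set


def home_controller(preferences: Sequence[str],
                    failed: Set[str]) -> Optional[str]:
    """Return the first configured controller that is still up."""
    for wlc in preferences:
        if wlc not in failed:
            return wlc
    return None


def ap_counts(aps: Mapping[str, Sequence[str]], failed: Set[str]) -> Counter:
    """Count how many APs end up on each surviving controller."""
    counts: Counter = Counter()
    for prefs in aps.values():
        wlc = home_controller(prefs, failed)
        if wlc is not None:
            counts[wlc] += 1
    return counts


# Hypothetical N+N+1 layout: wlc1 and wlc2 back each other up, "backup" is idle.
aps = {
    "ap1": ["wlc1", "wlc2", "backup"],
    "ap2": ["wlc1", "wlc2", "backup"],
    "ap3": ["wlc2", "wlc1", "backup"],
}
print(ap_counts(aps, failed=set()))             # wlc1: 2, wlc2: 1
print(ap_counts(aps, failed={"wlc1"}))          # wlc2: 3
print(ap_counts(aps, failed={"wlc1", "wlc2"}))  # backup: 3
```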
AP SSO (AP Stateful Switchover) Redundancy
- Keeps failover transparent to APs.
- Groups controllers into HA pairs: active and hot standby. The active unit has the licenses necessary for the AP count; the hot standby has an HA license. A standby can be paired with an active unit of any size.
- APs are configured with only a primary. A secondary and tertiary need to be configured only for additional redundancy.
- APs create a CAPWAP tunnel to the active unit
- The active unit keeps CAPWAP tunnels, AP states, configurations, and image files all in sync with the hot standby unit.
- In case of failure, APs do not have to rebuild the CAPWAP tunnel. The controllers simply swap roles, so the APs stay joined to the active controller in the HA pair.
- The active and standby controllers must always run an identical software image.
- The two controllers share a “mobility” MAC address that initially comes from the first active unit’s MAC address. From then on, that address is maintained by whichever unit has the active role at any given time.
- The controllers also share a virtual IP address.
- When one controller is upgraded, its standby peer is upgraded as well; the same applies to rebooting.
- The hot standby controller monitors the active unit through keepalives sent every 100 ms. If a keepalive goes unanswered, the standby begins sending ICMP echo requests to determine what is wrong. If the active unit has failed, the standby takes over. Failover may take up to 500 ms after a crash or power failure, or up to 4 seconds after a network failure.
- AP SSO does not maintain the state of any clients. If the primary controller fails, any associated clients are dropped and have to reassociate with their APs (and the secondary controller). From v7.5, the primary controller synchronizes the state of each associated client that is in the RUN state with the secondary controller. If the primary fails, the secondary already has the current state information for each client, making the failover process transparent.
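The hot standby's monitoring logic can be pictured as a small decision function. This is a deliberately simplified sketch: the function and its return strings are hypothetical, and the notes do not say what the standby does when keepalives are lost but the active unit still answers ICMP, so that case is modeled here as "no takeover".

```python
def standby_action(keepalive_answered: bool, active_answers_ping: bool) -> str:
    """Decide what the hot standby does after a 100 ms keepalive check."""
    if keepalive_answered:
        # Active unit is healthy; nothing to do.
        return "remain standby"
    if active_answers_ping:
        # Assumption: the active unit is alive, so the fault lies elsewhere.
        return "remain standby (investigate keepalive path)"
    # No keepalive and no ICMP echo reply: treat the active unit as failed.
    return "take over as active"


print(standby_action(True, True))    # remain standby
print(standby_action(False, False))  # take over as active
```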