Every once in a while I stumble across an actual technical thing that isn’t an opinion, and in true blogger fashion, I write about it so that the next time it happens, I can Google the problem and find my own blog post that contains the answer.
I know, I’m surprised that I still have the occasional experience in technical operations as well. What is this world coming to?
For those in the dark, WPA2/3-Enterprise Wi-Fi networks will normally utilize 802.1X for authentication. This allows organizations to authenticate the devices joining their network and set up their encryption in a way that is much harder to compromise (unlike a common Pre-Shared Key that is known to every employee who has worked at the organization over the past 5 years). In addition to authentication, there are also authorization and accounting, and together the three are commonly referred to as AAA for, you guessed it, Authentication, Authorization, and Accounting.
When configuring a WLAN for 802.1X, administrators generally define two things: which server acts as the authentication and authorization server, and which server acts as the accounting server. Sometimes these are the same server; other times they are different.
This blog isn’t about configuring any of the AAA functions and whatnot; there are week-long classes on that subject. For this blog, I want to cover an anomaly that can happen after configuration, and it centers around the accounting piece. On the face of it, accounting seems pretty straightforward. When something happens on the 802.1X WLAN, report that information to the accounting server so it can be logged, reviewed, and available for audits in the future.
Where it becomes difficult is when the accounting packets are actually sent too soon. Sure, we want to know when that device joined, but there can be a time when the network is more efficient than makes sense. To better explain, I need to draw a picture and reference it. For this example, the AP is NOT the authenticator; the SmartZone is. This is called “Proxy Mode,” and it is important since it means that the SmartZone, not the AP, will be the device acting as the authenticator and communicating with the AAA server.
More notes about my diagram:
- This is super high level. During steps 1 through 5, there are building blocks of information added to each request and challenge as the devices feel each other out. That isn’t important for this topic so I didn’t include details.
- The authentication server in this example is also acting as the accounting server.
- Step 6 can be either an accept message or a reject message; here I showed it in green as an accept message so we can get to the problem.
- What becomes critical is the timing between the steps, so I will reference the step numbers going forward.
The problem is actually quite simple to understand. It will usually present as an error message on the device acting as the accounting server, complaining about a missing or NULL IP address in a report. For most administrators, knowing the IP address of the device is a critical piece of information, and if the accounting server doesn’t have that piece of information, the server and the administrator can get a little irritated.
After lots and lots of digging, the why of the problem becomes pretty simple, and it has to do with the timing and efficiency of the network devices. Yes, in this case, the network device acting as the authenticator is working too well. I don’t get to say that too often, so I wanted to make sure I could say it here.
At issue is the timing for the exchange above. If we reset the timers for everything so the initial access request from the client device is set to time stamp 00:00:00.000 (hrs:min:sec.thousandths), the rest of the steps fall into this timing sequence:
- 00:00:00.000 –> Receive access request from the device, asking to connect to the WLAN
- 00:00:00.366 –> Access challenge sent from the AAA server to the device
- 00:00:00.515 –> Receive access request from the device with additional EAP information
- 00:00:00.761 –> Access challenge sent from the AAA server to the device with EAP information
- 00:00:00.987 –> Receive access request from the device with more EAP information
- 00:00:01.359 –> Access approval received from the AAA server and forwarded to the AP and the device
- 00:00:01.501 –> Accounting start message sent to the accounting server
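To make the gaps between steps easier to see, here is a minimal Python sketch that computes the time between each pair of consecutive steps. The timestamps are copied from the list above; the step numbers are assumed to map one-to-one onto the seven entries, and the function names are just for illustration.

```python
from datetime import timedelta

# (step number, offset from the first access request), copied from the timeline above.
STEPS = [
    (1, "00:00:00.000"),
    (2, "00:00:00.366"),
    (3, "00:00:00.515"),
    (4, "00:00:00.761"),
    (5, "00:00:00.987"),
    (6, "00:00:01.359"),  # access approval
    (7, "00:00:01.501"),  # accounting start
]


def parse_offset(stamp):
    """Turn 'hh:mm:ss.mmm' into a timedelta without going through floats."""
    hours, minutes, rest = stamp.split(":")
    seconds, millis = rest.split(".")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), milliseconds=int(millis))


def step_gaps(steps):
    """Return (from_step, to_step, gap_in_ms) for each consecutive pair of steps."""
    gaps = []
    for (a, a_stamp), (b, b_stamp) in zip(steps, steps[1:]):
        delta = parse_offset(b_stamp) - parse_offset(a_stamp)
        gaps.append((a, b, delta // timedelta(milliseconds=1)))
    return gaps


for a, b, gap_ms in step_gaps(STEPS):
    print(f"step {a} -> {b}: {gap_ms} ms")
```

The last line printed is the one that matters for this post: step 6 -> 7 comes out at 142 ms.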
The key aspect to remember here is that during steps 1 through 6, the device in question is kept in a controlled state, meaning that it can only exchange EAP authentication frames, which the authenticator relays to the AAA server inside RADIUS packets. Other types of traffic are suppressed, waiting for the decision that comes in step 6. Sure, it’s good to know that an AP on the east coast is talking to a SmartZone on the west coast and then to the AAA that is located somewhere else in the world, and those 6 steps take a total of 1.359 seconds, but our problem exists between steps 6 and 7.
By following the amazing explanation of the whole EAP exchange as told by Eddie Forero (you can find that here) you will realize that after the access approval message in step 6, there are still some additional frames/packets that need to happen before the device can start to send data. First is the 4-Way handshake to build the encryption, and then after that is the DHCP process for the device to obtain an IP address, also another 4 frames/packets.
This means that before the accounting start message can contain the device IP address, there are still an additional 8 frames that need to happen. The problem, which hopefully you are realizing on your own, is that a mere 0.142 seconds pass between step 6, when the AP is told that the device is allowed to connect, and step 7, when the report is sent. A report we really wish contained all the information we want it to.
On a good day on a network with minimum traffic and devices, the 4-Way handshake takes 0.004 seconds to complete (no problem) and the DHCP process can average about 2.058 seconds (problem). Simple math tells us that those two processes (encryption and DHCP) combined take around 2.062 seconds to complete, but our accounting start message is sent just 0.142 seconds after the accept, a full 1.920 seconds before we have the information we really want to send to our accounting server.
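The same arithmetic as a small Python sketch, in case you want to plug in measurements from your own network. The three constants are the figures quoted above; the function names are mine, and everything is kept in whole milliseconds to avoid floating-point rounding.

```python
# Figures quoted above, in milliseconds.
HANDSHAKE_MS = 4                   # 4-Way handshake
DHCP_MS = 2058                     # DHCP exchange
ACCT_START_AFTER_ACCEPT_MS = 142   # gap between step 6 and step 7


def ip_ready_after_accept_ms(handshake_ms, dhcp_ms):
    """Time after the accept until the device actually holds an IP address."""
    return handshake_ms + dhcp_ms


def shortfall_ms(handshake_ms, dhcp_ms, acct_start_ms):
    """How much too early the accounting start goes out (<= 0 means it carries the IP)."""
    return ip_ready_after_accept_ms(handshake_ms, dhcp_ms) - acct_start_ms


print(ip_ready_after_accept_ms(HANDSHAKE_MS, DHCP_MS))                 # 2062
print(shortfall_ms(HANDSHAKE_MS, DHCP_MS, ACCT_START_AFTER_ACCEPT_MS))  # 1920
```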
What we need is a way to delay that accounting message until we have that IP address.
I wish I could take all the credit for the solution, but alas I can’t take any of the credit. That credit is due to the Ruckus Engineering team. I just get to bring that solution to you. When the setup and the problem present as outlined above, the solution is to add a single line to the WLAN configuration that basically tells the WLAN to wait for the IP address BEFORE sending that accounting start message. There is no need to mess with timers; if the network is busy or whatever, it just waits until we get that piece of information and THEN sends the message.
The command needed is this:
From the SmartZone CLI, this is how it works. First, make sure you are in privileged mode by entering the command “en”. Then, enter configuration mode using the command “config”. From there, state the zone for the change (in my case the zone name is “Demo”) and then the WLAN name you want to change (in my case the WLAN name is “Dot1XDemo”). You will need to substitute the zone and WLAN name for your specific network. After that is the command from above, followed by “end”. Don’t forget the “end” part; forgetting it will cost you another 6 hours of time if you are anything like me.
All combined, it should look very similar to this, just with your names:
Now, if you are like me, you see the “no” and immediately think “I don’t need that because that removes the command.” And, like me, you would be wrong and spend 48 hours wondering what engineering had screwed up. The catch is the “ignore” at the end of the command. Without the “no”, the WLAN is being told “you can ignore the ‘acct-ip-attr’ part,” but what you want is for the WLAN to not ignore it and instead wait for that IP part.
Don’t ask me why it’s like this, I just know it is. I can also tell you this command works on both SmartZone 5.X and 6.X.
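If the double negative still hurts, here is a toy Python model of the two behaviors. To be clear, this is not SmartZone code and says nothing about how Ruckus implemented it; the class, the `ignore_ip` flag, and the method names are all hypothetical, and the MAC and IP addresses are made-up documentation values. It only illustrates the difference between sending the accounting start right after the accept and holding it until DHCP has finished.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Client:
    mac: str
    ip: Optional[str] = None  # filled in once DHCP completes


def build_acct_start(client):
    """Build a simplified accounting start as a dict of RADIUS attribute names."""
    attrs = {"Acct-Status-Type": "Start", "Calling-Station-Id": client.mac}
    if client.ip is not None:
        attrs["Framed-IP-Address"] = client.ip
    return attrs


class ToyAuthenticator:
    """Hypothetical model: ignore_ip=True sends at once, False waits for the IP."""

    def __init__(self, ignore_ip):
        self.ignore_ip = ignore_ip
        self.sent = []      # accounting starts "sent" so far
        self.pending = {}   # clients accepted but still waiting on DHCP

    def on_access_accept(self, client):
        if self.ignore_ip:
            self.sent.append(build_acct_start(client))  # goes out without an IP
        else:
            self.pending[client.mac] = client           # hold until DHCP is done

    def on_dhcp_complete(self, client, ip):
        client.ip = ip
        if self.pending.pop(client.mac, None) is not None:
            self.sent.append(build_acct_start(client))  # now carries the IP


waiting = ToyAuthenticator(ignore_ip=False)
device = Client(mac="02:00:00:00:00:01")
waiting.on_access_accept(device)              # nothing sent yet
waiting.on_dhcp_complete(device, "192.0.2.25")
print(waiting.sent[0])
```

Note that the waiting version is event-driven rather than timer-driven, which matches the point above: a slow DHCP server just means the message goes out later.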
After spending many hours banging your head on the desk, the wall, the door, and then mistyping the zone name, and then not entering the “no” part, and then not entering the end command to update the context, you will be able to learn from my mistake and follow the example solution above. I share my pain with you so you can simply spend 10 minutes reading, enter the commands per my example, and then you get something that looks like this:
For my eagle-eyed readers, hopefully you noticed a new line between the “Acct-Session-Id” line and the “Acct-Multi-Session-Id” line. The “Framed-IP-Address” line is in fact the device’s IP address. Once this is included in the accounting start message, those pesky alerts and notifications from the accounting server complaining about not having an IP address in the messages go away.
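If you would rather not rely on eagle eyes, a few lines of Python can do the checking. This is a minimal sketch that assumes your accounting server can dump each record as plain `Attribute = value` lines; that format, the function names, and the sample record (with placeholder session IDs and a documentation IP address) are all assumptions for illustration, not output from my lab.

```python
def parse_attributes(text):
    """Parse 'Name = value' lines into a dict, skipping lines without '='."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            attrs[name.strip()] = value.strip().strip('"')
    return attrs


def start_has_ip(attrs):
    """True if this is an accounting start that carries Framed-IP-Address."""
    return (attrs.get("Acct-Status-Type") == "Start"
            and "Framed-IP-Address" in attrs)


# Made-up sample record, not real output.
SAMPLE = """
Acct-Status-Type = Start
Acct-Session-Id = "EXAMPLE-SESSION-1"
Framed-IP-Address = 192.0.2.25
Acct-Multi-Session-Id = "EXAMPLE-MULTI-1"
"""

record = parse_attributes(SAMPLE)
print(start_has_ip(record))
```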
Hopefully, you can learn from my pain and the next time this topic comes up, we can all search for the solution and come across my blog and save us some time and damaged brain cells!