Every day there is a #WIFIQ that is posed by Lee Badman (@wirednot) on Twitter. If you aren’t following him, you should. Recently, he posed a question about troubleshooting a “5 GHz TV client device” that drops its connection every night. Whether this is an actual problem or not isn’t the point, nor is it why I like the question as a whole. What I like is the exercise in troubleshooting an issue without actually having any access to the equipment, or being able to ask any additional questions. The whole thing reminded me of the best class that I have ever taken, and it wasn’t even a true “class” and it wasn’t even a network or IP based technology class. The class in question was the EDACS Master Technician Course.
EDACS stands for Enhanced Digital Access Communication System and is a trunking protocol used mainly in Public Safety Two-Way Radio systems, and it is OLD. Everything else really doesn’t matter in this case, just that it’s the basis for what I went through. Think of the whole experience as a CCIE lab type setup, but 5 days long and working in a team. Just like the CCIE lab, they had a working, full-blown system that they would introduce errors into, then supply your team with a complaint and tell you to go solve it. On a side note, they also wouldn’t let you leave at lunch or dinner time if you hadn’t solved the problem. Their theory was that in the real world you don’t leave until the problem is solved, so why allow you to leave during the class unless you had solved the problem? It’s the only “class” I have ever been to where one night we didn’t leave until 7:30 pm and it felt like it was closer to midnight.
I LOVED IT!
To complete the course you had to finish 4 lab problems and 4 tabletop problems. During these exercises, they had only a few rules and then some procedures that made you think about what you were actually doing. For example, if you wanted to move from one “site” to another you had to announce where you were going and who and what you were taking with you. If you forgot to mention you were taking a laptop, you were graded as if you had driven across town and forgotten your tools. There were some other rules but the one that stuck with me was this:
You can only ask 5 questions during initial customer contact.
The theory behind this rule was that after 5 questions on a trouble call, the customer gets frustrated and basically asks “Why am I paying you to fix my stuff if all you do is interrogate me and ask questions I don’t know the answer to anyways?” It’s a very valid point, and afterwards I started to pay attention to this rule in all my dealings. They were right: 5 questions is about the limit, if not closer to 3 or 4. As a professional, my customers don’t want to hear me spout off and ask grandiose questions to show how smart I am. Asking about DHCP leases, 4-way handshakes, and AAA servers means nothing to a customer; their stuff not working is all they care about. In Wi-Fi, this generally means can they watch videos about cats (or goats for one of my co-workers; Hi Vic!) or connect to a remote file server. Asking about DNS servers or traceroutes isn’t productive since the customer doesn’t know what that means, how it not working affects their service, or what it does when it is working correctly.
Bottom line – The customer only cares that it works, not how it works. Maybe, eventually, after it’s fixed, you will have to go back and explain it, but when things are down they don’t want your “idle” chit-chat, they want you working to fix their problem.
Based on this week-long exercise, I came to realize there are only a handful of questions that need to be asked at the beginning of the “customer” interaction, and they work for most any technology-based interaction. Here are my top questions, in order, but they can be tweaked to suit the condition.
- Has this ever worked? Many times, the customer may have heard that a solution / feature was coming and took that to mean it was done. Verify that there actually was a deployed solution that functioned correctly.
- If this has worked in the past, when was the last time you (the person you are talking to) knew it was working? If this person is merely relaying a problem from someone else, get to the source. Knowing the last time a solution functioned correctly gives you the first key piece of information needed in nailing down a problem.
- When did you first notice things weren’t working correctly? This is the second piece of information needed to nail down a time of the failure.
- When this is working correctly, what does that look like? I learned this the hard way when I spent an extra couple of hours troubleshooting a feature that was never supposed to work, by design.
That’s it. Unless you are talking to another technical person, a normal customer won’t be able to answer much more than that. Nor do they care. “Yesterday I could stream music, today I can’t.”
The basis of my first couple of questions is to determine if this is an actual new problem or something that never worked from implementation. The other questions help to narrow down a time frame. There is nothing like finding out that sometime in the past 2 months it stopped working but we have no idea when. Go ahead and track down 2 months of syslog / traps / whatever else someone else decided was a good way to log events and then find the time “something” went wrong. Knowing the last time things worked, and then knowing when they first noticed that things were wrong, sets the time frame to search for the cause of the incident.
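To make the time frame idea concrete, here is a minimal Python sketch of narrowing a pile of log lines down to that window. It is only an illustration: the log format (an ISO 8601 timestamp followed by a space and a message), the device names, the sample entries, and the `events_in_window` helper are all made up for this example, and real syslog or trap data will need its own parsing.

```python
from datetime import datetime


def events_in_window(lines, last_known_good, first_noticed):
    """Yield (timestamp, message) for log lines inside the failure window.

    Assumes each line starts with an ISO 8601 timestamp followed by a space.
    That format is an assumption made for this sketch.
    """
    for line in lines:
        stamp, _, message = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a parseable timestamp
        if last_known_good <= when <= first_noticed:
            yield when, message


# Hypothetical log entries, invented for illustration.
sample_log = [
    "2024-03-01T08:15:00 ap-12 client associated",
    "2024-03-03T22:40:00 core-sw1 config change saved",
    "2024-03-04T01:05:00 ap-12 client deauthenticated",
    "2024-03-09T13:30:00 ap-12 client associated",
]

# Answers to "when did it last work?" and "when did you first notice?"
window = list(events_in_window(
    sample_log,
    last_known_good=datetime(2024, 3, 3, 18, 0),
    first_noticed=datetime(2024, 3, 4, 9, 0),
))

for when, message in window:
    print(when.isoformat(), message)
```

With those two answers in hand, four log lines shrink to the two that fall between "last worked" and "first noticed," which is a lot friendlier than reading 2 months of everything.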
The last question is a trick question, and will be used in 2 different ways. First, you have just defined your first test. This allows you to determine if there might be a case of user error before you spend time troubleshooting things that actually work. It also allows you to see the problem, if one exists, for yourself. Second, you have just determined what a successful test should look like (so you know when to stop trying to “fix” things that may or may not be an actual problem). Turns out, defining what the customer expects things to look like when the problem is solved is just as important as collecting requirements before designing a new system. Both will form the basis of what a successful interaction will look like.
One more tidbit I picked up during this class was to try to determine the “center” point of the system, and hopefully a place you can sample “traffic.” Whether that is a packet or an audio stream, the purpose is the same. By splitting the system in half, you can determine if it’s working up to the halfway point, which tells you which half to look at next. Find a place where you can watch the system and then use basic troubleshooting to work through the OSI stack. Just like with most complex issues, don’t forget that the basics still have to work.
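The split-it-in-half approach can be sketched as a simple bisection over the points where you can sample traffic. This is a rough Python illustration, not anything from the class: the `first_bad_point` helper, the list of tap points, and the simulated fault are hypothetical, and it assumes a single fault, so traffic looks good everywhere before the break and bad everywhere after it.

```python
def first_bad_point(points, looks_good):
    """Return the first tap point where traffic looks wrong, or None.

    points: tap points ordered from source to destination.
    looks_good: callable returning True when traffic sampled there is healthy.
    Assumes one fault: healthy up to the break, unhealthy from there on.
    """
    lo, hi = 0, len(points)
    while lo < hi:
        mid = (lo + hi) // 2
        if looks_good(points[mid]):
            lo = mid + 1  # fault must be further downstream
        else:
            hi = mid      # fault is here or further upstream
    return points[lo] if lo < len(points) else None


# Hypothetical path through a network, for illustration only.
path = ["client", "access point", "access switch",
        "core switch", "firewall", "server"]
broken_at = "core switch"  # simulated fault location
checked = []


def looks_good(point):
    """Stand-in for actually sampling traffic at a tap point."""
    checked.append(point)
    return path.index(point) < path.index(broken_at)


culprit = first_bad_point(path, looks_good)
print("First bad point:", culprit)
print("Places checked:", checked)
```

In this made-up path the fault is found after sampling three places instead of walking all six in order. If a system has more than one fault, or a fault that comes and goes, the single-fault assumption breaks down and the halving has to be repeated or done with more care.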
Troubleshooting is an art, but an art that can be practiced. That’s why I like paying attention to Lee’s questions. I don’t always respond to his questions, but they make me think about how I would approach a problem. Like I learned in my class, even when it’s just an exercise on paper, learning to ask the correct question at the correct time can make your problem resolution either a positive interaction or a frustrating journey involving many dead ends. Participate on Twitter, follow Lee, and watch for the question of the day. It’s so much easier than facing a customer without a plan, or a clue on what to do next.
Embrace the pain, honestly, it won’t kill you!