Unifi Install and New Disconnected Issue
We found a new issue during an installation and upgrade at a facility in North America: a new “disconnected” problem. We will walk you through our findings and the fix we applied, and hopefully this will save some of you the hours that we lost.
A little about project scope
One of our clients requested an upgrade to their wireless network as well as troubleshooting of some network switches. The previous Wi-Fi system was TP-Link, which had been doing a good job. The network switches in question are Netgear ProSAFE switches, which a previous IT firm had recommended and provided.
They hired us to solve throughput issues. The client had outgrown their Wi-Fi network, and the increase in demand had been putting unneeded stress on their entire infrastructure. Additionally, they were experiencing trouble with three of their switches, which required a reboot to return to operation.
Installation
The Count
- 6 UAP PROs
- 3 USW 24 Port PoE Switches
- New Fiber
- 300 Meters of New Shielded CAT7
Upstream Equipment
- VPN Router with Firewall
- L3 Switch for controlling multiple networks (PBX, MS Server, Linux DevOps Servers)
Notes on installation
- The controller is a Remote NS|FLDC Server
- Unifi Protect is in the Pipeline
Process
All in all, the installation went smoothly: most of the APs were mounted at dropped-ceiling installation points, with additional coverage in a warehouse environment. We used as much of the existing cabling as possible and installed only 300 meters of new cable. An IoT network and Guest Portal were also required, which is pretty standard these days.
Device Information
Controller (Unifi) Version: 6.1.71
Access Point - UAP (UAP-AC-Pro) Firmware Version: 4.3.28.11361
Switch - USW (USW-Pro-24-PoE) Firmware Version: 5.43.35.12698
All was good; until it wasn’t
Once the equipment was installed, updated, provisioned, and programmed, we started our testing. As with all of our projects, we stay on-site a minimum of 48 hours after installation and programming. This procedure gives us confirmation that all units are running smoothly with no issues.
We started to experience disconnect issues the following morning, so we applied a simple hard restart on the affected units. When the APs were installed and connected to the controller, they all needed updates. In our experience, a reset after adoption and update can work magic; it will typically clear up unusual behavior.
We proceeded with a physical reset and finished the additional programming without further issues. The following morning, the majority of our APs had entered an error state of Disconnected. It was now clear that we had to investigate.
Results of an SSH Login
It was time to go to the terminal. We could ping all of the disconnected UAPs, so we logged in via SSH. Running the info command returned the following:
BusyBox v1.25.1 () built-in shell (ash)
___ ___ .__________.__
| | |____ |__\_ ____/__|
| | / \| || __) | | (c) 2010-2020
| | | | \ || \ | | Ubiquiti Networks, Inc.
|______|___| /__||__/ |__|
|_/ https://www.ui.com/
Welcome to UniFi UAP-AC-Pro-Gen2!
WestOffice-W02-BZ.v4.3.28# info
WestOffice-W02-BZ.v4.3.28#
A blank return! It took a few seconds to come back, but sure enough, every AP did the same thing. We started to review /var/log/messages and found some strange messages in the log.
So the thought process was: let's reset, start with a clean log, and try to catch the action throughout the night. We pushed the reset through the CLI and continued with set-default, with these results:
WestOffice-W02-BZ.v4.3.28# set-inform
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
syswrapper: [state is locked] waiting for lock
Interesting. So what is this looped response, syswrapper: [state is locked] waiting for lock? What we could find in the messages was that the unit was running out of memory. So our process continued in this fashion.
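If you capture the CLI output of several APs, the looped lock message is easy to spot programmatically. The following is a minimal Python sketch, not something from our toolset: the function names and the threshold of five repeats are our own assumptions, and the only thing taken from the session above is the message text itself.

```python
# The exact message the AP printed in the session above.
LOCK_MESSAGE = "syswrapper: [state is locked] waiting for lock"


def count_lock_waits(output: str) -> int:
    """Count the lines of captured CLI output that report the locked state."""
    return sum(1 for line in output.splitlines() if line.strip() == LOCK_MESSAGE)


def looks_stuck(output: str, threshold: int = 5) -> bool:
    """Treat the AP as stuck once the lock message repeats `threshold` times.

    The threshold is an arbitrary assumption for illustration,
    not a value documented by Ubiquiti.
    """
    return count_lock_waits(output) >= threshold


if __name__ == "__main__":
    captured = "\n".join([LOCK_MESSAGE] * 12)
    print(looks_stuck(captured))
```

A check like this only flags the symptom; it says nothing about the cause, which in our case pointed to memory exhaustion in the log.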
How did we fix this
We noticed all our USW switches had been sitting on firmware version 5 of the Unifi firmware release, so we reviewed the release channel and found: UAP-USW-Firmware-5-43-36
The Ubiquiti Unifi Release specifically includes these USW Switches. The controller automatically updated the USWs to this version. Interestingly enough, also included in this release is the UAP-AC-PRO. Why didn't the controller automatically detect this and update it?
We felt comfortable enough to try this release as it was marked Official. We also reviewed other locations, and the UAPs there already had this update.
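When reviewing several sites, it helps to compare firmware strings mechanically instead of by eye. Here is a small Python sketch under our own assumptions: the helper names are hypothetical, and we compare only the leading components that both strings share, so a trailing build number (such as the 11361 in 4.3.28.11361) is ignored when the target is given as 5.43.36.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted firmware string such as '4.3.28.11361' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed: str, target: str) -> bool:
    """Return True if `installed` is older than `target`.

    Only the components present in both strings are compared, which is
    an assumption of this sketch rather than Ubiquiti's own rule.
    """
    a, b = parse_version(installed), parse_version(target)
    n = min(len(a), len(b))
    return a[:n] < b[:n]


if __name__ == "__main__":
    # Versions taken from the Device Information section and the release name.
    print(is_older("4.3.28.11361", "5.43.36"))
    print(is_older("5.43.35.12698", "5.43.36"))
```

Comparing integers rather than raw strings matters here: as text, "5.43.36" would sort before "5.5.0", which is the wrong order.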
Note:
This procedure can work well for remote locations. We were on-site for this particular project, but our HQ is roughly 5000 miles (8046 km) away, so flying over to hit a reset button can be a bit complicated.
Step 1: Reboot
To move forward with this upgrade, we need to connect the UAP back to the controller, so a reboot is required. We performed this task via CLI, as we did not have a connection to the controller. You could also pull the PoE cable, which causes the AP to power cycle. A graceful reboot through the CLI is the preferred approach, so we would only recommend pulling the cable in the worst-case scenario.
WestOffice-WO2-BZ.v4.3.28# reboot
WestOffice-WO2-BZ.v4.3.28#
Device WestOffice-WO2: Connection Dropped
Step 2: Ping
So now our connection has been lost, and we would like to run set-inform as soon as possible.
Ping the AP continuously until you receive a response. Once you receive a response, SSH into the unit. You are looking for the response to change from "Destination Host Unreachable" to "Timed Response." It should look something like this below:
user-pc@hostname ~> ping 192.0.0.0
PING 192.0.0.0 (192.0.0.0) 56(84) bytes of data.
64 bytes from 192.0.0.0: icmp_seq=1 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=2 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=3 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=4 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=5 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=6 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=7 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=8 Destination Host Unreachable
64 bytes from 192.0.0.0: icmp_seq=9 ttl=64 time=2.21 ms
64 bytes from 192.0.0.0: icmp_seq=10 ttl=64 time=2.88 ms
64 bytes from 192.0.0.0: icmp_seq=11 ttl=64 time=2.53 ms
64 bytes from 192.0.0.0: icmp_seq=12 ttl=64 time=1.84 ms
64 bytes from 192.0.0.0: icmp_seq=13 ttl=64 time=2.22 ms
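The waiting can be automated. The Python sketch below is an illustration under our own assumptions: `is_timed_reply` simply looks for the `time=` field seen in the output above, and `wait_until_reachable` takes a hypothetical `probe` callable that you supply, so the loop itself does not depend on any particular ping implementation.

```python
import time
from typing import Callable


def is_timed_reply(line: str) -> bool:
    """True for a reply line that carries a round-trip time (e.g. 'time=2.21 ms')."""
    return "time=" in line and "Unreachable" not in line


def wait_until_reachable(
    probe: Callable[[], bool], attempts: int = 60, delay: float = 1.0
) -> bool:
    """Call `probe` until it returns True or the attempts run out.

    `probe` is a caller-supplied function (an assumption of this sketch),
    for example one that sends a single ping and reports success.
    """
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    line = "64 bytes from 192.0.0.0: icmp_seq=9 ttl=64 time=2.21 ms"
    print(is_timed_reply(line))
```

In practice the probe could wrap your operating system's ping command or an SSH port check; we leave that part out because the flags differ between platforms.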
Step 3: Command 'set-inform'
Now, log back in via SSH and tell the unit which controller it should connect to using the command set-inform. If you are not familiar with this process, visit our article Unifi UAP | Set Inform URL via CLI.
WestOffice-WO2-BZ.v4.3.28# set-inform http://FQDN:8080/inform
Adoption request sent to 'http://FQDN:8080/inform'. Use the controller to complete the adopt process.
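If you repeat this on many APs, generating the command string avoids typos in the inform URL. This is a tiny Python sketch with hypothetical helper names; the URL shape and port 8080 are taken from the command above, and FQDN stands in for your own controller's hostname.

```python
def inform_url(controller_host: str, port: int = 8080) -> str:
    """Build the inform URL in the form used above: http://HOST:PORT/inform."""
    return f"http://{controller_host}:{port}/inform"


def set_inform_command(controller_host: str, port: int = 8080) -> str:
    """Return the CLI line to paste into the AP's SSH session."""
    return f"set-inform {inform_url(controller_host, port)}"


if __name__ == "__main__":
    # "FQDN" is the same placeholder used in the article, not a real host.
    print(set_inform_command("FQDN"))
```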
Step 4: Response of 'Connected'
Your unit should reconnect to the controller. Confirm this by running the command info via CLI; it should return the AP info with a status of Connected.
WestOffice-WO2-BZ.v4.3.28# info
Model: UAP-AC-PRO
Version: 4.3.28.11361
MAC Address: a1:b2:c3:d4:e5:d6
IP Address: 192.0.0.0
Hostname: WestOffice-WO2
Uptime: 120 seconds
Status: Connected (http://FQDN:8080/inform)
WestOffice-WO2-BZ.v4.3.28#
NOTE:
The information in this CLI output has been modified for security, and your results will be different.
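The info output is a simple list of "Key: value" lines, so it can be checked in a script. Below is a minimal Python sketch with hypothetical function names; it assumes the output has the shape shown above and treats an empty result, like the blank return we saw earlier, as not connected.

```python
from typing import Dict


def parse_info(output: str) -> Dict[str, str]:
    """Parse 'Key: value' lines from the info command into a dictionary.

    Each line is split on the first colon only, so values that contain
    colons (MAC addresses, URLs) stay intact. Lines without a colon,
    such as the shell prompt, are skipped.
    """
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_connected(output: str) -> bool:
    """True when the Status field starts with 'Connected'."""
    return parse_info(output).get("Status", "").startswith("Connected")


if __name__ == "__main__":
    sample = "Model: UAP-AC-PRO\nStatus: Connected (http://FQDN:8080/inform)"
    print(is_connected(sample))
```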
Step 5: Reset First
Some might say this step isn't necessary, and they could very well be right, but we have had nothing but success using this exact process: reset first, then upgrade the version. We like this method because we know we have a "clean" device adopted into our controller. I guess the mindset is less mess, less stress.
To reset the AP run the command set-default
WestOffice-WO2-BZ.v4.3.28# set-default
Clearing CFG ... [%100] done!
Device WestOffice-WO2: Connection Dropped
NOTE:
Remember to remove the UAP from your controller; if you skip this step, you will not have the option to re-adopt. To do so, navigate to your controller -> Devices -> select your device -> click the tab "Device" -> click "Forget Device".
Step 6: Re-Adopt
Before we can apply the upgrade, we need to re-adopt this unit into the controller. You should see your device in the controller - "Ready to Add" (if the controller is onsite).
If the controller is remote, you can manually point the AP at the controller with set-inform via CLI. Find the IP address of the AP by scanning your network. SSH into the AP with the default username and password (ubnt/ubnt), and set your inform address.
Your controller should see the device, and the status will read "Ready to Add". Select the device and click "Adopt".
Step 7: Apply Custom Upgrade
Once re-adoption is complete, it is time to move forward with the version upgrade. We will do this via the controller.
From the controller, navigate to the device -> click the tab "device" -> click "manage" and look for "Custom Upgrade". This is where you will add the link for the upgrade.
We found this upgrade here: UAP-USW-Firmware-5-43-36
Please read the release before applying to any Unifi Device.
You can search for additional Releases and information at the main release page of the Unifi Community.
This can be found at: Unifi Community - Releases
You will need to copy the link of the release as seen below:
Paste this link in your custom upgrades section as shown below:
Then click "apply custom upgrade" via the controller menu.
The upgrade process will take a few minutes to complete. After completion, you can then continue to set up the device as you would any other.
Note:
This upgrade can also be run via CLI, but because we were having connection issues with the controller, we chose to go through the controller: if a problem remained, it would almost certainly reveal itself during the process.
Our Result
We upgraded all of the affected APs using the above approach, and our nightly disconnections stopped. We are not 100% sure what caused these errors. It could be a known bug, and by the time you read this article, a solution from Ubiquiti could already be out. It was an issue we resolved with a newer update, and we wanted to share the results.
Limitations & Notes
This process does have its limitations:
- We cannot say that our fix will correct your unique situation; therefore, we cannot guarantee that this course of action will fix your disconnection issues.
- In our situation, the upgrade moved the UAP-AC-PRO to a newer version. As a result, the controller continuously gives us an "Update Notice." This message is misleading: applying it would not be an upgrade but a roll-back. See the image below; it is something you (or your administrator) should be aware of.