SkyConnect Crashes on v1.35, OK on v1.34 and earlier #20868
Comments
I have the same issue. |
I have the issue with 1.35 too, though I don't see anything in the dmesg logs. I'm using the Sonoff dongle. I tried replacing the USB extension cable in case that was the cause, but it made no difference. In my case it fails frequently, often multiple times a day, and it's not picked up by the container watchdog: it just stops reporting or updating devices. I've rolled back to 1.34 and everything has been stable since. |
@Nerivec is this something you could check? |
I don't think this is EZSP-related. Comparing Z2M versions 1.34.0...1.35.1, herdsman went from v0.25.0...v0.30.0. @droans That error -71 in the kernel logs seems to be related to USB suspend. Could something have changed in your system roughly at the same time you updated Z2M (or because you updated you triggered some other update, like the USB driver)? |
@Nerivec Negative. The coordinator only crashes on 1.35.0/1.35.1. I've been on 1.34 since with no issues but experience the crash within a few hours after upgrading to 1.35. |
I don't know what system you are using exactly, but you could have restored a pre-breaking-update state when you rolled back (as in, an update that happened alongside Z2M's that also got reversed when you went back to 1.34)? I can see this change in the Dockerfile, which could affect the underlying hardware, even though it's only a patch version...
-FROM alpine:3.18.4 as base
+FROM alpine:3.18.5 as base
https://git.alpinelinux.org/aports/log/?h=v3.18.5
Last resort... If you are comfortable/familiar with the swap, you could always try the EDGE version; it should fix quite a few EZSP issues in the init/reset logic. But since the versions you mentioned don't bring any difference to EZSP, I doubt it would make... well, a difference... |
@droans are you using the z2m docker container or the z2m HA docker container (HA addon) |
Docker on Ubuntu 22.04. |
In case it's of use I'm using haos on a pi and was seeing the frequent
crashes until I rolled back to 1.34 :) No USB issues in dmesg on my side
though (and using the Sonoff dongle).
|
@j4m3s Dongle E or Dongle P? I'm assuming E since the original post was about EZSP. I have the Dongle E on HAOS, but on an Intel NUC. I didn't get a single error (much less crash) since I updated to 1.35, been running ever since... It is plugged into a USB3.0 port with a decent cable length (though it used to sit right next to the NUC without issue either). What's the firmware version of your adapter? If you are on 6.x.x try going to 7.3.x. I've had it running for about three months now, no issue. 6 is getting old... 7.4.0 will get support in next release (only EDGE right now)! |
I've been running the SkyConnect 7.3.2.0 for months now unfortunately. I use a long USB extension cable with a USB 2.0 hub at the end of it. My Zigbee and Z-Wave dongles are plugged into short extension cables attached to that. I've built 1.35.1 locally with Alpine v3.18.4 since that seems to be the most likely issue... I'll see if it still crashes. |
Great! Let me know. |
@Nerivec yes Dongle-E and you're dead right, firmware version That firmware updater was reeeeeally easy. Wouldn't work on Linux unfortunately even through Chromium but worked like a dream in Windows. |
It's been about eight hours so far and no issues... Usually by now it would have crashed. I'll keep it running and update tomorrow. |
Unfortunately it crashed again this morning, surprisingly lasting about 18 hours. Interestingly, the dmesg errors and the Herdsman errors were slightly out of sync. The dmesg errors first start at 4:13:50. Herdsman reports a wait success at 4:13:45 followed by a watchdog wait at 4:13:55 and a failure two seconds later. My guess at this point is that it's possibly due to a dependency change or maybe even a device converter? I'm not confident on it being a dependency, though... I looked through the changes about a week ago and didn't see any that would affect the operation; they all seemed to be related to the linter or utility. I do have an "odd" network so maybe that? My old coordinator, a TI LAUNCHXL, is being used as a router so I suppose it could be causing issues. I also used to have some problems with my Ikea outlets dropping off with the SkyConnect but I haven't seen that occur in months. I also tried the lsusb suggestion, but it doesn't seem like I can get power reporting from it; the closest I see is max power, which always reports 100mA. Would you have any suggestions on how I can help diagnose this further? I've got backups of my data so I'm not worried if there's a risk of destroying my data. I can also provide reasonable access to my instance if needed. I'll upload new logs in a couple hours. |
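To make the ordering of those four events easier to see, here is a minimal Python sketch that merges the two sources into one timeline. The timestamps are the ones quoted in the comment above; the event labels and the `merged_timeline` helper are illustrative only and are not taken from the actual logs.

```python
from datetime import datetime

# Times as quoted in the comment; labels are paraphrased, not real log lines.
EVENTS = [
    ("herdsman", "04:13:45", "wait success"),
    ("dmesg", "04:13:50", "first kernel error"),
    ("herdsman", "04:13:55", "watchdog wait"),
    ("herdsman", "04:13:57", "failure"),
]


def merged_timeline(events):
    """Sort events by time and attach seconds elapsed since the first one."""
    parsed = [(datetime.strptime(t, "%H:%M:%S"), src, msg) for src, t, msg in events]
    parsed.sort(key=lambda entry: entry[0])
    start = parsed[0][0]
    return [(int((ts - start).total_seconds()), src, msg) for ts, src, msg in parsed]


for offset, source, message in merged_timeline(EVENTS):
    print(f"+{offset:>2}s  {source:<8}  {message}")
```

Read this way, the kernel error lands between a successful Herdsman wait and the watchdog wait that precedes the failure, which would be consistent with the USB side failing first, though four data points cannot prove that.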
New logs... Oddly, Z2M worked perfectly fine from when I started yesterday until it restarted at 3:30 AM due to HA restarting. At that point, it worked for about 45 minutes until it went down. I couldn't find anything in related logs that seems relevant... No HA or MQTT issues, no unexpected Docker logs, etc. I rebuilt my container using zigbee-herdsman-converters 15.130.1. We'll see if that makes a difference. If not, my next steps would be to rebuild it with zigbee-herdsman 0.25. |
Thanks for the detailed report. Unfortunately, like you mentioned, I didn't find anything relevant in the logs... Possibly Koenkk/zigbee-herdsman#897 will help. Error 71 should be -EPROTO, but that's not very helpful... |
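For anyone wanting to check the errno mapping themselves, a small sketch using Python's standard `errno` and `os` modules follows. The helper name `describe_kernel_error` is made up for this example. Errno numbering is platform-specific, so the 71 = EPROTO mapping is only expected on Linux.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a kernel-style negative code (e.g. -71) to its errno name and message."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = describe_kernel_error(-71)
print(name, "-", message)
```

As noted above, knowing the name does not narrow things down much: EPROTO is a generic protocol error and does not say which part of the USB chain misbehaved.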
@Nerivec downgraded to 3.18.4 just in case |
I left that in the syslog intentionally in case you thought it could be relevant. It's from when Docker started the container up.
I actually do! Unfortunately, it doesn't seem to show any issues. None of my other hardware data show anything unusual either...
I installed it using the Nabu Casa online flasher. I can give that a try if my other attempts don't help.
It doesn't seem like I do.
Thanks - will try it out next. |
So there's no other error (possibly more descriptive) before
And the yaml inside is correct I assume? |
Unfortunately, no... The previous recent logs were just Docker handling networks and containers.
I assume so. I haven't made any changes to the config since I opened the issue. |
I ask because I had a case recently (after an update), where the config edited via UI was not properly saved to the yaml file (for some reason...); it created a completely messed up starting point for Z2M. But assuming your initial post was of the config "post-update", it seems okay... |
Well, good news... I rebuilt it with zigbee-herdsman 0.25 and it has been working fine for over 24 hours now. I'm going to try bumping forward the version until it crashes again. |
Thanks for the patience. 😉 Eliminating what was already tested, I see two changes from 0.25>0.30 worth mentioning, the typescript compiler version (went from ES2018 to ES2022, in 0.29), and the request queue cleanup (0.26). I'd go for these two points in time first; hopefully you'll have results faster. @slugzero Mind checking the below two points that changed (or anything else I might have missed, we're looking for anything that could result in over-use of memory, or flooding of the adapter -anything that could crash it basically, even a remote possibility-), since you did the PRs, you're familiar with the code there: https://github.com/Koenkk/zigbee-herdsman/blob/59c1bbe2d090403c0443f97d4fba3e644b1121f3/src/controller/model/endpoint.ts#L273 https://github.com/Koenkk/zigbee-herdsman/blob/59c1bbe2d090403c0443f97d4fba3e644b1121f3/src/controller/model/endpoint.ts#L304 |
@Nerivec I had a look at the logs but did not find anything that would hint to the request queue. In log.1.txt, there is a single message being queued (line 21546), but this is at 10:59, and the errors occur already at 9:14. What puzzles me are the 'No such file or directory' errors. Looks like the port was not properly closed or some other process is blocking it. What does |
Unfortunately I've already restarted the container by the time I think about checking. My current thought is that it's due to how Docker handles device mappings. Based on this post and a few others I've read, Docker won't reconnect devices after they are unplugged and plugged back in. Same goes for mapped volumes. I believe I can work around this by either using |
Thanks @slugzero! It starts with an I/O error first though, and from then on the port gets locked, I figure. I pushed a few PRs (in the latest release) to handle the init/reset logic for EZSP better (a few code paths that didn't end properly); it should now be smoother, hopefully, but the weird thing is the old code would affect 0.25 just as much as 0.30... Definitely try with today's new release though; if it fails again, at least the logs should be clearer. Long shot... you don't have anything that could change the default version of node installed by Alpine in your container, do you? It should still be v18. The serialport package was having issues not long ago with recent node versions (> 20.2)... |
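A quick way to test the Node question is to run `node --version` inside the container and compare the result against the threshold mentioned above. The Python sketch below does only the comparison; the helper names are hypothetical, and the "> 20.2" cut-off is taken from the comment as-is, not verified against the serialport issue tracker.

```python
import re


def parse_node_version(text):
    """Parse `node --version` output such as 'v18.19.0' into (major, minor, patch)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised Node version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def newer_than_20_2(text):
    """True if the version is in the 'newer than 20.2' range mentioned above."""
    major, minor, _patch = parse_node_version(text)
    return (major, minor) > (20, 2)


print(newer_than_20_2("v18.19.0"))
print(newer_than_20_2("v20.5.1"))
```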
I had a lot of problems starting zigbee2mqtt with my skyconnect (fw 7.3.2.0 build 212). |
@newlund Did you try rtscts again with latest release? I changed the settings a bit, it previously was using both software + hardware flow control, now if hardware is enabled (rtscts), it won't enable xon/xoff. |
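The change described above amounts to making hardware and software flow control mutually exclusive. The Python sketch below illustrates that rule only; it is not the actual zigbee-herdsman code, the function name is hypothetical, and the assumption that XON/XOFF is enabled whenever `rtscts` is off is inferred from the comment rather than confirmed.

```python
def flow_control_options(rtscts):
    """Choose exactly one flow-control mode for the serial port.

    Assumption: hardware (RTS/CTS) wins when requested; otherwise software
    (XON/XOFF) flow control is used. Previously both could be on at once.
    """
    return {
        "rtscts": rtscts,
        "xon": not rtscts,
        "xoff": not rtscts,
    }


print(flow_control_options(True))
print(flow_control_options(False))
```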
Looks like a lot of improvements should at least improve logging... I've had 0.28.0 working for close to 24 hours now, too. I'm going to just switch to 1.35.2 and wait for it to give me errors. If the logs aren't helpful, I'll test out herdsman 0.29/0.30. |
Yes, it was after the latest release (1.35.2) I tried to disable rtscts. Had the problems with the previous release as well but didn't try it with that version. |
Now I enabled rtscts again and immediately when starting Z2M I got the error again:
Disabling rtscts makes it work again. |
Strange, I enabled zigbee-herdsman debug logging but it does not end up in ../config/zigbee2mqtt/log/ |
In the logs tab of Z2M. https://www.zigbee2mqtt.io/guide/usage/debug.html#zigbee-herdsman-debug-logging What firmware supplier are you using (Nabu Casa, darkxst, other...)? |
Sorry for the delay here - I've been busy this weekend. I've been running 1.35.2 and haven't seen a single issue yet. I'm guessing the RTS/CTS change might have fixed it? If it occurs again, I'll grab the logs. Thanks for your help! |
Thanks for the feedback! PS: @droans, if you don't mind, and have time, I might ping you from time to time if I find myself in need of data from a different setup (yours is very different from mine, and you have lots of data 😉). |
Here is the zigbee-herdsman debug log when I have rtscts enabled. If I disable rtscts it starts up normally without errors.
|
It's definitely receiving messed up frames. What firmware supplier are you using? |
I have tried different usb extension cables now and also tried a different usb port on my nuc. But still the same with rtscts enabled. I'm using 7.3.2.0 from here https://github.com/NabuCasa/silabs-firmware Could it be any usb setting in bios causing the problem? |
@newlund Sorry, missed your message somehow... There seems to be inconsistencies with the |
My issue with having rtscts enabled seems to be solved now in Z2M v1.36.0. Thank you! :) |
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 30 days |
What happened?
With v1.35.0/1.35.1, I've been receiving the sendZclFrameToEndpointInternal error. Unlike similar issue reports, mine was not occurring due to my config pointing at the wrong device. However, based on the information in those posts, I set Z2M to collect the Zigbee-Herdsman debug logs. The first two files are the complete logs. The third file is a truncated log with the logs up until the devices are identified and the logs beginning right before the error occurred. Error messages begin on line 82227 in the second file.
log.part1.txt
log.part2.txt
truncated_log.txt
When this error occurs, Z2M will not work until I restart the service. The error showed up in my Z2M logs as below and would show for all my router devices:
When zigbee-herdsman debug logging is enabled, I was able to receive more information:
To get more information, I pulled the kernel logs. The attached log is filtered on the relevant lines. The logs below are filtered on lines which are more unique.
This is not an issue that occurs on v1.34.0 or earlier. Please let me know if you need any more information.
Additional info:
Configuration:
Docker-Compose:
What did you expect to happen?
The Zigbee adapter should not crash.
How to reproduce it (minimal and precise)
Use SkyConnect (and possibly other EZSP adapters?). Run v1.35.0 or v1.35.1 and wait. It can take between a few minutes and 24 hours to crash.
Zigbee2MQTT version
1.35.0
Adapter firmware version
7.3.2.0 build 212, Gecko SDK v4.3.2.0
Adapter
SkyConnect
Setup
Z2M Docker
Debug log
See above