On Thu, 26 Nov 2020 19:06:04 +0100, Ben Bell wrote:
I've got a Behringer WING digital mixer which is equipped with a USB interface supporting 48 in, 48 out at 44.1 or 48kHz. It's plugged into a USB3 interface and for the most part it seems to work well as a class-compliant audio interface, but I'm struggling to eliminate xruns and it's starting to feel like either a driver issue or a hardware quirk.
I've done all the usual things -- I'm running a PREEMPT_RT kernel and have set up the realtime priorities of Jack and the relevant USB IRQ, to no avail. I'm doing most of the debugging remotely over ssh with X shut down and most other processes on the system stopped. There's nothing plugged into the computer beyond network, keyboard, mouse and the audio interface.
I'm currently testing on kernel 5.10.0-rc5 with preempt-rt patches, but I've previously seen the same things on 5.4-rt (Debian backports) and 5.8 (self-compiled).
The xruns are characterised by bursts of "retire_capture_urb" warnings in the kernel logs (and in particular the frame status for everything in those URBs is EXDEV, meaning, I think, that the frames haven't been consumed by the time the URB is retired). I've patched sound/usb/pcm.c to provide me with a bit more debugging information and it looks like it's almost always between 7 and 10 consecutive calls to retire_capture_urb that are affected.
The bursts of retire_capture_urb warnings seem roughly cyclic, with a cycle time that is dependent on the combination of the frames/period setting and the number of periods per buffer, though it doesn't appear to be a strictly linear thing: 512/2 (~170s cycle); 256/2 (~94s); 128/3 (~65s); 64/13 (~270s). I can't immediately see any definitive smoking gun in the urb or interrupt counts, but there is some grouping in the timings between the xruns. Taking 64 frames, 5 periods, it's usually 50s, but sometimes 60s:
21:47:57 50s since xrun 4
21:48:47 50s since xrun 5
21:49:46 59s since xrun 6
21:50:46 60s since xrun 7
21:51:36 50s since xrun 8
21:52:36 60s since xrun 9
21:53:36 60s since xrun 10
21:54:37 61s since xrun 11
21:55:27 50s since xrun 12
21:56:27 60s since xrun 13
21:57:27 60s since xrun 14
21:58:17 50s since xrun 15
21:59:16 59s since xrun 16
22:00:06 50s since xrun 17
22:00:56 50s since xrun 18
...
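The grouping in those timings can be checked mechanically. What follows is only a minimal sketch: it takes the timestamps from the list above, computes the gap between consecutive bursts, and rounds each gap to the nearest 10s. The helper names (intervals, bucket) and the 10s bucket width are illustrative choices, not part of any existing tool.

```python
from collections import Counter
from datetime import datetime, timedelta

# Timestamps of the xrun bursts listed above (64 frames, 5 periods).
STAMPS = """21:47:57 21:48:47 21:49:46 21:50:46 21:51:36 21:52:36 21:53:36
21:54:37 21:55:27 21:56:27 21:57:27 21:58:17 21:59:16 22:00:06 22:00:56""".split()


def intervals(stamps):
    """Return the gaps in whole seconds between consecutive HH:MM:SS stamps."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur < prev:  # the log crossed midnight
            cur += timedelta(days=1)
        gaps.append(int((cur - prev).total_seconds()))
    return gaps


def bucket(gaps, width=10):
    """Count gaps after rounding each one to the nearest multiple of width."""
    return Counter(width * round(g / width) for g in gaps)


gaps = intervals(STAMPS)
groups = bucket(gaps)
print(gaps)
print(dict(groups))
```

For the fifteen timestamps above this gives fourteen gaps, six of about 50s and eight of about 60s, with nothing in between.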
If I count the URB numbers (sorry, I'm shaky on the terminology, but I'm counting each URB which is retired) I often (but not always) see the same number of URBs passing between xrun bursts. Currently at 64 frames/5 periods, it's often exactly 50042 or 49923 URBs between bursts.
The delay between starting Jack and encountering the first burst is not predictable -- it's not just a whole cycle -- but stopping Jack for e.g. 30s and then starting it again delays the next burst by the same amount. So it does seem related to something in the audio streaming rather than to anything else on the system interfering. It's as if something is slowly slipping out of sync between the Wing and the kernel until they need to resync, but I don't really know enough about USB to have any deeper insight.
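As a rough sanity check of the "slipping out of sync" idea, here is a minimal sketch under a strong assumption: that each burst happens when a constant clock offset has accumulated exactly one period of slip. Both that assumption and the function name implied_drift_ppm are hypothetical; the cycle times are the approximate figures quoted above, and the number of periods per buffer is ignored.

```python
RATE_HZ = 44100  # sample rate used in the tests above

# (frames per period, approximate seconds between bursts), from the figures above
OBSERVED = [(512, 170), (256, 94), (128, 65), (64, 50)]


def implied_drift_ppm(period_frames, cycle_s, rate_hz=RATE_HZ):
    """Clock offset in ppm that would slip by one period in cycle_s seconds.

    Assumption (not confirmed by anything in the logs): one burst per
    period's worth of accumulated slip.
    """
    slip_s = period_frames / rate_hz  # duration of one period in seconds
    return slip_s / cycle_s * 1e6


for frames, cycle in OBSERVED:
    ppm = implied_drift_ppm(frames, cycle)
    print(f"{frames:4d} frames, {cycle:3d}s cycle -> {ppm:5.1f} ppm")
```

If a single constant offset were the whole story, every row would give the same ppm figure. Under this simple model they range from roughly 29 to 68 ppm, which fits the observation that the cycle time is not strictly linear in the period size.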
All the tests above happen to be at 44.1kHz with the Wing set to 48 in, 48 out, but changing to 48kHz or 2/2 IO doesn't seem to cure anything.
One more data point which may not be relevant: if I switch off or unplug the Wing while Jack is running, the system gradually locks up over a period of a few seconds. I can get something out of the console briefly afterwards but within about ten seconds it's completely unresponsive. I imagine that pulling out a USB device from a realtime thread isn't a kind thing to do, but I don't recall this happening with other interfaces.
Any help or insights into where I should be looking (I'm a newbie in kernel space) appreciated. If someone wants remote access to the box to investigate in realtime we could probably figure something out.
In general you should avoid 44.1kHz if you want a small period size for a realtime process on USB-audio. With 44.1kHz, the packet size can't be a fixed integer number of frames, so the isochronous transfer requires variable packet sizes. OTOH, the ALSA API requires a fixed period size, hence it'll occasionally lead to inconsistencies.
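The arithmetic behind this can be sketched as follows. This is only an illustration: it assumes one isochronous packet per 1 ms USB frame (1000 packets/s) or per 125 us microframe (8000 packets/s), and the packet_sizes helper with its simple carry rule is hypothetical, not what the device or the driver actually does.

```python
import math
from fractions import Fraction


def packet_sizes(rate_hz, packets_per_second, count):
    """Frames carried by each of the first `count` packets.

    An exact running total keeps the long-run average equal to the
    nominal rate; the carry rule itself is illustrative only.
    """
    per_packet = Fraction(rate_hz, packets_per_second)
    sizes, sent = [], 0
    for n in range(1, count + 1):
        due = math.floor(per_packet * n)  # frames due after n packets
        sizes.append(due - sent)
        sent = due
    return sizes


print(packet_sizes(44100, 1000, 10))  # 44.1 frames/packet on average
print(packet_sizes(48000, 1000, 10))  # exactly 48 frames/packet
print(packet_sizes(44100, 8000, 8))   # 5.5125 frames/packet on average
print(packet_sizes(48000, 8000, 8))   # exactly 6 frames/packet
```

At 48kHz every packet carries the same number of frames, while at 44.1kHz the sizes have to alternate (nine packets of 44 frames and one of 45 in the 1 ms case), so a fixed period of e.g. 64 frames never lines up with a whole number of packets.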
Takashi