On Oct 18, 2018, at 6:13 AM, Takashi Iwai <tiwai@suse.de> wrote:
On Fri, 12 Oct 2018 20:21:07 +0200, Brendan Shanks wrote:
I'm working on an embedded system based on an Ambarella H1 SoC (32-bit ARM Cortex A9). Audio playback through ALSA (and GStreamer) works fine when playing to the raw hw device. When playing through dshare (my normal configuration), playback stops after exactly 7 hours 16 minutes, for about 3 minutes. During the 3 minutes, the playback thread consumes an entire CPU core. After the 3 minutes, GStreamer reports an xrun, recovers from it, and playback goes back to normal.
After some debugging, the problem seems to be in snd_pcm_dshare_sync_area(). When dshare->appl_ptr rolls over to 0, 'size' becomes huge: 3036676576. 'slave_size' is also much bigger than it should be: 1258291680. That is the value 'size' holds when the for loop starts, and I believe the loop then spends ~3 minutes copying a huge number of samples. This also explains the 7h16m interval: it is linked to the PCM boundary, which is 1258291200. At 48 kHz, 1258291200 samples take 7h16m54s.
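The figures above can be cross-checked with a short sketch. It assumes the boundary is obtained by doubling the buffer size for as long as the result stays below LONG_MAX, and that the 9600-frame buffer size mentioned in this thread is the one the boundary derives from. The helper names pcm_boundary() and frames_to_seconds() are hypothetical and are not part of alsa-lib.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed rule: double the buffer size while the doubled value still fits
 * below `limit` (standing in for LONG_MAX) minus one buffer. */
static uint64_t pcm_boundary(uint64_t buffer_size, uint64_t limit)
{
    assert(buffer_size > 0 && buffer_size <= limit);
    uint64_t boundary = buffer_size;
    while (boundary * 2 <= limit - buffer_size)
        boundary *= 2;
    return boundary;
}

/* Whole seconds needed to play `frames` frames at `rate` frames per second. */
static uint64_t frames_to_seconds(uint64_t frames, uint64_t rate)
{
    assert(rate > 0);
    return frames / rate;
}
```

Under these assumptions, pcm_boundary(9600, INT32_MAX) gives 1258291200, and frames_to_seconds(1258291200, 48000) gives 26214 seconds, which is 7 h 16 min 54 s. Passing INT64_MAX as the limit yields a far larger boundary, so a 64-bit system would take far longer to reach the rollover.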
I'm not sure what the fix should be though. Is this really a bug in dshare, or are bad values being set somewhere else? GStreamer? Or maybe the period/buffer size (480/9600) is causing problems?
Maybe some 32-bit boundary overflow? I vaguely remember something like that.
I think I figured out the issue; it is in snd_pcm_dshare_sync_area(), as I suspected. When ‘slave_hw_ptr’ rolled over the ‘slave_boundary’, the wrong ‘slave_hw_ptr’ variable was being compared, resulting in ‘slave_size’ and ‘size’ being much too large. It was likely only triggered on 32-bit systems, since the PCM boundary is computed based on LONG_MAX and is much larger on 64-bit systems.
It looks like this same fix was made to pcm_dmix in commit 6c7f60f7a982fdba828e4530a9d7aa0aa2b704ae (“Fix boundary overlap”) from June 2005. I’ll send a patch to the list shortly.
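A minimal sketch of the wrap-around check described above, for illustration only. It is not the alsa-lib source: struct ptr_state, slave_space() and the check_local flag are hypothetical, 32-bit unsigned values stand in for frame counters on a 32-bit system, and the sketch assumes the comparison should test the locally advanced pointer rather than the stored one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few pointer fields involved. */
struct ptr_state {
    uint32_t slave_hw_ptr;      /* stored hardware pointer */
    uint32_t slave_appl_ptr;    /* stored application pointer */
    uint32_t slave_period_size;
    uint32_t slave_buffer_size;
    uint32_t slave_boundary;    /* pointers wrap at this value */
};

/* Frames that may be written to the slave buffer.
 * check_local != 0: test the locally advanced pointer (assumed correct).
 * check_local == 0: test the stored pointer (models the reported bug). */
static uint32_t slave_space(const struct ptr_state *s, int check_local)
{
    assert(s->slave_period_size > 0);

    uint32_t hw = s->slave_hw_ptr;
    hw -= hw % s->slave_period_size;   /* round down to a period start */
    hw += s->slave_buffer_size;        /* advance by one full buffer */

    uint32_t tested = check_local ? hw : s->slave_hw_ptr;
    if (tested >= s->slave_boundary)   /* wrap only if the tested value crossed */
        hw -= s->slave_boundary;

    if (hw < s->slave_appl_ptr)        /* appl_ptr has not wrapped yet */
        return hw + (s->slave_boundary - s->slave_appl_ptr);
    return hw - s->slave_appl_ptr;
}
```

With illustrative values (not taken from a trace) of slave_hw_ptr = 1258282560, slave_appl_ptr = 480, a period of 480, a buffer of 9600 and a boundary of 1258291200, the local check returns 480 frames, while the stored-pointer check returns 1258291680, more than the whole boundary.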
Brendan