[alsa-devel] Thread spinning in kernel snd_pcm_link()/snd_pcm_unlink()
Rob Duncan
rduncan at tesla.com
Tue Oct 2 18:55:34 CEST 2018
Hi Takashi,
Thanks for taking a look at this.
> I'm not sure whether that's the case. Do you mean that one thread
> gets stuck at pcm_release_private() which calls snd_pcm_unlink()?
> Or do you really use the PCM linkage?
We're not explicitly using the link/unlink APIs, so I think the
snd_pcm_unlink() call must be coming from pcm_release_private().
I'll try out your suggestion over the next couple of days. In the
meantime we've avoided the issue by arranging for the realtime threads
to have the same priority (which I think they should have anyway).
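
A minimal sketch of that workaround, assuming the threads are created
through POSIX threads. The helper name init_rt_attr() and the priority
value used in the note below are hypothetical and only illustrate the
idea; this is not the actual application code.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: prepare thread attributes that request an
 * explicit scheduling policy and priority. Calling it with the same
 * policy and priority for every realtime thread keeps a spinning
 * thread from outranking the others.
 * Returns 0 on success or an errno-style code on failure.
 */
static int init_rt_attr(pthread_attr_t *attr, int policy, int priority)
{
	struct sched_param param = { .sched_priority = priority };
	int err;

	err = pthread_attr_init(attr);
	if (err)
		return err;

	/* Otherwise the new thread inherits the creator's scheduling. */
	err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
	if (!err)
		err = pthread_attr_setschedpolicy(attr, policy);
	if (!err)
		err = pthread_attr_setschedparam(attr, &param);
	if (err)
		pthread_attr_destroy(attr);

	return err;
}
```

Each realtime thread would then be created with attributes from, say,
init_rt_attr(&attr, SCHED_FIFO, 50). Note that pthread_create() with a
realtime policy normally needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.
This only equalizes the realtime threads; it does not help if the lock
holder is a SCHED_OTHER thread pinned to the same CPU.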
Rob.
At 08:14 on Tue, Oct 02 2018, Takashi wrote:
> On Fri, 28 Sep 2018 18:23:24 +0200,
> Rob Duncan wrote:
>>
>> I'm trying to address a bug where we end up with a thread spinning and
>> consuming an entire cpu. The issue seems to be this code in
>> sound/core/pcm_native.c:
>>
>> /* Writer in rwsem may block readers even during its waiting in queue,
>>  * and this may lead to a deadlock when the code path takes read sem
>>  * twice (e.g. one in snd_pcm_action_nonatomic() and another in
>>  * snd_pcm_stream_lock()). As a (suboptimal) workaround, let writer to
>>  * spin until it gets the lock.
>>  */
>> static inline void down_write_nonblock(struct rw_semaphore *lock)
>> {
>> 	while (!down_write_trylock(lock))
>> 		cond_resched();
>> }
>>
>> The original commit for this is 67ec1072b053c15564e6090ab30127895dc77a89
>>
>> What we're suspecting is that a normal thread (SCHED_OTHER) has a reader
>> lock and a real-time thread using SCHED_RR or SCHED_FIFO is trying to
>> take the writer lock. If both threads are pinned to the same CPU for
>> some reason then the reader thread will never get scheduled (because the
>> real-time writer thread is still runnable), and we will never make
>> progress.
>>
>> Does this sound right? What can we do to fix this?
>
> I'm not sure whether that's the case. Do you mean that one thread
> gets stuck at pcm_release_private() which calls snd_pcm_unlink()?
> Or do you really use the PCM linkage?
>
> In the former case, we may loosen it by optimizing like the patch
> below (totally untested). I guess it won't be a problem about racy
> access, but need double-checks afterward.
>
>
> thanks,
>
> Takashi
>
>
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -2369,7 +2369,8 @@ int snd_pcm_hw_constraints_complete(struct snd_pcm_substream *substream)
>  
>  static void pcm_release_private(struct snd_pcm_substream *substream)
>  {
> -	snd_pcm_unlink(substream);
> +	if (snd_pcm_stream_linked(substream))
> +		snd_pcm_unlink(substream);
>  }
>  
>  void snd_pcm_release_substream(struct snd_pcm_substream *substream)