[alsa-devel] [PATCH 0/2] PCM OSS fixes

Tue Jan 9 08:33:32 CET 2018

On Mon, 08 Jan 2018 17:00:08 +0100,
Lars-Peter Clausen wrote:
> 
> On 01/08/2018 04:40 PM, Takashi Iwai wrote:
> > On Mon, 08 Jan 2018 16:37:32 +0100,
> > Lars-Peter Clausen wrote:
> >>
> >> On 01/08/2018 04:05 PM, Takashi Iwai wrote:
> >>> On Mon, 08 Jan 2018 15:58:47 +0100,
> >>> Lars-Peter Clausen wrote:
> >>>>
> >>>> On 01/08/2018 03:25 PM, Takashi Iwai wrote:
> >>>>> Hi,
> >>>>>
> >>>>> this contains two fixes for PCM OSS bugs that have been triggered by
> >>>>> syzbot.  These are simple and straightforward, and I'm going to queue
> >>>>> for 4.15-rc8.
> >>>>
> >>>> I tried to reproduce the issue as well with the reproducer generated by
> >>>> syzbot. But what syzbot reported was that the CPU didn't sleep for 140
> >>>> seconds. But with a long write() we'll go to sleep in between frames. So
> >>>> when I tried the lockup detector never triggered.
> >>>>
> >>>> Were you able to reproduce the same CPU lockup as syzbot?
> >>>
> >>> Yes, I could reproduce RCU stalls with two reproducers
> >>> (001a113ece5c2b600d05620b097a at google.com and
> >>>  001a1147e1988b160a05620552eb at google.com)
> >>>
> >>> The patches seem curing them, so far, in combination with other
> >>> patches in for-linus branch of sound.git tree.
> >>>
> >>
> >> Hm, interesting. Are you using aloop for the backend or real hardware?
> > 
> > The RCU stall happened also with dummy driver.  Just feeding a GB of
> > chunk to /dev/audio and aborting the stream concurrently, then it
> > stalls at that point.
> 
> Hm, does that mean that snd_pcm_playback_avail() is never 0 for the dummy
> driver? Usually we should catch the signal_pending() in wait_for_avail().
> 
> This might not be OSS specific and maybe it is possible to trigger it from
> ALSA as well.

This seems specific to OSS.  The problem is that OSS emulation has two
nested loops in a mutex protected context.  The outer loop is like:

	while (bytes > 0) {
		__snd_pcm_lib_xfer();
		if (error)
			break;
	}
	return xfer > 0 ? xfer : error;

and the inner loop (__snd_pcm_lib_xfer()) is like:

	while (bytes > 0) {
		if (avail() > 0)
			write();
		else if (wait_for_avail() < 0)
			break;
	}
	return xfer > 0 ? xfer : error;

Actually, wait_for_avail() does check signal_pending() and returns
-ERESTARTSYS properly.  This makes the inner loop aborted.

However, the problem is that OSS emulation sets avail_min=1: implying
that, when there is any samples empty in the ring buffer, it can
continue.  In the case above, when some samples have been processed
before wait_for_avail(), it keeps the outer loop spinning again.

Since each iteration takes ca 2ms on my VM, 8000Hz rate gives 1 or 2
samples free at each iteration.  Thus at each time snd_pcm_lib_xfer()
is invoked, it processes 1 or 2 samples, then returns due to pending
signals, and the outer loop continues until the whole given samples
are consumed.  If the machine is faster or the sample rate is lower,
the outer loop may exit, OTOH.

In ALSA-native API, there is no outer loop in the kernel context, and
each read/write merely calling a single shot snd_pcm_lib_xfer().  So,
when the signal is pending, it aborts properly.

Takashi