[alsa-devel] [PATCH 0/2] PCM OSS fixes
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
thanks,
Takashi
===
Takashi Iwai (2):
  ALSA: pcm: Abort properly at pending signal in OSS read/write loops
  ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
 sound/core/oss/pcm_oss.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)
The loops for read and write in the PCM OSS emulation have no proper check of pending signals, and they keep processing even after the user tries to break out. This results in a very long delay, often seen as an RCU stall, when a huge number of unprocessed bytes remains queued. The bug could be easily triggered by syzkaller.
As a simple workaround, this patch adds the proper check of pending signals and aborts the loop appropriately.
Reported-by: syzbot+993cb4cfcbbff3947c21@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/core/oss/pcm_oss.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
index ceaa51f76591..e317964bd2ea 100644
--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -1381,6 +1381,10 @@ static ssize_t snd_pcm_oss_write1(struct snd_pcm_substream *substream, const cha
 			    tmp != runtime->oss.period_bytes)
 				break;
 		}
+		if (signal_pending(current)) {
+			tmp = -ERESTARTSYS;
+			goto err;
+		}
 	}
 	mutex_unlock(&runtime->oss.params_lock);
 	return xfer;
@@ -1466,6 +1470,10 @@ static ssize_t snd_pcm_oss_read1(struct snd_pcm_substream *substream, char __use
 			bytes -= tmp;
 			xfer += tmp;
 		}
+		if (signal_pending(current)) {
+			tmp = -ERESTARTSYS;
+			goto err;
+		}
 	}
 	mutex_unlock(&runtime->oss.params_lock);
 	return xfer;
The PCM OSS read/write loops keep holding the mutex lock for the whole read/write, and this can take very long when an exceptionally large amount of data is given. Also, since the loop takes the lock with mutex_lock(), a concurrent read/write becomes unbreakable.
This patch tries to address these issues by replacing mutex_lock() with mutex_lock_interruptible(), and also by splitting and re-taking the lock at each read/write period chunk, so that the context can be switched at a finer granularity if requested.
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/core/oss/pcm_oss.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
index e317964bd2ea..5e334642903e 100644
--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -1334,8 +1334,11 @@ static ssize_t snd_pcm_oss_write1(struct snd_pcm_substream *substream, const cha

 	if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
 		return tmp;
-	mutex_lock(&runtime->oss.params_lock);
 	while (bytes > 0) {
+		if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+			tmp = -EINTR;
+			break;
+		}
 		if (bytes < runtime->oss.period_bytes || runtime->oss.buffer_used > 0) {
 			tmp = bytes;
 			if (tmp + runtime->oss.buffer_used > runtime->oss.period_bytes)
@@ -1379,18 +1382,18 @@ static ssize_t snd_pcm_oss_write1(struct snd_pcm_substream *substream, const cha
 			xfer += tmp;
 			if ((substream->f_flags & O_NONBLOCK) != 0 &&
 			    tmp != runtime->oss.period_bytes)
-				break;
+				tmp = -EAGAIN;
 		}
+ err:
+		mutex_unlock(&runtime->oss.params_lock);
+		if (tmp < 0)
+			break;
 		if (signal_pending(current)) {
 			tmp = -ERESTARTSYS;
-			goto err;
+			break;
 		}
+		tmp = 0;
 	}
-	mutex_unlock(&runtime->oss.params_lock);
-	return xfer;
-
- err:
-	mutex_unlock(&runtime->oss.params_lock);
 	return xfer > 0 ? (snd_pcm_sframes_t)xfer : tmp;
 }
@@ -1438,8 +1441,11 @@ static ssize_t snd_pcm_oss_read1(struct snd_pcm_substream *substream, char __use

 	if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
 		return tmp;
-	mutex_lock(&runtime->oss.params_lock);
 	while (bytes > 0) {
+		if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+			tmp = -EINTR;
+			break;
+		}
 		if (bytes < runtime->oss.period_bytes || runtime->oss.buffer_used > 0) {
 			if (runtime->oss.buffer_used == 0) {
 				tmp = snd_pcm_oss_read2(substream, runtime->oss.buffer,
 							runtime->oss.period_bytes, 1);
@@ -1470,16 +1476,16 @@ static ssize_t snd_pcm_oss_read1(struct snd_pcm_substream *substream, char __use
 			bytes -= tmp;
 			xfer += tmp;
 		}
+ err:
+		mutex_unlock(&runtime->oss.params_lock);
+		if (tmp < 0)
+			break;
 		if (signal_pending(current)) {
 			tmp = -ERESTARTSYS;
-			goto err;
+			break;
 		}
+		tmp = 0;
 	}
-	mutex_unlock(&runtime->oss.params_lock);
-	return xfer;
-
- err:
-	mutex_unlock(&runtime->oss.params_lock);
 	return xfer > 0 ? (snd_pcm_sframes_t)xfer : tmp;
 }
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
@@ -1438,8 +1441,11 @@ static ssize_t snd_pcm_oss_read1(struct snd_pcm_substream *substream, char __use

 	if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
 		return tmp;
-	mutex_lock(&runtime->oss.params_lock);
 	while (bytes > 0) {
+		if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+			tmp = -EINTR;

For consistency ERESTARTSYS I guess? In the end this is the same as signal_pending().

+			break;
+		}
 		if (bytes < runtime->oss.period_bytes || runtime->oss.buffer_used > 0) {
 			if (runtime->oss.buffer_used == 0) {
 				tmp = snd_pcm_oss_read2(substream, runtime->oss.buffer,
 							runtime->oss.period_bytes, 1);
@@ -1470,16 +1476,16 @@ static ssize_t snd_pcm_oss_read1(struct snd_pcm_substream *substream, char __use
 			bytes -= tmp;
 			xfer += tmp;
 		}
+ err:
+		mutex_unlock(&runtime->oss.params_lock);
+		if (tmp < 0)
+			break;
 		if (signal_pending(current)) {
 			tmp = -ERESTARTSYS;
-			goto err;
+			break;
 		}
+		tmp = 0;
 	}
-	mutex_unlock(&runtime->oss.params_lock);
-	return xfer;
-
- err:
-	mutex_unlock(&runtime->oss.params_lock);
 	return xfer > 0 ? (snd_pcm_sframes_t)xfer : tmp;
 }
On Mon, 08 Jan 2018 16:35:49 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
@@ -1438,8 +1441,11 @@ static ssize_t snd_pcm_oss_read1(struct snd_pcm_substream *substream, char __use

 	if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
 		return tmp;
-	mutex_lock(&runtime->oss.params_lock);
 	while (bytes > 0) {
+		if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+			tmp = -EINTR;

For consistency ERESTARTSYS I guess? In the end this is the same as signal_pending().
Right, it's a better one.
thanks,
Takashi
On Mon, 08 Jan 2018 16:36:47 +0100, Takashi Iwai wrote:
On Mon, 08 Jan 2018 16:35:49 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
@@ -1438,8 +1441,11 @@ static ssize_t snd_pcm_oss_read1(struct snd_pcm_substream *substream, char __use

 	if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
 		return tmp;
-	mutex_lock(&runtime->oss.params_lock);
 	while (bytes > 0) {
+		if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+			tmp = -EINTR;

For consistency ERESTARTSYS I guess? In the end this is the same as signal_pending().
Right, it's a better one.
BTW, this spotted another place still using -EINTR in pcm_oss.c. I will fix it as well. Thanks for the reminder.
Takashi
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
- Lars
On Mon, 08 Jan 2018 15:58:47 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
Yes, I could reproduce RCU stalls with two reproducers (001a113ece5c2b600d05620b097a@google.com and 001a1147e1988b160a05620552eb@google.com)
The patches seem to cure them so far, in combination with the other patches in the for-linus branch of the sound.git tree.
thanks,
Takashi
On 01/08/2018 04:05 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 15:58:47 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
Yes, I could reproduce RCU stalls with two reproducers (001a113ece5c2b600d05620b097a@google.com and 001a1147e1988b160a05620552eb@google.com)
The patches seem to cure them so far, in combination with the other patches in the for-linus branch of the sound.git tree.
Hm, interesting. Are you using aloop for the backend or real hardware?
On Mon, 08 Jan 2018 16:37:32 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 04:05 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 15:58:47 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
Yes, I could reproduce RCU stalls with two reproducers (001a113ece5c2b600d05620b097a@google.com and 001a1147e1988b160a05620552eb@google.com)
The patches seem to cure them so far, in combination with the other patches in the for-linus branch of the sound.git tree.
Hm, interesting. Are you using aloop for the backend or real hardware?
The RCU stall happened also with the dummy driver. Just feed a GB chunk to /dev/audio and abort the stream concurrently; it then stalls at that point.
Takashi
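[A minimal user-space sketch of that reproduction step, for reference. Assumptions: the default /dev/audio settings, a SIGALRM standing in for the concurrent abort, and the 1GB size; this is not the actual syzbot reproducer.]

/* Write a ~1GB chunk to /dev/audio and interrupt it with a signal
 * while the write loop is running. */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void on_alarm(int sig)
{
	(void)sig;	/* no-op: only needed to interrupt the blocked write() */
}

int main(void)
{
	struct sigaction sa;
	size_t len = 1UL << 30;			/* ~1GB of silence */
	char *buf = calloc(1, len);
	int fd = open("/dev/audio", O_WRONLY);

	if (!buf || fd < 0) {
		perror("setup");
		return 1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;		/* sa_flags = 0: no SA_RESTART */
	sigaction(SIGALRM, &sa, NULL);
	alarm(1);				/* "abort the stream" mid-write */
	if (write(fd, buf, len) < 0)
		perror("write");
	close(fd);
	free(buf);
	return 0;
}

[With the fixes applied, the write() returns early, with a short count or EINTR, soon after the alarm fires; on an unfixed kernel the write loop keeps grinding through the remaining bytes.]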
On 01/08/2018 04:40 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 16:37:32 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 04:05 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 15:58:47 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
Yes, I could reproduce RCU stalls with two reproducers (001a113ece5c2b600d05620b097a@google.com and 001a1147e1988b160a05620552eb@google.com)
The patches seem to cure them so far, in combination with the other patches in the for-linus branch of the sound.git tree.
Hm, interesting. Are you using aloop for the backend or real hardware?
The RCU stall happened also with the dummy driver. Just feed a GB chunk to /dev/audio and abort the stream concurrently; it then stalls at that point.
Hm, does that mean that snd_pcm_playback_avail() is never 0 for the dummy driver? Usually we should catch the signal_pending() in wait_for_avail().
This might not be OSS specific and maybe it is possible to trigger it from ALSA as well.
On 01/08/2018 05:00 PM, Lars-Peter Clausen wrote:
On 01/08/2018 04:40 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 16:37:32 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 04:05 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 15:58:47 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
Yes, I could reproduce RCU stalls with two reproducers (001a113ece5c2b600d05620b097a@google.com and 001a1147e1988b160a05620552eb@google.com)
The patches seem to cure them so far, in combination with the other patches in the for-linus branch of the sound.git tree.
Hm, interesting. Are you using aloop for the backend or real hardware?
The RCU stall happened also with the dummy driver. Just feed a GB chunk to /dev/audio and abort the stream concurrently; it then stalls at that point.
Hm, does that mean that snd_pcm_playback_avail() is never 0 for the dummy driver? Usually we should catch the signal_pending() in wait_for_avail().
Even with the dummy driver I'm not able to reproduce it. wait_for_avail() gets called, the thread goes to sleep, a Ctrl+C wakes it up with signal_pending() set, and it exits the syscall.
On Mon, 08 Jan 2018 20:21:00 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 05:00 PM, Lars-Peter Clausen wrote:
On 01/08/2018 04:40 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 16:37:32 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 04:05 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 15:58:47 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
Yes, I could reproduce RCU stalls with two reproducers (001a113ece5c2b600d05620b097a@google.com and 001a1147e1988b160a05620552eb@google.com)
The patches seem to cure them so far, in combination with the other patches in the for-linus branch of the sound.git tree.
Hm, interesting. Are you using aloop for the backend or real hardware?
The RCU stall happened also with the dummy driver. Just feed a GB chunk to /dev/audio and abort the stream concurrently; it then stalls at that point.
Hm, does that mean that snd_pcm_playback_avail() is never 0 for the dummy driver? Usually we should catch the signal_pending() in wait_for_avail().
Even with the dummy driver I'm not able to reproduce it. wait_for_avail() gets called, the thread goes to sleep, a Ctrl+C wakes it up with signal_pending() set, and it exits the syscall.
Is the machine too fast? But mine isn't that slow, either.
FWIW, below is the kconfig I'm using. I ran KVM with -smp 2.
Takashi
On Mon, 08 Jan 2018 17:00:08 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 04:40 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 16:37:32 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 04:05 PM, Takashi Iwai wrote:
On Mon, 08 Jan 2018 15:58:47 +0100, Lars-Peter Clausen wrote:
On 01/08/2018 03:25 PM, Takashi Iwai wrote:
Hi,
this contains two fixes for PCM OSS bugs that have been triggered by syzbot. These are simple and straightforward, and I'm going to queue for 4.15-rc8.
I tried to reproduce the issue as well with the reproducer generated by syzbot. What syzbot reported was that the CPU didn't sleep for 140 seconds, but with a long write() we go to sleep in between frames, so when I tried it the lockup detector never triggered.
Were you able to reproduce the same CPU lockup as syzbot?
Yes, I could reproduce RCU stalls with two reproducers (001a113ece5c2b600d05620b097a@google.com and 001a1147e1988b160a05620552eb@google.com)
The patches seem to cure them so far, in combination with the other patches in the for-linus branch of the sound.git tree.
Hm, interesting. Are you using aloop for the backend or real hardware?
The RCU stall happened also with the dummy driver. Just feed a GB chunk to /dev/audio and abort the stream concurrently; it then stalls at that point.
Hm, does that mean that snd_pcm_playback_avail() is never 0 for the dummy driver? Usually we should catch the signal_pending() in wait_for_avail().
This might not be OSS specific and maybe it is possible to trigger it from ALSA as well.
This seems specific to OSS. The problem is that the OSS emulation has two nested loops in a mutex-protected context. The outer loop is like:
while (bytes > 0) {
	__snd_pcm_lib_xfer();
	if (error)
		break;
}
return xfer > 0 ? xfer : error;
and the inner loop (__snd_pcm_lib_xfer()) is like:
while (bytes > 0) {
	if (avail() > 0)
		write();
	else if (wait_for_avail() < 0)
		break;
}
return xfer > 0 ? xfer : error;
Actually, wait_for_avail() does check signal_pending() and returns -ERESTARTSYS properly. This makes the inner loop abort.
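[For reference, the check in question looks roughly like this; a simplified sketch of the wait loop in sound/core/pcm_lib.c, which in reality also handles timeouts, draining, and stream-state changes:]

for (;;) {
	if (signal_pending(current)) {
		err = -ERESTARTSYS;
		break;
	}
	/* sleep until woken by the period interrupt (omitted) */
	avail = snd_pcm_playback_avail(runtime);
	if (avail >= runtime->twake)	/* enough room available again */
		break;
}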
However, the problem is that the OSS emulation sets avail_min=1, implying that, whenever any sample space is free in the ring buffer, the loop can continue. In the case above, when a few samples have been processed before wait_for_avail() is reached, the outer loop keeps spinning.
Since each iteration takes ca. 2ms on my VM, an 8000Hz rate frees 1 or 2 samples at each iteration. Thus, each time snd_pcm_lib_xfer() is invoked, it processes 1 or 2 samples, then returns due to the pending signal, and the outer loop continues until the whole given buffer is consumed. OTOH, if the machine is faster or the sample rate is lower, the outer loop may exit.
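[To put rough numbers on that, assuming the default /dev/audio format of 8-bit mono at 8000Hz, i.e. about one byte per sample: a 1GB write is roughly 10^9 samples, and draining it at 1 or 2 samples per ~2ms iteration would take on the order of 10^6 seconds, so the 140-second stall threshold is crossed long before the write completes.]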
With the ALSA-native API, there is no outer loop in the kernel context; each read/write merely calls snd_pcm_lib_xfer() once. So, when a signal is pending, it aborts properly.
Takashi
On 01/09/2018 08:33 AM, Takashi Iwai wrote: [...]
However, the problem is that the OSS emulation sets avail_min=1, implying that, whenever any sample space is free in the ring buffer, the loop can continue. In the case above, when a few samples have been processed before wait_for_avail() is reached, the outer loop keeps spinning.
Since each iteration takes ca. 2ms on my VM, an 8000Hz rate frees 1 or 2 samples at each iteration. Thus, each time snd_pcm_lib_xfer() is invoked, it processes 1 or 2 samples, then returns due to the pending signal, and the outer loop continues until the whole given buffer is consumed. OTOH, if the machine is faster or the sample rate is lower, the outer loop may exit.
OK, that makes sense. I didn't think about that case. In my mind I was working with avail_min=period_size.
Thanks for clearing this up.
- Lars