On Mon, 18 Jan 2016 11:53:00 +0100, Dmitry Vyukov wrote:
On Fri, Jan 15, 2016 at 10:44 PM, Takashi Iwai tiwai@suse.de wrote:
On Fri, 15 Jan 2016 22:22:46 +0100, Takashi Iwai wrote:
On Fri, 15 Jan 2016 20:47:05 +0100, Dmitry Vyukov wrote:
On Fri, Jan 15, 2016 at 8:18 PM, Takashi Iwai tiwai@suse.de wrote:
On Fri, 15 Jan 2016 20:13:11 +0100, Dmitry Vyukov wrote:
On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov dvyukov@google.com wrote:
> On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai tiwai@suse.de wrote:
>> On Fri, 15 Jan 2016 12:03:17 +0100,
>> Dmitry Vyukov wrote:
>>>
>>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai tiwai@suse.de wrote:
>>> > On Fri, 15 Jan 2016 09:06:10 +0100,
>>> > Dmitry Vyukov wrote:
>>> >>
>>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai tiwai@suse.de wrote:
>>> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
>>> >> > Takashi Iwai wrote:
>>> >> >>
>>> >> >> OK, then this might be a possible race at the current snd_timer_stop()
>>> >> >> implementation. There is no sync action there, so the ISR might be
>>> >> >> still alive after snd_timer_close() call. Or might be another race.
>>> >> >> This pattern looks a bit different, as it's involved with hrtimer.
>>> >> >>
>>> >> >> I'll take a look at it tomorrow.
>>> >> >
>>> >> > I've audited the code today, but the open window doesn't look like
>>> >> > what I expected. I found only some possible cases with slave timer
>>> >> > instances.
>>> >> >
>>> >> > In anyway, below is a test fix patch. Since I couldn't reproduce the
>>> >> > issue on my local machines, it's hard to say whether this covers the
>>> >> > holes you fell. Let's see...
>>> >>
>>> >>
>>> >> Hi Takashi,
>>> >>
>>> >> I would be interested to understand why other people can't reproduce
>>> >> issues that I hit pretty reliably.
>>> >> I suspect that it can be due to .config. Please try with the following
>>> >> config values.
>>> >
>>> > I guess rather other config, e.g. the kernel debug options.
>>> > I suppose you enabled KASAN and DEBUG_LIST. What else?
>>>
>>> I've attached my config (you will need to disable CONFIG_KCOV, it is
>>> not upstreamed).
>>
>> Hm, that has lots of other drivers built-in...
>>
>>> >> I also start qemu with "-soundhw all" arg.
>>> >
>>> > OK, so you're testing with VM? This makes easier to recheck.
>>>
>>> Yes, I start qemu as:
>>>
>>> qemu-system-x86_64 -hda wheezy.img -net
>>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
>>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
>>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
>>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
>>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
>>> -soundhw all
>>
>> And which test did trigger use-after-free, even with all previous
>> patches?
>
> I will try to extract a new reproducer now.
OK, I do not seem to see any crashes except for the timer hangs below. Let's consider all other bugs as fixed. I will report anything new that I see separately.
OK, good to hear.
> Meanwhile, can you try to reproduce this one:
> https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ
> ?
I run the program in a tight parallel loop.
I could reproduce this after your suggestion with parallel runs.
This seems specific to hrtimer. Possibly it's not about the snd-timer core itself. Could you check whether this doesn't happen when CONFIG_SND_HRTIMER isn't set?
It does not happen without CONFIG_SND_HRTIMER. Do you mean that this is an hrtimer bug?
I guess rather it's a bug in snd-hrtimer driver. Will check it later.
The patch below *might* fix the issue. There was a deadlock problem and the current code has a weird workaround for it. I suspect that this workaround is the cause.
If this works, I'll happily apply it before submitting the next pull request for 4.5. If not, I'll take a closer look at it in the next week :)
No, unfortunately the hang still happens with the patch:
Thanks for testing. I think I understood the problem. We faced a similar issue in the past and moved hrtimer_cancel(). But this wasn't enough, as the start function may also be called in interrupt context.
How about the one below instead?
Takashi
---
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: hrtimer: Fix stall by hrtimer_cancel()
hrtimer_cancel() waits for the callback to complete, thus it must not be called inside the callback itself. This was already a problem, and the earlier commit [fcfdebe70759: ALSA: hrtimer - Fix lock-up] tried to address it.
However, the previous fix is still insufficient: it may still cause a lockup when the ALSA timer instance reprograms itself in its callback. In that case the start function is invoked from snd_timer_interrupt(), which is itself called in the hrtimer callback, resulting in a CPU stall. This is not a hypothetical problem, as it was actually triggered by the syzkaller fuzzer.
This patch tries to fix the issue again. Now we call hrtimer_try_to_cancel() in both the start and stop functions so that it won't fall into a deadlock, while still giving some chance to cancel the queued timer if the functions have been called outside the callback. The proper hrtimer_cancel() is called anyway at closing, so this should be enough.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/core/hrtimer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/core/hrtimer.c b/sound/core/hrtimer.c
index f845ecf7e172..656d9a9032dc 100644
--- a/sound/core/hrtimer.c
+++ b/sound/core/hrtimer.c
@@ -90,7 +90,7 @@ static int snd_hrtimer_start(struct snd_timer *t)
 	struct snd_hrtimer *stime = t->private_data;
 
 	atomic_set(&stime->running, 0);
-	hrtimer_cancel(&stime->hrt);
+	hrtimer_try_to_cancel(&stime->hrt);
 	hrtimer_start(&stime->hrt, ns_to_ktime(t->sticks * resolution),
 		      HRTIMER_MODE_REL);
 	atomic_set(&stime->running, 1);
@@ -101,6 +101,7 @@ static int snd_hrtimer_stop(struct snd_timer *t)
 {
 	struct snd_hrtimer *stime = t->private_data;
 	atomic_set(&stime->running, 0);
+	hrtimer_try_to_cancel(&stime->hrt);
 	return 0;
 }
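For readers following the reasoning in the commit message, the difference between the two cancel calls can be sketched in userspace. This is only a rough model, not kernel code: `struct model_timer`, `model_try_to_cancel()`, `model_cancel()` and the `max_spins` bound are hypothetical names invented for illustration. The only thing taken from the kernel is the return convention of hrtimer_try_to_cancel() (0 = not active, 1 = was queued and is now removed, -1 = callback currently running) and the fact that hrtimer_cancel() retries until the result is non-negative.

```c
#include <assert.h>

/* Hypothetical userspace model of a timer; NOT the kernel's struct hrtimer. */
enum model_state { MODEL_IDLE, MODEL_QUEUED, MODEL_RUNNING };

struct model_timer {
	enum model_state state;
};

/*
 * Follows the return convention of hrtimer_try_to_cancel():
 *   0  timer was not active
 *   1  timer was queued and has been removed
 *  -1  callback is currently running and cannot be stopped
 * It never waits.
 */
int model_try_to_cancel(struct model_timer *t)
{
	if (t->state == MODEL_RUNNING)
		return -1;
	if (t->state == MODEL_QUEUED) {
		t->state = MODEL_IDLE;
		return 1;
	}
	return 0;
}

/*
 * Models hrtimer_cancel(): retry until the callback is no longer running.
 * The kernel function loops without a bound; max_spins is an assumption
 * added here so that the model terminates. Returns the try_to_cancel
 * result, or -1 if the bound was hit (the real call would still spin).
 */
int model_cancel(struct model_timer *t, int max_spins)
{
	int i;

	assert(max_spins > 0);
	for (i = 0; i < max_spins; i++) {
		int ret = model_try_to_cancel(t);

		if (ret >= 0)
			return ret;
		/* Callback still running: nothing in this thread can end it. */
	}
	return -1;
}
```

With `state == MODEL_RUNNING`, which stands in for "called from inside the callback", `model_cancel()` exhausts its bound, mirroring the stall, while `model_try_to_cancel()` returns -1 at once. With `state == MODEL_QUEUED`, either call removes the timer, which is the "some chance to cancel" case the commit message mentions.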