On Wed, 13 Nov 2019 08:24:41 +0100, Chih-Yang Hsia wrote:
On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai tiwai@suse.de wrote:
On Tue, 12 Nov 2019 18:17:13 +0100, paulhsia wrote:
Since
- snd_pcm_detach_substream sets runtime to NULL without holding the stream lock, and
- snd_pcm_period_elapsed checks whether runtime is NULL outside of the stream lock,
this can trigger a NULL pointer dereference in the snd_pcm_running() call in snd_pcm_period_elapsed.
Well, if a stream is detached, it means that the stream must have been already closed; i.e. it's already a clear bug in the driver that snd_pcm_period_elapsed() is called against such a stream.
Or am I missing some other possible case?
thanks,
Takashi
In a multithreaded environment, `interrupt_handler` (from the IRQ) and `substream close` (from snd_pcm_release) can run at the same time. Therefore, if a driver's substream close function and the code section that calls snd_pcm_period_elapsed() do not hold the same lock, the following can happen:
1. interrupt_handler -> enters snd_pcm_period_elapsed with a valid substream pointer
2. snd_pcm_release_substream: calls close without blocking
3. snd_pcm_release_substream: calls snd_pcm_detach_substream and sets substream->runtime to NULL
4. interrupt_handler -> calls snd_pcm_running() and crashes while accessing fields in `substream->runtime`
For example, in the intel8x0.c driver for AC97 devices, `snd_pcm_period_elapsed` is called in `snd_intel8x0_update` after checking `ichdev->substream`. If a `snd_pcm_release` call from alsa-lib passes through close() and reaches snd_pcm_detach_substream() in another thread at that moment, a crash can be triggered. I can reproduce the issue easily in a multithreaded VM.
My patches try to provide basic protection for this situation (an internal PCM lock between detach and period-elapsed), since
- the documented usage of `snd_pcm_period_elapsed` does not warn callers about the possible race if the driver does not enforce the ordering of the `snd_pcm_period_elapsed` call and `close` with a lock, and
- lots of drivers already have this hidden issue, and I can't fix them one by one (you can check the snd_pcm_period_elapsed usage and the close implementation in each driver). The most common mistake is that
- the driver checks that the substream is not NULL and then calls into snd_pcm_period_elapsed,
- but `close` can happen at any time and pass through without blocking, and snd_pcm_detach_substream is triggered right after it.
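The protection proposed above (one lock shared by detach and period-elapsed) can be sketched as a small userspace model. This is a hypothetical pthreads simulation, not kernel code: `model_substream`, `model_detach`, `model_period_elapsed` and `model_run` are illustrative names that only borrow the ALSA vocabulary, and a `pthread_mutex_t` stands in for the PCM stream lock. The point it shows is that the NULL check and the use of `runtime` must sit under the same lock that detach takes when it clears the pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model; these are NOT the kernel's ALSA structures. */
struct model_runtime {
    long periods;                  /* number of period-elapsed events handled */
};

struct model_substream {
    pthread_mutex_t lock;          /* stands in for the PCM stream lock */
    struct model_runtime *runtime; /* cleared by detach, like substream->runtime */
};

/* Detach path: clear the pointer under the lock, free the object afterwards. */
static void model_detach(struct model_substream *s)
{
    pthread_mutex_lock(&s->lock);
    struct model_runtime *rt = s->runtime;
    s->runtime = NULL;
    pthread_mutex_unlock(&s->lock);
    free(rt);                      /* nobody can reach rt any more */
}

/* Period-elapsed path: check and use runtime under the same lock, so detach
 * cannot slip in between them. Returns 1 if the event was handled. */
static int model_period_elapsed(struct model_substream *s)
{
    int handled = 0;
    pthread_mutex_lock(&s->lock);
    if (s->runtime) {
        s->runtime->periods++;
        handled = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return handled;
}

struct model_irq_args {
    struct model_substream *s;
    int iterations;
    int handled;
};

/* Simulated interrupt handler firing repeatedly on another thread. */
static void *model_irq_thread(void *p)
{
    struct model_irq_args *a = p;
    for (int i = 0; i < a->iterations; i++)
        a->handled += model_period_elapsed(a->s);
    return NULL;
}

/* Runs the IRQ thread concurrently with detach; returns events handled. */
static int model_run(int iterations)
{
    struct model_substream s;
    pthread_mutex_init(&s.lock, NULL);
    s.runtime = calloc(1, sizeof *s.runtime);

    struct model_irq_args a = { &s, iterations, 0 };
    pthread_t t;
    pthread_create(&t, NULL, model_irq_thread, &a);
    model_detach(&s);              /* races with the IRQ thread */
    pthread_join(t, NULL);

    pthread_mutex_destroy(&s.lock);
    return a.handled;
}
```

Calling `model_run(100000)` (built with `-pthread`) never dereferences a freed or NULL runtime; how many events get handled before detach wins depends on scheduling. Dropping the lock from `model_period_elapsed` would reintroduce exactly the check-then-use window listed in steps 1-4.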
Thanks, point taken. While this argument is valid and it's good to harden the PCM core side, the concurrent calls are basically a bug, and we'd need another fix anyway. Also, patch 2 makes little sense; there can't be multiple close calls racing with each other. So I'll take your fix, but only the first patch.
Back to this race: the surfaced issue is, as you pointed out, the race between snd_pcm_period_elapsed() and the close call. However, the fundamental problem is the pending action after the PCM trigger-stop call. Since the PCM trigger neither blocks nor waits until the hardware actually stops, the driver may move on to later steps while work is still pending past this "supposed-to-be-stopped" point. In your case, it goes on to close, and crashes. If we had a sync-stop operation, the interrupt handler would have finished before we move to the close stage, and such a race could be avoided.
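The sync-stop idea can be illustrated with another hypothetical pthreads model (again not the ALSA API; every `model_*` name is made up for illustration). The stop trigger only clears a `running` flag and returns immediately, as described above; a separate sync step then blocks until no interrupt handler is still in flight, roughly the role `synchronize_irq()` plays for real kernel IRQs. Only after that step does "close" tear things down.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a stream with a non-blocking stop and a sync-stop. */
struct model_stream {
    pthread_mutex_t lock;
    pthread_cond_t idle;  /* signalled when no handler is in flight */
    bool running;         /* cleared by the non-blocking stop trigger */
    int in_flight;        /* interrupt handlers currently executing */
    long periods;         /* written only by the single IRQ thread */
};

/* Stop trigger: flips the flag and returns without waiting for handlers. */
static void model_trigger_stop(struct model_stream *st)
{
    pthread_mutex_lock(&st->lock);
    st->running = false;
    pthread_mutex_unlock(&st->lock);
}

/* Interrupt handler: registers as in flight, works outside the lock. */
static void model_irq_handler(struct model_stream *st)
{
    pthread_mutex_lock(&st->lock);
    if (!st->running) {   /* already stopped: do nothing */
        pthread_mutex_unlock(&st->lock);
        return;
    }
    st->in_flight++;
    pthread_mutex_unlock(&st->lock);

    st->periods++;        /* stands in for period bookkeeping */

    pthread_mutex_lock(&st->lock);
    if (--st->in_flight == 0)
        pthread_cond_broadcast(&st->idle);
    pthread_mutex_unlock(&st->lock);
}

/* Sync-stop: block until every in-flight handler has finished. */
static void model_sync_stop(struct model_stream *st)
{
    pthread_mutex_lock(&st->lock);
    while (st->in_flight > 0)
        pthread_cond_wait(&st->idle, &st->lock);
    pthread_mutex_unlock(&st->lock);
}

struct model_irq_loop {
    struct model_stream *st;
    int n;
};

static void *model_irq_loop_fn(void *p)
{
    struct model_irq_loop *l = p;
    for (int i = 0; i < l->n; i++)
        model_irq_handler(l->st);
    return NULL;
}

/* Stop, sync, then reach the "close" point; returns handlers in flight there. */
static int model_stop_and_close(int n)
{
    struct model_stream st = { .running = true };
    pthread_mutex_init(&st.lock, NULL);
    pthread_cond_init(&st.idle, NULL);

    struct model_irq_loop l = { &st, n };
    pthread_t t;
    pthread_create(&t, NULL, model_irq_loop_fn, &l);

    model_trigger_stop(&st);
    model_sync_stop(&st);

    pthread_mutex_lock(&st.lock);
    int in_flight_at_close = st.in_flight;  /* close would free state here */
    pthread_mutex_unlock(&st.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&st.idle);
    pthread_mutex_destroy(&st.lock);
    return in_flight_at_close;
}
```

Once `model_trigger_stop` has run, no new handler can enter, and `model_sync_stop` waits out the ones already running, so `model_stop_and_close` always reaches its close point with zero handlers in flight. Without the sync step, close could run while a handler is still between its entry check and its exit, which is the window that crashed in the intel8x0 case.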
It's been a long-known problem, and some drivers have their own stop-sync implementations. I think it's time to investigate and start implementing the fundamental solution.
thanks,
Takashi