At Thu, 11 Oct 2012 18:41:52 +0200, Takashi Iwai wrote:
[Added Daniel and Clemens in the loop]
At Thu, 11 Oct 2012 17:17:59 +0200, Matthieu CASTET wrote:
Hi,
while doing some monkey tests on a product we found races in usb audio code when the device in unplugged from usb (on linus master tree).
This can be reproduced with usb_audio_show_race.diff and CONFIG_DEBUG_SLAB. With this patch, start a stream : # arecord -D hw:0 -r 44100 -c 2 -f S16_LE > /dev/null you will see the kernel log : "in snd_usb_hw_params sleeping" Unplug the device before "in snd_usb_hw_params sleeping exit", and you will see an oops in snd_pcm_hw_params
Instead of using CONFIG_DEBUG_SLAB, usb_audio_show_use_after_free.diff can show use after free by setting usb_device pointer to NULL in snd_usb_audio_disconnect. [1]
In order to protect from all the races, before using any usb_device we need to check if it is not freed.
What's the best way to do that ?
A trival fix would be to take a mutex in all snd_pcm_ops callback that access usb :
{ mutex_lock(&chip->shutdown_mutex); if (!chip->dev) { mutex_unlock(&chip->shutdown_mutex); return -ENODEV; } [...] mutex_unlock(&chip->shutdown_mutex); }
But that will serialize all snd_pcm_ops callbacks.
We had already some protections but they don't cover the recent code rewrites at all, so this bad result for now.
Yes, the easiest and maybe sane way would be to put the mutex in each ops except for trigger. open, close, hw_params, hw_free and prepare are ops that don't need much parallelism, so this can be protected well. Or, we may introduce a new mutex per stream if we really want parallel operations, but I don't think it's worth.
The trigger callback is an atomic ops, so it won't need the check. Maybe the spinlock is needed to avoid the race in snd_usb_stream_disconnect().
Another solution could be to use refcounting or rwlock patern :
- snd_pcm_ops callbacks call rdlock_trylock/rdlock_unlock
- usb_driver disconnect callback take rwlock_lock and never release it
I don't think this is needed.
So... the below is a quick hack I did without testing at all. Hopefully this can give some advance.
I checked this issue again actually with a real machine, and tried a bit better version. I'm going to send a series of fix patches.
Takashi