On Thu, 25 Feb 2021 23:19:31 +0100, Anton Yakovlev wrote:
On 25.02.2021 21:30, Takashi Iwai wrote:
On Thu, 25 Feb 2021 20:02:50 +0100, Michael S. Tsirkin wrote:
On Thu, Feb 25, 2021 at 01:51:16PM +0100, Takashi Iwai wrote:
On Thu, 25 Feb 2021 13:14:37 +0100, Anton Yakovlev wrote:
[snip]
Takashi, given this was in my tree for a while, I planned to merge it this merge window.
Hmm, that's too quick, I'm afraid. I still see a few rough edges in the code, e.g. the reset work should be canceled at driver removal, but that's missing right now. And that'll become tricky because the reset work itself unbinds the device, so it'll get stuck if cancel_work_sync() is called from the remove callback.
Yes, you made a good point here! In this case, we need some external mutex for synchronization. This is just a rough idea, but maybe something like this might work:
    struct reset_work {
        struct mutex mutex;
        struct work_struct work;
        struct virtio_snd *snd;
        bool resetting;
    };

    static struct reset_work reset_works[SNDRV_CARDS];

    init()
        // init mutexes and workers

    virtsnd_probe()
        snd_card_new(snd->card)
        reset_works[snd->card->number].snd = snd;

    virtsnd_remove()
        mutex_lock(reset_works[snd->card->number].mutex)
        reset_works[snd->card->number].snd = NULL;
        resetting = reset_works[snd->card->number].resetting;
        mutex_unlock(reset_works[snd->card->number].mutex)

        if (!resetting)
            // cancel worker reset_works[snd->card->number].work
        // remove device

    virtsnd_reset_fn(work)
        mutex_lock(work->mutex)
        if (!work->snd)
            // do nothing and take an exit path
        work->resetting = true;
        mutex_unlock(work->mutex)

        device_reprobe()

        work->resetting = false;

    interrupt_handler()
        schedule_work(reset_works[snd->card->number].work);
What do you think?
I think it's still somewhat racy. Suppose that the reset_work is already running right before entering virtsnd_remove(): it sets the reset_works[].resetting flag, virtsnd_remove() skips canceling, and both the reset work and virtsnd_remove() run at the very same time. (I don't know whether this may happen, but I assume it's possible.)
In that case, maybe a better approach is to check current_work(), and call cancel_work_sync() unless it's &reset_works[].work itself. Then the recursive cancel call can be avoided.
After that point, the reset must be completed, and we can (again) proceed with the rest of the release procedure. (But the snd object itself might also have been changed again, so it needs to be re-evaluated.)
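The current_work() check can be sketched roughly as below. This is only a userspace model, not driver code: struct work_struct, current_work() and cancel_work_sync() are simplified stand-ins that mimic the kernel signatures, and mock_current, mock_cancel_calls and virtsnd_sync_reset_work() are hypothetical names made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel workqueue API (assumption: mocks only). */
struct work_struct {
	bool pending;
};

/* What current_work() would report for the calling context (mock state). */
static struct work_struct *mock_current;
/* Counts how often the mocked cancel_work_sync() was invoked. */
static int mock_cancel_calls;

static struct work_struct *current_work(void)
{
	return mock_current;
}

static bool cancel_work_sync(struct work_struct *work)
{
	bool was_pending = work->pending;

	mock_cancel_calls++;
	work->pending = false;
	return was_pending;
}

/* Hypothetical, reduced version of the driver object. */
struct virtio_snd {
	struct work_struct reset_work;
};

/*
 * Synchronize with the reset work unless we are running inside it.
 * Returns true if cancel_work_sync() was called, false if it was
 * skipped because the caller is the reset work itself.
 */
static bool virtsnd_sync_reset_work(struct virtio_snd *snd)
{
	if (current_work() == &snd->reset_work)
		return false;

	cancel_work_sync(&snd->reset_work);
	return true;
}
```

When remove is reached from any other context, the cancel call waits for the reset to finish; when it is reached from the reset work (via the unbind), the call is skipped, so it cannot wait on itself.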
One remaining concern is that the card number of the sound instance may change after reprobe. That is, we may want another persistent object instead of accessing via an array index of the sound card number. So, we might need reset_works[] associated with the virtio_snd object instead.
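To illustrate the difference from the array lookup, here is a minimal userspace sketch, assuming the work item is embedded in the per-device object. The struct layouts and reset_work_to_snd() are hypothetical, container_of is redefined locally to mirror the kernel macro, and whether virtio_snd itself lives long enough across a reprobe is left open here.

```c
#include <stddef.h>

/* Local copy of the usual container_of idiom (userspace stand-in). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for the kernel work item. */
struct work_struct {
	int unused;
};

/* Hypothetical, reduced version of the driver object. */
struct virtio_snd {
	int card_number;		/* may differ after a reprobe */
	struct work_struct reset_work;	/* embedded, not indexed by card */
};

/* Recover the owning object from the work pointer, no card number needed. */
static struct virtio_snd *reset_work_to_snd(struct work_struct *work)
{
	return container_of(work, struct virtio_snd, reset_work);
}
```

The lookup depends only on the address of the embedded work item, so a change of card_number does not affect which object the work resolves to.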
Anyway, this is damn complex. I sincerely hope that we can avoid this kind of thing. Wouldn't it be better to shift the reset stuff up to the virtio core layer? Or drop the feature in the first version. Shooting itself (and revival) is a dangerous magic spell, after all.
thanks,
Takashi