I merged the patches, since it looks like my branch is older I didn't have chip->init_failed member and also seems there was a typo on ur side as there is no flush_work_sync, only flush_work which waits synchronously anyway.
Everything works fine when testing with maplyer running concurrently to PCI rescan cycle.
Can you be more specific what are those get/put calls, I am thinking about some waitqueue to wait on in rescan_prepare after setting snd_hdac_bus_freeze, on wakeup it checks that a counter dropped back to zero. Not sure on which entity to hang this counter ?
Andrey
On 2021-03-24 11:06 a.m., Takashi Iwai wrote:
On Wed, 24 Mar 2021 15:53:33 +0100, Andrey Grodzovsky wrote:
Appreciate you investing the effort in helping on this. I will start to merge it now as it doesn't apply cleanly on my branch.
If I understand correctly your main HW access prevention mechanism during the PCI prepare-rescan period is by bailing out on IOCTLs with the check of power state == SNDRV_CTL_POWER_D0 or waiting when a user process closes it's device file descriptor in patches 2 and 5. For command submission prevention you use the freeze flag from patch 6. If I haven't missed anything I don't see how those all protect when new device is plugged while any of those operations are already in flight. What prevents concurrent HW access from an IOCTL already running and HW suspend and MMIO unampping in rescan_preapre which starts after IOCTL began ?
The call of snd_pcm_suspend_all() is the key. That puts all PCM streams in the stopped and suspended state. And the codec devices as well as the controller device will be put into the suspended state. So, for the PCM side, this should be fine with it, I guess.
However, after sending the patches, I noticed that they won't suffice for the pending control calls. For the control get/put callbacks that have been pending before setting the power_state=D3hot, they would still kick off the runtime PM and bad things may happen. Due to that, the bus.frozen flag won't work reliably, I'm afraid.
So, in that part, we need the code to sync the execution of the pending get/put calls in addition. Maybe refcounting the control ioctls that are involved with get/put calls (SNDRV_CTL_IOCTL_ELEM_READ, _WRITE, maybe with _TLV_READ, too) and wait for those ioctls finishing before actually starting the codec suspend procedure.
Takashi