I appreciate you investing the effort to help with this. I will start merging it now, as it doesn't apply cleanly on my branch.
If I understand correctly, your main HW access prevention mechanism during the PCI prepare-rescan period is bailing out of IOCTLs on the power state == SNDRV_CTL_POWER_D0 check, or waiting until a user process closes its device file descriptor, in patches 2 and 5. For command submission prevention you use the freeze flag from patch 6. Unless I have missed something, I don't see how any of those protect us when a new device is plugged in while one of those operations is already in flight. What prevents concurrent HW access between an IOCTL that is already running and the HW suspend and MMIO unmapping in rescan_prepare, which start after the IOCTL began?
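To make the concern concrete, here is a rough user-space sketch of one way to close that window: count in-flight callers, and have the freeze path wait for the count to drain before touching the HW. This is only an illustration under stated assumptions. It uses pthreads rather than kernel primitives, and every name in it (hw_guard, hw_enter, hw_exit, hw_freeze, hw_thaw) is hypothetical and does not come from the patch series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the patches. */
struct hw_guard {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* signalled when in_flight drops to zero */
    int             in_flight; /* callers currently touching the "HW" */
    bool            frozen;    /* set while a rescan is being prepared */
};

#define HW_GUARD_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* IOCTL entry: the frozen check and the in-flight count update happen
 * under one lock, so a freeze cannot slip in between them. */
static bool hw_enter(struct hw_guard *g)
{
    bool ok;

    pthread_mutex_lock(&g->lock);
    ok = !g->frozen;
    if (ok)
        g->in_flight++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* IOCTL exit: wake a waiting freezer once the last caller has left. */
static void hw_exit(struct hw_guard *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->in_flight > 0);
    if (--g->in_flight == 0)
        pthread_cond_broadcast(&g->idle);
    pthread_mutex_unlock(&g->lock);
}

/* Rescan prepare: refuse new entries, then wait for running ones to finish.
 * Only after this returns would it be safe to suspend and unmap MMIO. */
static void hw_freeze(struct hw_guard *g)
{
    pthread_mutex_lock(&g->lock);
    g->frozen = true;
    while (g->in_flight > 0)
        pthread_cond_wait(&g->idle, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Rescan done: allow callers in again. */
static void hw_thaw(struct hw_guard *g)
{
    pthread_mutex_lock(&g->lock);
    g->frozen = false;
    pthread_mutex_unlock(&g->lock);
}
```

A flag check on its own, without the in-flight count, only stops IOCTLs that start after the flag is set. The wait in hw_freeze() is what covers the ones that were already past the check.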
Andrey
On 2021-03-24 6:00 a.m., Takashi Iwai wrote:
On Tue, 23 Mar 2021 19:25:53 +0100, Andrey Grodzovsky wrote:
This will cover IOCTLs and any mmapped accesses, I guess. Interrupts we discussed above. What about any possible background kernel work going on in dedicated threads or work items? Any pointers there on what should be blocked and waited for?
An alternative idea would be an analogy to system suspend / resume. That is, we forcibly suspend the devices first somehow, and also restrict further accesses in some way. Then do remap,
But that's the point, I guess: how do you block further accesses without those big locks? During S3, I believe user mode gets suspended before the driver, so you don't need to worry about concurrent IOCTLs when going through the suspend sequence.
ALSA core still has some legacy card-level power management code, which was introduced many years ago, at the time we still managed the power state via an extra ioctl (hence working independently of the base PM code), and a few pieces are still effective for this kind of purpose. At a quick glance, a couple of places need band-aids, but the rest should work.
A slightly more difficult problem is the floating control API calls. The get/put calls might still be in flight when we perform the PCI rescan. This has to be filtered out additionally.
Below is a patch series I cooked up quickly. Totally untested; I only checked that it compiles. The first patch is a fix I'll merge in anyway, while the rest are RFC.
thanks,
Takashi