Adding movable PCIe BARs support in snd_hda_intel

Takashi Iwai tiwai at suse.de
Tue Mar 23 17:50:26 CET 2021


On Tue, 23 Mar 2021 17:11:20 +0100,
Andrey Grodzovsky wrote:
> 
> 
> 
> On 2021-03-23 10:54 a.m., Takashi Iwai wrote:
> > On Tue, 23 Mar 2021 15:22:01 +0100,
> > Andrey Grodzovsky wrote:
> >>
> >>
> >>
> >> On 2021-03-23 7:39 a.m., Takashi Iwai wrote:
> >>> On Tue, 23 Mar 2021 12:23:16 +0100,
> >>> Andrey Grodzovsky wrote:
> >>>>
> >>>> Just an update, i found the issue which was actually to wake up the HW
> >>>> before doing stop/restart (using pm_runtime_get_sync), also handled
> >>>> protecting from concurrent snd_pcm_ioctls accessing the registers
> >>>> while the BAR is unmapped. Can go through BAR move while aplay is
> >>>> running now.
> >>>>
> >>>> Once again, would be happy for any comments on the code -
> >>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Flog%2F%3Fh%3Dyadro%2Fpcie_hotplug%2Fmovable_bars_v9.1&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C047e560ff3994f24656c08d8ee0b9497%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637521081583290130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Egfb346Qsxvdk33o1HhuBJihwbZRc8EZa9aJPzQ3Prk%3D&reserved=0
> >>>
> >>> Hrm, this kind of change is a bit scary; we have some mechanism to
> >>> stop the HD-audio hardware operation (e.g. for loading the DSP for
> >>> CA0132 codec), but it's pretty hackish and error-prone.
> >>
> >> At least I didn't have issue with HW stopping, the problem is with
> >> unampping hdac_bus->remap_addr - as long as it's not iorampped
> >> back no code besides the PCI BAR move should run at all as any register
> >> access from this address will cause fatal page fault. I took care
> >> of most of the IOCTls i think, the interrupts are disabled in azx_stop_chip
> >> but I am probably still missing some stuff like some background work
> >> items or other IOCTls code path.
> >
> > Are the all running processes stopped before entering rescan_prepare
> > callback?  Otherwise, e.g. a process accessing PCM stream via mmap can
> > run without ioctl and this may trigger the PCM period elapsed
> > eventually.  
> 
> Is any of them remaps the BAR into user process VMA ? The only two I see
> in the sound subsystem doing remapping are are had_pcm_mmap and
> snd_pcm_lib_default_mmap
> and both of them remap dma buffers which are in system memory and not
> MMIO space and  hence should not be impacted by the BAR move.

When a stream is running, it can cause an interrupt that may issue an
unsolicited event that is deals with CORB/RIRB later in a work
asynchronously.

> Or maybe I didn't get your concern correctly ? Why there would be a
> problem to access the stream otherwise during the BAR move ?
> 
> Similarly, a control interface may issue the verb that is
> > processed via CORB/RIRB, too.  So what you need to cover is not only
> > about PCM.
> 
> Can you point me to the code path for this - is this through another
> type of IOCTL or interrupt handlers ?

Yes, it's handled in the ALSA core control ioctls.  Many of the
control *_get()/*_put() callbacks in HD-audio are involved with the
amp or other HD-audio verbs (typically calling snd_hda_codec_amp*(),
etc).  Those are the mixer elements and handled via regmap with
HD-audio backend that deals with CORB/RIRB eventually.

> >>> Do we really need the BAR remap while operating the sound streaming?
> >>> That is, can't we simply refuse the remap if it's in operation?
> >>> If it can be refused, we may avoid a big locking or such.
> >>
> >> Problem here is that this feature is for the sake of hot-plug of a
> >> device, this supposed to give enough MMIO space to place all the BARs
> >> of newly plugging in device. Refusing to move BARs because there is some
> >> app using the sound card on the background might then lead to failure of
> >> plugging in the new device and this doesn't sound as a reasonable
> >> behavior to me. Also note that the lock's impact is minimal as it's
> >> read side lock only in the IOCTLs an so as long as hot plug not
> >> taking place it has no impact.
> >
> > But a hot-plugged device can't have any running files, no?
> > So at which timing is the BAR movement _must_ happen?
> 
> No, the devices this code is written for are those already in the
> system before the new device is plugged in, the new device needs enough
> free MMIO space for his own BARs and so this features enables existing
> devices to allow the PCI core to move their BARs within the  physical
> address space such that enough free space will be available inside
> the MMIO window allocated for the new device's upstream PCI bridge.

Hmm, OK, so it's other way round as I thought of.

> > And, even if we need some synchronization, the current lock
> > implementation looks fragile to me.  We likely overlook something when
> > trying to cover each branch code path.
> >
> > Basically it needs to wait until all running processes via PCM or
> > control API are released.
> 
> This approach brings a couple of hard questions -
> 
> How we make them stop at once - they might run for long time, you
> suggest something like invalidate all their mmaps and on next page
> fault return SIGBUS to terminate them ? Then I need also to know they
> actually are done - should I wait for their device file instance
> release? What if they don't access any mmapped are for a long time ?
> Sending SIGKILL explicitly ? But do I have the task id's of all of them ?

You shouldn't kill them but wait for them aborting gracefully, IMO.
Essentially a kind of hot-unplug.  Most of Linux drivers won't kill
processes at the hot-unplug, AFAIK.

But if it has to be done so, we have a list of file descriptors that
are attached to the PCM and the control devices, too...

> We had those scenarios discussed during my initial work on GPU hot
> unplug and came to a conclusssion that trying to wait for current
> processes to die is not a good approach - see the this respone I got
> at the time for example - 
> https://lists.freedesktop.org/archives/dri-devel/2020-May/265467.html

But should this be seen rather as an analogy of hot-replug?  That
won't kill processes.


Takashi

> 
> Andrey
> 
>   We have a mechanism to block the new opens
> > like snd_hda_lock_devices().  We'll need an additional stuff to wait
> > until all opened devices are released there, and this requires some
> > help in the core code, I suppose.
> > (Actually this hackish lock_devices/unlock_devices should have been in
> >   the core stuff from the beginning...)
> >
> >
> > thanks,
> >
> > Takashi
> >
> >>
> >> Andrey
> >>
> >>>
> >>>
> >>> thanks,
> >>>
> >>> Takashi
> >>>
> >>>>
> >>>>
> >>>> Andrey
> >>>>
> >>>> On 2021-03-19 5:22 p.m., Andrey Grodzovsky wrote:
> >>>>> Hi, I am working on adding AMD related drivers support for PCIe BARs
> >>>>> move feature developed by Sergei
> >>>>> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flinuxplumbersconf.org%2Fevent%2F7%2Fcontributions%2F847%2Fattachments%2F584%2F1035%2Flpc2020_sergmir.pdf&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C047e560ff3994f24656c08d8ee0b9497%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637521081583290130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=7n2pk2fzcltX4UxdLnDlqZxIqtG2G6922XiDkJIVMeA%3D&reserved=0).
> >>>>>
> >>>>> His feature is still not upstream, all his code and mine on top can
> >>>>> be seen here -
> >>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Flog%2F%3Fh%3Dyadro%2Fpcie_hotplug%2Fmovable_bars_v9.1&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C047e560ff3994f24656c08d8ee0b9497%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637521081583290130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Egfb346Qsxvdk33o1HhuBJihwbZRc8EZa9aJPzQ3Prk%3D&reserved=0
> >>>>>
> >>>>> I did basic implementation fro amdgpu driver and now I am doing the
> >>>>> same for snd_hda_intel to support our on GPU Azalia audio
> >>>>> chips. This relevant commit is here -
> >>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Dyadro%2Fpcie_hotplug%2Fmovable_bars_v9.1%26id%3D7ec0f60633e898cb941cebb3cd18aae1374fc365&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C047e560ff3994f24656c08d8ee0b9497%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637521081583290130%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=o5Iv8aTSOyj3m32oA4Dp4k1y%2BZxFTStYMl%2F4ogS9OtA%3D&reserved=0
> >>>>>
> >>>>> Any driver that wants to support movable BARs needs to implement
> >>>>> rescan_prepare, rescan_done and bar_fixed callbacks where
> >>>>> rescan_prepare needs to stop HW/SW and unamp all MMIO mappings and
> >>>>> rescan_done needs to ioremap the BAR from it's new MMIO location and
> >>>>> restart HW/SW.
> >>>>>
> >>>>> I am able currently to trigger BARs move by sysfs using
> >>>>> "/sys/bus/pci/rescan" and the driver will go through the sequence I
> >>>>> described above without hangs. Problem is that after this when i try
> >>>>> to use mplayer I am getting following errors:
> >>>>>
> >>>>> andrey at andrey-test:~$ sudo mplayer -ao alsa:device=hw=0.3
> >>>>> Downloads/file_example_MP3_5MG.mp3
> >>>>> MPlayer 1.3.0 (Debian), built with gcc-9 (C) 2000-2016 MPlayer Team
> >>>>> do_connect: could not connect to socket
> >>>>> connect: No such file or directory
> >>>>> Failed to open LIRC support. You will not be able to use your remote
> >>>>> control.
> >>>>>
> >>>>> Playing Downloads/file_example_MP3_5MG.mp3.
> >>>>> libavformat version 58.29.100 (external)
> >>>>> Audio only file format detected.
> >>>>> Load subtitles in Downloads/
> >>>>> ==========================================================================
> >>>>> Opening audio decoder: [mpg123] MPEG 1.0/2.0/2.5 layers I, II, III
> >>>>> AUDIO: 44100 Hz, 2 ch, s16le, 320.0 kbit/22.68% (ratio: 40000->176400)
> >>>>> Selected audio codec: [mpg123] afm: mpg123 (MPEG 1.0/2.0/2.5 layers
> >>>>> I, II, III)
> >>>>> ==========================================================================
> >>>>> AO: [alsa] 44100Hz 2ch s16le (2 bytes per sample)
> >>>>> Video: no video
> >>>>> Starting playback...
> >>>>> A:   0.1 (00.0) of 132.0 (02:12.0) ??,?%
> >>>>> Audio device got stuck!
> >>>>>
> >>>>> and in dmesg I see
> >>>>> [  365.355518] snd_hda_codec_hdmi hdaudioC0D0: Unable to sync
> >>>>> register 0x2f0d00. -5
> >>>>>
> >>>>> Also I see 296.619014] snd_hda_intel 0000:0a:00.1: CORB reset
> >>>>> timeout#2, CORBRP = 65535 error during the rescan_done callback
> >>>>> execution
> >>>>>
> >>>>> I know it has to do with the move of BAR's MMIO address because when i
> >>>>> disallow BAR migration by returning true from bar_fixed callback I
> >>>>> have no such errors and mplayer works fine.
> >>>>>
> >>>>> I enabled MMIO trace and didn't see that post BAR move there is a
> >>>>> wrong MMIO access - all of them are from the new MMIO base offset -
> >>>>> 0xfcd80000 (trace attached including mmio trace and dmesg)
> >>>>>
> >>>>> I would be happy for any idea on this and any comment on the
> >>>>> correctness of my sequence in the code
> >>>>>
> >>>>> Andrey
> >>>>
> >>
> 


More information about the Alsa-devel mailing list