Warning due to "ALSA: hda: intel: More comprehensive PM runtime setup for controller driver"
I get the following warning, caused by commit 4f66a9ef37d3 ("ALSA: hda: intel: More comprehensive PM runtime setup for controller driver"):
snd_hda_intel 0000:00:1f.3: Unbalanced pm_runtime_enable!
I'm not sure how this patch was tested, because the warning is obvious. The patch doesn't consider what the PCI subsystem does with regard to RPM; have a look at pci_pm_init().
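For reference, the runtime-PM part of pci_pm_init() looks roughly like this (a paraphrased excerpt trimmed to the RPM calls; check drivers/pci/pci.c in your tree for the exact code):

/* drivers/pci/pci.c - paraphrased excerpt, RPM-related part only */
void pci_pm_init(struct pci_dev *dev)
{
        /* The PCI core enables runtime PM for every device itself... */
        pm_runtime_forbid(&dev->dev);   /* ...but blocks it until userspace
                                         * writes "auto" to the control attribute */
        pm_runtime_set_active(&dev->dev);
        pm_runtime_enable(&dev->dev);

        /* ... PM capability and wakeup setup follow ... */
}

A driver that calls pm_runtime_enable() again on top of this, without a preceding pm_runtime_disable(), is exactly what triggers the "Unbalanced pm_runtime_enable!" warning.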
I'd understand adding the call to pm_runtime_dont_use_autosuspend(), but I see no justification for all the other added calls.
If you're unsure which RPM call to use when, it's best to involve linux-pm@vger.kernel.org.
On Thu, 18 Nov 2021 21:33:34 +0100, Heiner Kallweit wrote:
> I get the following warning, caused by commit 4f66a9ef37d3 ("ALSA: hda: intel: More comprehensive PM runtime setup for controller driver"):
> snd_hda_intel 0000:00:1f.3: Unbalanced pm_runtime_enable!
> I'm not sure how this patch was tested, because the warning is obvious. The patch doesn't consider what the PCI subsystem does with regard to RPM; have a look at pci_pm_init().
> I'd understand adding the call to pm_runtime_dont_use_autosuspend(), but I see no justification for all the other added calls.
> If you're unsure which RPM call to use when, it's best to involve linux-pm@vger.kernel.org.
Thanks for the notice. It went through Intel CI and tests on a few local machines; it seems we concentrated on the other issues and didn't check those errors carefully.
There were two problems: one was runtime PM being kicked off even during the PCI driver remove call, and the other was getting the runtime PM setup right after re-binding.
To avoid the former, would pm_runtime_forbid() alone (and maybe pm_runtime_dont_use_autosuspend(), too) suffice? Also, for a PCI device, there's no need for pm_runtime_set_suspended() at remove, right?
thanks,
Takashi
On 18.11.2021 22:28, Takashi Iwai wrote:
> On Thu, 18 Nov 2021 21:33:34 +0100, Heiner Kallweit wrote:
> > I get the following warning, caused by commit 4f66a9ef37d3 ("ALSA: hda: intel: More comprehensive PM runtime setup for controller driver"):
> > snd_hda_intel 0000:00:1f.3: Unbalanced pm_runtime_enable!
> > I'm not sure how this patch was tested, because the warning is obvious. The patch doesn't consider what the PCI subsystem does with regard to RPM; have a look at pci_pm_init().
> > I'd understand adding the call to pm_runtime_dont_use_autosuspend(), but I see no justification for all the other added calls.
> > If you're unsure which RPM call to use when, it's best to involve linux-pm@vger.kernel.org.
> Thanks for the notice. It went through Intel CI and tests on a few local machines; it seems we concentrated on the other issues and didn't check those errors carefully.
> There were two problems: one was runtime PM being kicked off even during the PCI driver remove call, and the other was getting the runtime PM setup right after re-binding.
Looking at the commit message of "ALSA: hda: fix general protection fault in azx_runtime_idle", the following sounds weird:
- pci-driver.c:pm_runtime_put_sync() leads to a call to rpm_idle() which again calls azx_runtime_idle()
rpm_idle() is only called if usage_count is 1 when entering pm_runtime_put_sync(), and that should not be the case: pm_runtime_get_sync() increments the usage counter before remove() is called, and remove() should also increment the usage counter. This doesn't seem to happen. Maybe for whatever reason pm_runtime_get_noresume() isn't called in azx_free(), or azx_free() isn't called from remove(). I think you should trace the call chain from the PCI core calling remove() to pm_runtime_get_noresume() getting called (or not).
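For reference, the bracketing in the PCI core looks roughly like this (a paraphrased excerpt from drivers/pci/pci-driver.c of that era; the usage_count annotations assume only the single reference taken at probe time):

/* drivers/pci/pci-driver.c - paraphrased excerpt */
static int pci_device_remove(struct device *dev)
{
        struct pci_dev *pci_dev = to_pci_dev(dev);
        struct pci_driver *drv = pci_dev->driver;

        if (drv && drv->remove) {
                pm_runtime_get_sync(dev);       /* usage_count 1 -> 2 */
                drv->remove(pci_dev);           /* azx_remove(); azx_free() is
                                                 * expected to take its own ref */
                pm_runtime_put_noidle(dev);     /* 2 -> 1, no idle check */
        }
        /* ... */
        /* undo the reference taken in local_pci_probe() */
        pm_runtime_put_sync(dev);       /* reaches 0 and runs rpm_idle() only
                                         * if the driver dropped its own ref */
        return 0;
}

So the final pm_runtime_put_sync() can only reach rpm_idle() if the driver's remove path dropped (or never took) its own extra reference.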
> To avoid the former, would pm_runtime_forbid() alone (and maybe pm_runtime_dont_use_autosuspend(), too) suffice? Also, for a PCI device, there's no need for pm_runtime_set_suspended() at remove, right?
> thanks,
> Takashi
On Thu, 18 Nov 2021 23:13:50 +0100, Heiner Kallweit wrote:
> On 18.11.2021 22:28, Takashi Iwai wrote:
> > On Thu, 18 Nov 2021 21:33:34 +0100, Heiner Kallweit wrote:
> > > I get the following warning, caused by commit 4f66a9ef37d3 ("ALSA: hda: intel: More comprehensive PM runtime setup for controller driver"):
> > > snd_hda_intel 0000:00:1f.3: Unbalanced pm_runtime_enable!
> > > I'm not sure how this patch was tested, because the warning is obvious. The patch doesn't consider what the PCI subsystem does with regard to RPM; have a look at pci_pm_init().
> > > I'd understand adding the call to pm_runtime_dont_use_autosuspend(), but I see no justification for all the other added calls.
> > > If you're unsure which RPM call to use when, it's best to involve linux-pm@vger.kernel.org.
> > Thanks for the notice. It went through Intel CI and tests on a few local machines; it seems we concentrated on the other issues and didn't check those errors carefully.
> > There were two problems: one was runtime PM being kicked off even during the PCI driver remove call, and the other was getting the runtime PM setup right after re-binding.
> Looking at the commit message of "ALSA: hda: fix general protection fault in azx_runtime_idle", the following sounds weird:
> - pci-driver.c:pm_runtime_put_sync() leads to a call to rpm_idle() which again calls azx_runtime_idle()
> rpm_idle() is only called if usage_count is 1 when entering pm_runtime_put_sync(), and that should not be the case: pm_runtime_get_sync() increments the usage counter before remove() is called, and remove() should also increment the usage counter. This doesn't seem to happen. Maybe for whatever reason pm_runtime_get_noresume() isn't called in azx_free(), or azx_free() isn't called from remove(). I think you should trace the call chain from the PCI core calling remove() to pm_runtime_get_noresume() getting called (or not).
Neither of them, supposedly. Now I took a deeper look at the code around it and dug into the git log, and found that the likely problem was the recent PCI core refactoring (removal of pci->driver, etc.) which has already been reverted; that's why linux-next-20211109 was broken and linux-next-20211110 worked. With the leftover pci->driver, the stale runtime PM callback was called at the pm_runtime_put_sync() call in pci_device_remove().
Anyway, I'll drop the invalid calls to pm_runtime_enable()/disable() & co. Keeping pm_runtime_forbid() and pm_runtime_dont_use_autosuspend() at remove may still make some sense as a counterpart to the probe calls, though.
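In concrete terms that would leave a pairing along these lines (a sketch of the intent only, not the actual patch; the function names and the 2-second delay are illustrative):

#include <linux/pci.h>
#include <linux/pm_runtime.h>

/* probe path: layer driver policy on top of the PCI core's RPM setup */
static void example_rpm_setup(struct pci_dev *pci)
{
        pm_runtime_use_autosuspend(&pci->dev);
        pm_runtime_set_autosuspend_delay(&pci->dev, 2000);
        pm_runtime_allow(&pci->dev);    /* counterpart of the core's pm_runtime_forbid() */
}

/* remove path: undo only the probe-time policy */
static void example_rpm_teardown(struct pci_dev *pci)
{
        pm_runtime_forbid(&pci->dev);                   /* undo pm_runtime_allow() */
        pm_runtime_dont_use_autosuspend(&pci->dev);     /* undo use_autosuspend() */
        /* no pm_runtime_disable()/pm_runtime_set_suspended():
         * the PCI core owns the enable state and the RPM status */
}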
thanks,
Takashi
Hi,
On Fri, 19 Nov 2021, Takashi Iwai wrote:
> On Thu, 18 Nov 2021 23:13:50 +0100, Heiner Kallweit wrote:
> > On 18.11.2021 22:28, Takashi Iwai wrote:
> > > > snd_hda_intel 0000:00:1f.3: Unbalanced pm_runtime_enable!
> > > > I'm not sure how this patch was tested, because the warning is obvious. The patch doesn't consider what the PCI subsystem does with regard to RPM; have a look at pci_pm_init().
> > > > [...]
> > > Thanks for the notice. It went through Intel CI and tests on a few local machines; it seems we concentrated on the other issues and didn't check those errors carefully.
Ack on this. This did not go through our full CI; rather, I tested with the local setup I used to bisect the problem that was reported with linux-next. We did get the "Unbalanced pm_runtime" warning on the first iteration of the patch, but the submitted version was clean. But yeah, a wider test round would have caught this, sorry about that.
> > Looking at the commit message of "ALSA: hda: fix general protection fault in azx_runtime_idle", the following sounds weird:
> > - pci-driver.c:pm_runtime_put_sync() leads to a call to rpm_idle() which again calls azx_runtime_idle()
> > rpm_idle() is only called if usage_count is 1 when entering
> > [...]
> > This doesn't seem to happen. Maybe for whatever reason pm_runtime_get_noresume() isn't called in azx_free(), or azx_free() isn't called from remove(). I think you should trace the call chain from the PCI core calling remove() to pm_runtime_get_noresume() getting called (or not).
> Neither of them, supposedly. Now I took a deeper look at the code around it and dug into the git log, and found that the likely problem was the recent PCI core refactoring (removal of pci->driver, etc.) which has already been reverted; that's why linux-next-20211109 was broken and linux-next-20211110 worked. With the leftover pci->driver, the stale runtime PM callback was called at the pm_runtime_put_sync() call in pci_device_remove().
That probably explains it. I specifically saw the runtime idle callback run _after_ the PCI driver remove callback (azx_remove()) had completed, and this happened within the execution of pci_device_remove(). But alas, I could not reproduce the problem with post-20211110 linux-next.
> Anyway, I'll drop the invalid calls to pm_runtime_enable()/disable() & co. Keeping pm_runtime_forbid() and pm_runtime_dont_use_autosuspend() at remove may still make some sense as a counterpart to the probe calls, though.
Hmm, I was about to port this change to the SOF driver as well. But if the outcome is that a driver can safely assume RPM callbacks are _not_ called after remove, then maybe we can keep the SOF implementation of remove as it is.
Br, Kai
On 19.11.2021 14:51, Takashi Iwai wrote:
> On Thu, 18 Nov 2021 23:13:50 +0100, Heiner Kallweit wrote:
> > On 18.11.2021 22:28, Takashi Iwai wrote:
> > > On Thu, 18 Nov 2021 21:33:34 +0100, Heiner Kallweit wrote:
> > > > I get the following warning, caused by commit 4f66a9ef37d3 ("ALSA: hda: intel: More comprehensive PM runtime setup for controller driver"):
> > > > snd_hda_intel 0000:00:1f.3: Unbalanced pm_runtime_enable!
> > > > I'm not sure how this patch was tested, because the warning is obvious. The patch doesn't consider what the PCI subsystem does with regard to RPM; have a look at pci_pm_init().
> > > > I'd understand adding the call to pm_runtime_dont_use_autosuspend(), but I see no justification for all the other added calls.
> > > > If you're unsure which RPM call to use when, it's best to involve linux-pm@vger.kernel.org.
> > > Thanks for the notice. It went through Intel CI and tests on a few local machines; it seems we concentrated on the other issues and didn't check those errors carefully.
> > > There were two problems: one was runtime PM being kicked off even during the PCI driver remove call, and the other was getting the runtime PM setup right after re-binding.
> > Looking at the commit message of "ALSA: hda: fix general protection fault in azx_runtime_idle", the following sounds weird:
> > - pci-driver.c:pm_runtime_put_sync() leads to a call to rpm_idle() which again calls azx_runtime_idle()
> > rpm_idle() is only called if usage_count is 1 when entering pm_runtime_put_sync(), and that should not be the case: pm_runtime_get_sync() increments the usage counter before remove() is called, and remove() should also increment the usage counter. This doesn't seem to happen. Maybe for whatever reason pm_runtime_get_noresume() isn't called in azx_free(), or azx_free() isn't called from remove(). I think you should trace the call chain from the PCI core calling remove() to pm_runtime_get_noresume() getting called (or not).
> Neither of them, supposedly. Now I took a deeper look at the code around it and dug into the git log, and found that the likely problem was the recent PCI core refactoring (removal of pci->driver, etc.) which has already been reverted; that's why linux-next-20211109 was broken and linux-next-20211110 worked. With the leftover pci->driver, the stale runtime PM callback was called at the pm_runtime_put_sync() call in pci_device_remove().
I also noticed that I was partially on the wrong path.
> Anyway, I'll drop the invalid calls to pm_runtime_enable()/disable() & co. Keeping pm_runtime_forbid() and pm_runtime_dont_use_autosuspend() at remove may still make some sense as a counterpart to the probe calls, though.
The call to pm_runtime_forbid() in pci_pm_init() is a legacy from early ACPI times, when broken ACPI implementations had problems with RPM. There's a discussion (without a result yet) about enabling RPM by default for newer ACPI versions.
Calling pm_runtime_forbid() in the driver removal path isn't strictly needed, but it doesn't hurt.
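That matches the implementation: pm_runtime_forbid() is a no-op when runtime PM is already forbidden, roughly like this (paraphrased from drivers/base/power/runtime.c; check your tree for the exact code):

/* drivers/base/power/runtime.c - paraphrased excerpt */
void pm_runtime_forbid(struct device *dev)
{
        spin_lock_irq(&dev->power.lock);
        if (!dev->power.runtime_auto)
                goto out;       /* already forbidden: nothing to do */

        dev->power.runtime_auto = false;
        atomic_inc(&dev->power.usage_count);    /* pin the device active */
        rpm_resume(dev, 0);
 out:
        spin_unlock_irq(&dev->power.lock);
}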
> thanks,
> Takashi