On Thu, 18 May 2023 11:00:54 +0200, Cezary Rojewski wrote:
On 2023-05-17 4:52 PM, Takashi Iwai wrote:
On Wed, 17 May 2023 15:15:13 +0200, Cezary Rojewski wrote:
...
After reading this conversation few times I came to conclusion that codec device should be kept up as long as its runtime_status=0 (RPM_ACTIVE), regardless if usage_count is 0 or not. Basically, in autosuspend case usage_count and runtime_status for the device are both 0 so, if we are not ignoring the former by calling pm_runtime_get_if_in_use() then we end up caching the writes during the autosuspend period - period when the device is no longer used but there is still some time left before it's suspended.
--- a/sound/hda/hdac_device.c +++ b/sound/hda/hdac_device.c @@ -611,7 +611,7 @@ EXPORT_SYMBOL_GPL(snd_hdac_power_up_pm); int snd_hdac_keep_power_up(struct hdac_device *codec) { if (!atomic_inc_not_zero(&codec->in_pm)) {
int ret = pm_runtime_get_if_in_use(&codec->dev);
int ret = pm_runtime_get_if_active(&codec->dev, true); if (!ret) return -1; if (ret < 0)
Results for the above look good.
OK, this seems to be a workaround.
I took a deeper look at the issue now, and noticed that it's a messy problem.
While we want to address this issue first - I've even messaged you about this for the very first time early 2021 :D but it's not marked as high prio so it remained unaddressed till now - me and Amadeo spent some time analyzing most of the pm-related code for sound/hda and we believe most of it could be replaced by native pm_runtime_xxx code and fields such as ->in_pm could be dropped.
It's a bit tricky. The in_pm refcount check is mandatory because the very same code path is called recursively for enabling itself.
However, this won't take a day or two and it's best if first we get to know the background what/why/when some of those specific functions/members exist in the hda code.
Yeah, I'd love to clean up / fix the stuff, but this is in some sensitive area, so let's get it carefully.
The check with pm_runtime_get_if_in_use() itself isn't wrong, but it needs the exceptional handling.
In addition to that, however, we need to work around the case where the register gets updated twice with -EAGAIN handling; at the first update, the regcache gets updated while the actual write gives an error. Then at the second update, it checks only the cache and believes as if it's been already written, and the write is skipped. This was what Amadeusz experienced with my previous patch. So, we need to paper over those two.
OTOH, if we replace the primary check with pm_runtime_get_if_active(true), this makes the things *mostly* working, too. This increases the usage count forcibly when the device is active, and we'll continue to write the register. The caveat is that the auto-suspend timer is still ticking in this case. But, this won't be a big problem, as the timer callback does check the state and cancel itself if the device is actually in use.
So, I guess we should go for pm_runtime_get_if_in_use(). But please give it more tests.
I believe you meant pm_runtime_get_if_active(true) in the last one. If yes, then indeed I'm delaying the patch upload until more tests are run.
Ah correct, sorry for the typo, I mean *_if_active().
Once again, thank you for the input. Ramping up and addressing this problem wouldn't happen so quickly without your guidance.
I'm going to submit your change as a proper fix patch soon later. Then let's dig down and tidy the rest things up.
thanks,
Takashi