On Wed, 09 Aug 2023 23:11:45 +0200, Curtis Malainey wrote:
> > And now looking back at kobj code and device code, they do refcount parent objects. Maybe the problem is on our side -- all the devices are created with the original real device as the parent, including the card_dev, while there are some dependencies among children. So, if we build up a proper tree, pci_dev -> card_dev -> ctl_dev, pcm_dev, etc, one of the problems could be solved. It's more or less similar to what I suggested initially (referring to card_dev at pcm), while changing the parent would do it implicitly.
>
> Yes, I think this would be the proper long-term way to go; that way parents just put children and remove their reference, then they clean up in their own time, making everyone happy. My first patch was a very lazy attempt at that; if we wanted to do the right thing we would obviously have to split the structs and free functions to operate in their own release. If you have time, feel free to take another swing at the patches, otherwise I won't be able to start until next week.
Looking at the problem again, I noticed that my previous comment was actually wrong: by default, the device dependencies aren't kept until release time; they are already cleared at the device_del() call. device_del() calls kobject_del() and put_device(parent). So after that point, the two device releases become independent, and we hit a problem if the released object still has some dependency (such as card vs ctl_dev in our case).
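To illustrate the point, here is a toy userspace model, not kernel code. All toy_* names are made up for this sketch; it only assumes what is described above, namely that the parent reference is taken when a device is added and dropped again at the del call. It shows that a child kept alive by an open file no longer pins its parent once it has been deleted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct device; every name here is hypothetical. */
struct toy_dev {
	int refcount;
	int registered;
	int released;
	struct toy_dev *parent;
};

static void toy_init(struct toy_dev *d, struct toy_dev *parent)
{
	d->refcount = 1;	/* creator's reference */
	d->registered = 0;
	d->released = 0;
	d->parent = parent;
}

static void toy_get(struct toy_dev *d)
{
	if (d)
		d->refcount++;
}

static void toy_put(struct toy_dev *d)
{
	if (!d)
		return;
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = 1;	/* the release callback would run here */
}

/* Models the add step: the child pins its parent. */
static void toy_add(struct toy_dev *d)
{
	toy_get(d->parent);
	d->registered = 1;
}

/* Models the del step: the parent reference is dropped here, not at release. */
static void toy_del(struct toy_dev *d)
{
	if (!d->registered)
		return;
	d->registered = 0;
	toy_put(d->parent);
}

/* Returns 1 if the parent is released while the child is still alive. */
static int toy_parent_released_first(void)
{
	struct toy_dev card, ctl;

	toy_init(&card, NULL);
	toy_init(&ctl, &card);
	toy_add(&card);
	toy_add(&ctl);		/* card is now pinned by ctl */
	toy_get(&ctl);		/* an open file keeps ctl alive */

	toy_del(&ctl);		/* drops ctl's reference on card */
	toy_del(&card);
	toy_put(&ctl);		/* creator's ref gone, the file still holds one */
	toy_put(&card);		/* card hits zero and is released */

	return card.released && !ctl.released;
}
```

In this model toy_parent_released_first() returns 1: the card is gone while ctl is still referenced, which is the situation that bites us if ctl's release still touches the card.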
An extra dependency on card_dev, as I put in my early patch, would "fix" it. But there is yet another problem: the dev_free call for a snd_device object with SNDRV_DEV_LOWLEVEL can happen before the PCM and other devices are released when the delayed kobj release is enabled. And this callback usually releases the top-level resources, which might still be accessed during the other releases.
So, if we tie the object resources to each struct device release, we have a lot of work:

1. Add extra dependencies among the device hierarchy
2. Don't use the card_dev refcount for managing the sync to device closes, introduce another kref instead; otherwise the card_dev refcount would never reach zero
3. Fix the race of devres vs card_dev release
4. Move the second half of snd_card_do_free() to the release callback of card_dev itself to sync with the top-level release
5. Rewrite all SNDRV_DEV_LOWLEVEL usages to be called via card->private_free or such; maybe the only problem is hda_intel.c and hda_tegra.c that need some work at the disconnection time, and we may introduce another hook in the card object to replace that
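A rough userspace sketch of how I read item 2 (again a toy model with made-up toy_* names, not the actual ALSA code): once the children hold card_dev references, the sync with device closes needs its own counter. The children are torn down only after the last close, and only that teardown lets the card_dev count drop to zero.

```c
#include <assert.h>

/* Toy card object; all fields and names are hypothetical. */
struct toy_card {
	int dev_ref;	/* stands in for the card_dev refcount */
	int close_ref;	/* separate counter: creator + open files */
	int nchildren;	/* children, each holding one dev_ref */
	int released;
};

static void toy_card_init(struct toy_card *c, int nchildren)
{
	c->dev_ref = 1 + nchildren;	/* creator + one per child */
	c->close_ref = 1;		/* creator */
	c->nchildren = nchildren;
	c->released = 0;
}

static void toy_dev_put(struct toy_card *c)
{
	assert(c->dev_ref > 0);
	if (--c->dev_ref == 0)
		c->released = 1;	/* card_dev release would run here */
}

static void toy_file_open(struct toy_card *c)
{
	c->close_ref++;
}

/* Called on card free by the driver and on each file close. */
static void toy_close_put(struct toy_card *c)
{
	assert(c->close_ref > 0);
	if (--c->close_ref > 0)
		return;
	/* Last user is gone: tear down children, each drops its card ref. */
	while (c->nchildren > 0) {
		c->nchildren--;
		toy_dev_put(c);
	}
	toy_dev_put(c);		/* finally the creator's reference */
}

/* Returns 1 if the card is released only after the last close. */
static int toy_card_scenario(void)
{
	struct toy_card c;
	int released_while_open;

	toy_card_init(&c, 2);
	toy_file_open(&c);
	toy_close_put(&c);	/* driver frees the card, file still open */
	released_while_open = c.released;
	toy_close_put(&c);	/* file closed, teardown proceeds */

	return !released_while_open && c.released && c.dev_ref == 0;
}
```

If the close sync waited on dev_ref itself, the children's references would keep it above zero while their teardown waits for it to hit zero, so nothing would ever be released.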
At this moment, I feel that it'd be easier to go back to the earlier way of device management, i.e. it'll be just like your patch, managing the device object independently from the rest of the resources. (This also means that the way of freeing the resources for hwdep and rawmidi will go back to not using the embedded device, too; they also suffer from the same problem of SNDRV_DEV_LOWLEVEL.)
Changes 2 and 3 above can still be applied together with your change, which will fix the remaining devres-vs-card_dev problem.
Once the current problem is fixed, we may work further on the other stuff (e.g. item 5), so that we can eventually switch to the device-release model again later.
Takashi