[alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory allocation

Pierre-Louis Bossart pierre-louis.bossart at linux.intel.com
Mon Jun 3 15:45:26 CEST 2019


Hi Libin,

>>>>> Please let me describe the issue here.
>>>>>
>>>>> The test case is:
>>>>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>>>> 2) reload module with script "sudo ./sof_insert.sh"
>>>>>
>>>>> After several rounds of removing and inserting kernel modules,
>>>>> system will complain like below:
>>>>> "BUG: unable to handle kernel paging request at 000000292a282031"
>>>>
>>>> Did you try some kernel debug options?  It might show what went wrong.
>>>
>>> No, I haven't. I'm not sure which options I can use for this case.
>>> Could you please give me some suggestions?
>>
>> You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for
>> showing each devres allocation and removal.  And I'd try
>> CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>> interesting in CONFIG_DEBUG section, too.
> 
> Thanks for your suggestion. After more than 1 week debug, I think maybe
> I have root caused this issue from the devres.log message.
> 
> Below is my finding.
> 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called,
> and it will set hdev->dev.release = default_release.
> However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe()
> will be called later. And it will call snd_hda_codec_device_new(), which will
> reset codec->code.dev.release = snd_hda_codec_dev_release;
> This means hdac_hdmi's hdev dev release is default_release() defined in
> hdac_ext_bus.c, and other hda codec's hdev dev release is
> snd_hda_codec_dev_release().
> 
> Both default_release() and snd_hda_codec_dev_release() will call kfree()
> to free the hdac_device (or its container) in the current code.
> 
> 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use
> Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means
> hdev will also be freed automatically.
> 
> In the removal, snd_hdac_ext_bus_device_remove() will be called
> to remove the hdev (in this function it is struct hdac_device *codec.
> The name is not aligned in different places).
> However for hdac_hdmi, the hdev->dev is used by other code.

what other code? Can you elaborate on why the release is delayed?

> So calling device.release() (the function default_release()) will
> be postponed. After after sdev is freed, the device.release() will
> be called. But for devm_xxx, hdev will also be freed when sdev is
> freed. This means hdev.dev after sdev is freed is invalid now as
> hdev has already freed. It will access invalid memory. This will cause the bug.

This is very hard to follow. 4 lines above you wrote the release is 
postponed but the way you describe is looks completely sequential.

> 
> So I think we should not use devm_xxx, and let's free the hdev manually.
> 
> At the end of this topic, I still found 2 suspicious code in the current code.
> 1. in sound/soc/intel/skylate/skl.c
> it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
> As we will call kfree() in the current code, should we replace it with
> kzalloc()? Maybe we need cavs drivers owner's help on it.

maybe you should send a diff suggestion to help everyone understand the 
changes you are referring to?

> 
> 2. in snd_hdac_ext_bus_device_remove()
> It will call snd_hdac_device_unregister() to unregister the hdac_devices
> and put_device(&codec->dev) to release the dev.
> For analog codec, snd_hdac_device_unregister()  will free the codec->dev's
> kobject. And snd_hda_codec_dev_release() will be called to free the
> hdac_device.
> So it is invalid to call put_device(&codec->dev). If you print
> refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before
> put_device(), you will find the refcount has already been 0.

Isn't it a different problem though? Does this cause a freeze or is this 
just a bad refcount?



More information about the Alsa-devel mailing list