[alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory allocation

Yang, Libin libin.yang at intel.com
Tue Jun 4 03:38:00 CEST 2019


Hi Pierre,

>Hi Libin,
>
>>>>>> Please let me describe the issue here.
>>>>>>
>>>>>> The test case is:
>>>>>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>>>>> 2) reload module with script "sudo ./sof_insert.sh"
>>>>>>
>>>>>> After several rounds of removing and inserting kernel modules,
>>>>>> system will complain like below:
>>>>>> "BUG: unable to handle kernel paging request at 000000292a282031"
>>>>>
>>>>> Did you try some kernel debug options?  It might show what went wrong.
>>>>
>>>> No, I haven't. I'm not sure which options I can use for this case.
>>>> Could you please give me some suggestions?
>>>
>>> You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option
>>> for showing each devres allocation and removal.  And I'd try
>>> CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>interesting
>>> in CONFIG_DEBUG section, too.
>>
>> Thanks for your suggestion. After more than 1 week debug, I think
>> maybe I have root caused this issue from the devres.log message.
>>
>> Below is my finding.
>> 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be
>> called, and it will set hdev->dev.release = default_release.
>> However, for analog codec (not hdac_hdmi codec),
>> hdac_hda_codec_probe() will be called later. And it will call
>> snd_hda_codec_device_new(), which will reset codec->code.dev.release =
>> snd_hda_codec_dev_release; This means hdac_hdmi's hdev dev release is
>> default_release() defined in hdac_ext_bus.c, and other hda codec's
>> hdev dev release is snd_hda_codec_dev_release().
>>
>> Both default_release() and snd_hda_codec_dev_release() will call
>> kfree() to free the hdac_device (or its container) in the current code.
>>
>> 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use
>> Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev
>> will also be freed automatically.
>>
>> In the removal, snd_hdac_ext_bus_device_remove() will be called to
>> remove the hdev (in this function it is struct hdac_device *codec.
>> The name is not aligned in different places).
>> However for hdac_hdmi, the hdev->dev is used by other code.
>
>what other code? Can you elaborate on why the release is delayed?

From the dmesg, it is the device_link used for hdac_hdmi that causes
the release delay. Device_link will increase the refcount.

>
>> So calling device.release() (the function default_release()) will be
>> postponed. After after sdev is freed, the device.release() will be
>> called. But for devm_xxx, hdev will also be freed when sdev is freed.
>> This means hdev.dev after sdev is freed is invalid now as hdev has
>> already freed. It will access invalid memory. This will cause the bug.
>
>This is very hard to follow. 4 lines above you wrote the release is postponed
>but the way you describe is looks completely sequential.

For hdac_hdmi, as it is used by other code (device_link), the device release
will not be called immediately. After sdev and hdev is freed, the hdev
device is put by device_link. Then hdev device can be released. But hdev
has already be freed. This is wrong.

>
>>
>> So I think we should not use devm_xxx, and let's free the hdev manually.
>>
>> At the end of this topic, I still found 2 suspicious code in the current code.
>> 1. in sound/soc/intel/skylate/skl.c
>> it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
>> As we will call kfree() in the current code, should we replace it with
>> kzalloc()? Maybe we need cavs drivers owner's help on it.
>
>maybe you should send a diff suggestion to help everyone understand the
>changes you are referring to?

Please see my inline patch:

diff --git a/sound/soc/intel/skylake/skl.c b/sound/soc/intel/skylake/skl.c
index f864f7b..12af99a 100644
--- a/sound/soc/intel/skylake/skl.c
+++ b/sound/soc/intel/skylake/skl.c
@@ -697,8 +697,7 @@ static int probe_codec(struct hdac_bus *bus, int addr)
        dev_dbg(bus->dev, "codec #%d probed OK: %x\n", addr, res);

 #if IS_ENABLED(CONFIG_SND_SOC_INTEL_SKYLAKE_HDAUDIO_CODEC)
-       hda_codec = devm_kzalloc(&skl->pci->dev, sizeof(*hda_codec),
-                                GFP_KERNEL);
+       hda_codec = kzalloc(sizeof(*hda_codec), GFP_KERNEL);
        if (!hda_codec)
                return -ENOMEM;

@@ -716,7 +715,7 @@ static int probe_codec(struct hdac_bus *bus, int addr)
        }
        return 0;
 #else
-       hdev = devm_kzalloc(&skl->pci->dev, sizeof(*hdev), GFP_KERNEL);
+       hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
        if (!hdev)
                return -ENOMEM;

>
>>
>> 2. in snd_hdac_ext_bus_device_remove() It will call
>> snd_hdac_device_unregister() to unregister the hdac_devices and
>> put_device(&codec->dev) to release the dev.
>> For analog codec, snd_hdac_device_unregister()  will free the
>> codec->dev's kobject. And snd_hda_codec_dev_release() will be called
>> to free the hdac_device.
>> So it is invalid to call put_device(&codec->dev). If you print
>> refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec
>> before put_device(), you will find the refcount has already been 0.
>
>Isn't it a different problem though? Does this cause a freeze or is this just a
>bad refcount?

Yes, it is a totally different problem. And today morning, I found Ranjani has
a patch " ASoC: hda: increment codec device refcount when it is 
added to the card ", which should fix this bug.

Regards,
Libin




More information about the Alsa-devel mailing list