[RFC PATCH 0/3] ASoC: add dailink .exit() callback
While looking at reboot issues and module load/unload tests, I found out some resources allocated in the dailink .init() callback are not properly released - there is no existing mechanism in the soc-core to do so.
I experimented with different solutions and the simplest seems to add an .exit() callback. However things are not fully balanced and I could use feedback on the approach.
This patchset includes two examples where this solution is useful, but we have additional ones identified by Ranjani.
Pierre-Louis Bossart (3): ASoC: soc-core: introduce exit() callback for dailinks ASoC: Intel: bdw-rt5677: fix module load/unload issues ASoC: Intel: kbl-rt5660: use .exit() dailink callback to release gpiod
include/sound/soc.h | 3 +++ sound/soc/intel/boards/bdw-rt5677.c | 14 ++++++++++++-- sound/soc/intel/boards/kbl_rt5660.c | 13 +++++++++++-- sound/soc/soc-core.c | 8 +++++++- 4 files changed, 33 insertions(+), 5 deletions(-)
Some machine drivers allocate or request resources during the init() phase, which need to be released at some point, e.g. when rebooting or unloading modules.
In an initial pass, we added a .remove() callback for the platform driver, but that's not symmetrical at all and would be difficult to handle if there are more than one dailink implementing an .init().
We looked also into using .remove_dai_link() callback, but that would also be imlanced.
The suggested solution is to use a dual exit() phase for dailinks to release all resources.
The exit() is invoked in soc_free_pcm_runtime(), which is not completely symmetric with the init() invoked in soc_init_pcm_runtime() - not soc_add_pcm_runtime(), but that's the best solution so far.
Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- include/sound/soc.h | 3 +++ sound/soc/soc-core.c | 8 +++++++- 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/sound/soc.h b/include/sound/soc.h index 81e5d17be935..2beebe89ebbc 100644 --- a/include/sound/soc.h +++ b/include/sound/soc.h @@ -794,6 +794,9 @@ struct snd_soc_dai_link { /* codec/machine specific init - e.g. add machine controls */ int (*init)(struct snd_soc_pcm_runtime *rtd);
+ /* codec/machine specific exit - dual of init() */ + void (*exit)(struct snd_soc_pcm_runtime *rtd); + /* optional hw_params re-writing for BE and FE sync */ int (*be_hw_params_fixup)(struct snd_soc_pcm_runtime *rtd, struct snd_pcm_hw_params *params); diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index f2cfbf182f49..09a0976d6a62 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -937,8 +937,14 @@ static int soc_dai_link_sanity_check(struct snd_soc_card *card, void snd_soc_remove_pcm_runtime(struct snd_soc_card *card, struct snd_soc_pcm_runtime *rtd) { + struct snd_soc_dai_link *dai_link = rtd->dai_link; + lockdep_assert_held(&client_mutex);
+ /* release machine specific resources */ + if (dai_link->exit) + dai_link->exit(rtd); + /* * Notify the machine driver for extra destruction */ @@ -1069,7 +1075,7 @@ static int soc_init_pcm_runtime(struct snd_soc_card *card, /* set default power off timeout */ rtd->pmdown_time = pmdown_time;
- /* do machine specific initialization */ + /* do machine specific allocations and initialization */ if (dai_link->init) { ret = dai_link->init(rtd); if (ret < 0) {
On Thu, Mar 05, 2020 at 07:06:14AM -0600, Pierre-Louis Bossart wrote:
The exit() is invoked in soc_free_pcm_runtime(), which is not completely symmetric with the init() invoked in soc_init_pcm_runtime()
- not soc_add_pcm_runtime(), but that's the best solution so far.
We *could* look at moving the init back. In any case this seems reasonable by itself (I'm less convinced by the users). However...
@@ -1069,7 +1075,7 @@ static int soc_init_pcm_runtime(struct snd_soc_card *card, /* set default power off timeout */ rtd->pmdown_time = pmdown_time;
- /* do machine specific initialization */
- /* do machine specific allocations and initialization */ if (dai_link->init) { ret = dai_link->init(rtd); if (ret < 0) {
...I'm not sure why we're saying to do allocations here? That really, really shouldn't be a normal thing - allocations should generally be done at the device model probe.
The use of devm_gpiod_get() in a dailink .init() callback generates issues when unloading modules. The dependencies between modules are not well handled and the snd_soc_rt5677 module cannot be removed:
rmmod: ERROR: Module snd_soc_rt5677 is in use
Removing the use of devm_ and manually releasing the gpio descriptor in the dailink .exit() callback solves the issue.
Tested on SAMUS Chromebook with SOF driver.
Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- sound/soc/intel/boards/bdw-rt5677.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/sound/soc/intel/boards/bdw-rt5677.c b/sound/soc/intel/boards/bdw-rt5677.c index a94f498388e1..641491cb449b 100644 --- a/sound/soc/intel/boards/bdw-rt5677.c +++ b/sound/soc/intel/boards/bdw-rt5677.c @@ -247,8 +247,8 @@ static int bdw_rt5677_init(struct snd_soc_pcm_runtime *rtd) RT5677_CLK_SEL_SYS2);
/* Request rt5677 GPIO for headphone amp control */ - bdw_rt5677->gpio_hp_en = devm_gpiod_get(component->dev, "headphone-enable", - GPIOD_OUT_LOW); + bdw_rt5677->gpio_hp_en = gpiod_get(component->dev, "headphone-enable", + GPIOD_OUT_LOW); if (IS_ERR(bdw_rt5677->gpio_hp_en)) { dev_err(component->dev, "Can't find HP_AMP_SHDN_L gpio\n"); return PTR_ERR(bdw_rt5677->gpio_hp_en); @@ -282,6 +282,15 @@ static int bdw_rt5677_init(struct snd_soc_pcm_runtime *rtd) return 0; }
+static void bdw_rt5677_exit(struct snd_soc_pcm_runtime *rtd) +{ + struct bdw_rt5677_priv *bdw_rt5677 = + snd_soc_card_get_drvdata(rtd->card); + + if (!IS_ERR(bdw_rt5677->gpio_hp_en)) + gpiod_put(bdw_rt5677->gpio_hp_en); +} + /* broadwell digital audio interface glue - connects codec <--> CPU */ SND_SOC_DAILINK_DEF(dummy, DAILINK_COMP_ARRAY(COMP_DUMMY())); @@ -350,6 +359,7 @@ static struct snd_soc_dai_link bdw_rt5677_dais[] = { .dpcm_playback = 1, .dpcm_capture = 1, .init = bdw_rt5677_init, + .exit = bdw_rt5677_exit, SND_SOC_DAILINK_REG(ssp0_port, be, platform), }, };
On Thu, Mar 05, 2020 at 07:06:15AM -0600, Pierre-Louis Bossart wrote:
The use of devm_gpiod_get() in a dailink .init() callback generates issues when unloading modules. The dependencies between modules are not well handled and the snd_soc_rt5677 module cannot be removed:
rmmod: ERROR: Module snd_soc_rt5677 is in use
Removing the use of devm_ and manually releasing the gpio descriptor
gpio -> GPIO
in the dailink .exit() callback solves the issue.
Tested on SAMUS Chromebook with SOF driver.
Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
sound/soc/intel/boards/bdw-rt5677.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/sound/soc/intel/boards/bdw-rt5677.c b/sound/soc/intel/boards/bdw-rt5677.c index a94f498388e1..641491cb449b 100644 --- a/sound/soc/intel/boards/bdw-rt5677.c +++ b/sound/soc/intel/boards/bdw-rt5677.c @@ -247,8 +247,8 @@ static int bdw_rt5677_init(struct snd_soc_pcm_runtime *rtd) RT5677_CLK_SEL_SYS2);
/* Request rt5677 GPIO for headphone amp control */
- bdw_rt5677->gpio_hp_en = devm_gpiod_get(component->dev, "headphone-enable",
GPIOD_OUT_LOW);
- bdw_rt5677->gpio_hp_en = gpiod_get(component->dev, "headphone-enable",
if (IS_ERR(bdw_rt5677->gpio_hp_en)) { dev_err(component->dev, "Can't find HP_AMP_SHDN_L gpio\n"); return PTR_ERR(bdw_rt5677->gpio_hp_en);GPIOD_OUT_LOW);
@@ -282,6 +282,15 @@ static int bdw_rt5677_init(struct snd_soc_pcm_runtime *rtd) return 0; }
+static void bdw_rt5677_exit(struct snd_soc_pcm_runtime *rtd) +{
- struct bdw_rt5677_priv *bdw_rt5677 =
snd_soc_card_get_drvdata(rtd->card);
- if (!IS_ERR(bdw_rt5677->gpio_hp_en))
I'm wondering if you need this check at all? In the above (I left for context) the GPIO is considered mandatory, does the core handles errors from ->init() correctly?
gpiod_put(bdw_rt5677->gpio_hp_en);
+}
/* broadwell digital audio interface glue - connects codec <--> CPU */ SND_SOC_DAILINK_DEF(dummy, DAILINK_COMP_ARRAY(COMP_DUMMY())); @@ -350,6 +359,7 @@ static struct snd_soc_dai_link bdw_rt5677_dais[] = { .dpcm_playback = 1, .dpcm_capture = 1, .init = bdw_rt5677_init,
SND_SOC_DAILINK_REG(ssp0_port, be, platform), },.exit = bdw_rt5677_exit,
};
2.20.1
On 3/5/20 7:25 AM, Andy Shevchenko wrote:
On Thu, Mar 05, 2020 at 07:06:15AM -0600, Pierre-Louis Bossart wrote:
The use of devm_gpiod_get() in a dailink .init() callback generates issues when unloading modules. The dependencies between modules are not well handled and the snd_soc_rt5677 module cannot be removed:
rmmod: ERROR: Module snd_soc_rt5677 is in use
Removing the use of devm_ and manually releasing the gpio descriptor
gpio -> GPIO
yep
+static void bdw_rt5677_exit(struct snd_soc_pcm_runtime *rtd) +{
- struct bdw_rt5677_priv *bdw_rt5677 =
snd_soc_card_get_drvdata(rtd->card);
- if (!IS_ERR(bdw_rt5677->gpio_hp_en))
I'm wondering if you need this check at all? In the above (I left for context) the GPIO is considered mandatory, does the core handles errors from ->init() correctly?
I just rechecked, the error flow is
dailink.init() soc_init_pcm_runtime snd_soc_bind_card probe_end: soc_cleanup_card_resources(card, card_probed); snd_soc_remove_pcm_runtime(card, rtd); dai_link->exit(rtd);
so we do need to recheck if the resources allocated in init() are valid.
I also think the IS_ERR() is correct by looking at the code in gpiod_get_index() but the comments are rather confusing to me:
* Return a valid GPIO descriptor, -ENOENT if no GPIO has been assigned to the * requested function and/or index, or another IS_ERR() code if an error * occurred while trying to acquire the GPIO.
gpiod_put(bdw_rt5677->gpio_hp_en);
+}
On Thu, Mar 05, 2020 at 07:06:15AM -0600, Pierre-Louis Bossart wrote:
The use of devm_gpiod_get() in a dailink .init() callback generates issues when unloading modules. The dependencies between modules are not well handled and the snd_soc_rt5677 module cannot be removed:
In what way are the dependencies not well managed and why aren't we requesting the GPIO on device model probe anyway?
On 3/5/20 7:36 AM, Mark Brown wrote:
On Thu, Mar 05, 2020 at 07:06:15AM -0600, Pierre-Louis Bossart wrote:
The use of devm_gpiod_get() in a dailink .init() callback generates issues when unloading modules. The dependencies between modules are not well handled and the snd_soc_rt5677 module cannot be removed:
In what way are the dependencies not well managed and why aren't we requesting the GPIO on device model probe anyway?
there are a couple of machine drivers where the gpios are requested from the machine driver. I have no idea what it is done this way, maybe the codec exposes a gpiochip that's used later? I was hoping that Andy can help, I don't fully get the acpi mapping and all.
see the code here for reference:
https://elixir.bootlin.com/linux/v5.6-rc4/source/sound/soc/intel/boards/bdw-...
The issue happens when running our test scripts, which do a set a rmmod in a specific order: rmmod of the machine driver, then doing an rmmod of the codec driver. Somehow the second fails with the 'module in use error'.
It's probably because the devm_ release does not happen when the card is unregistered and the machine driver resources released since we use the component device. There might very well be a bug somewhere in the devm_ handling, I just don't have a clue how to debug this - and the .exit() makes sense regardless in other cases unrelated to GPIOs.
On Thu, Mar 05, 2020 at 07:47:47AM -0600, Pierre-Louis Bossart wrote:
On 3/5/20 7:36 AM, Mark Brown wrote:
In what way are the dependencies not well managed and why aren't we requesting the GPIO on device model probe anyway?
there are a couple of machine drivers where the gpios are requested from the machine driver. I have no idea what it is done this way, maybe the codec exposes a gpiochip that's used later? I was hoping that Andy can help, I don't fully get the acpi mapping and all.
This doesn't answer the question: why is the machine driver not requesting the GPIO on device model probe?
The issue happens when running our test scripts, which do a set a rmmod in a specific order: rmmod of the machine driver, then doing an rmmod of the codec driver. Somehow the second fails with the 'module in use error'.
It's probably because the devm_ release does not happen when the card is unregistered and the machine driver resources released since we use the component device. There might very well be a bug somewhere in the devm_ handling, I just don't have a clue how to debug this - and the .exit() makes sense regardless in other cases unrelated to GPIOs.
So you've removed the driver which will have unbound the device but devm actions don't seem to have fired? That seems worrying...
This doesn't answer the question: why is the machine driver not requesting the GPIO on device model probe?
I *think* it's due to the need to use the codec component->dev, which is only available with the dailink callbacks - not on platform device probe which ends with the card registration.
The issue happens when running our test scripts, which do a set a rmmod in a specific order: rmmod of the machine driver, then doing an rmmod of the codec driver. Somehow the second fails with the 'module in use error'.
It's probably because the devm_ release does not happen when the card is unregistered and the machine driver resources released since we use the component device. There might very well be a bug somewhere in the devm_ handling, I just don't have a clue how to debug this - and the .exit() makes sense regardless in other cases unrelated to GPIOs.
So you've removed the driver which will have unbound the device but devm actions don't seem to have fired? That seems worrying...
Well, the devm uses the component device, not the card device, so when removing the machine driver nothing should happen. The problem seems to be in the removal of the codec and component drivers.
We tried to use the card device instead but then the gpiod_get fails.
On Thu, Mar 05, 2020 at 08:51:03AM -0600, Pierre-Louis Bossart wrote:
This doesn't answer the question: why is the machine driver not requesting the GPIO on device model probe?
I *think* it's due to the need to use the codec component->dev, which is only available with the dailink callbacks - not on platform device probe which ends with the card registration.
Why do you have this need? This is sounding a lot like the CODEC ought to be requesting it...
So you've removed the driver which will have unbound the device but devm actions don't seem to have fired? That seems worrying...
Well, the devm uses the component device, not the card device, so when removing the machine driver nothing should happen. The problem seems to be in the removal of the codec and component drivers.
Right, it's always a bad idea to do allocations with devm_ on a device other than the one that you're currently working with - that clearly leads to lifetime issues.
We tried to use the card device instead but then the gpiod_get fails.
I think you need to take a step back and work out what you're actually doing here. It doesn't sound like the problem has been fully understood so there's no clear articulation of what you're trying to do.
On 3/5/20 11:43 AM, Mark Brown wrote:
On Thu, Mar 05, 2020 at 08:51:03AM -0600, Pierre-Louis Bossart wrote:
This doesn't answer the question: why is the machine driver not requesting the GPIO on device model probe?
I *think* it's due to the need to use the codec component->dev, which is only available with the dailink callbacks - not on platform device probe which ends with the card registration.
Why do you have this need? This is sounding a lot like the CODEC ought to be requesting it...
it's been that way since 2016 and the initial contribution. The Chrome folks might know more, I don't think anyone at Intel has worked on this code.
So you've removed the driver which will have unbound the device but devm actions don't seem to have fired? That seems worrying...
Well, the devm uses the component device, not the card device, so when removing the machine driver nothing should happen. The problem seems to be in the removal of the codec and component drivers.
Right, it's always a bad idea to do allocations with devm_ on a device other than the one that you're currently working with - that clearly leads to lifetime issues.
that's precisely what I tried to correct.
We tried to use the card device instead but then the gpiod_get fails.
I think you need to take a step back and work out what you're actually doing here. It doesn't sound like the problem has been fully understood so there's no clear articulation of what you're trying to do.
Can we split this RFC in two: a) do you have any objections to adding an .exit() callback? That's what the main goal was
b) do you have any objections if we remove this devm_ use without trying to dig further into the gpio management. This is a 2015 product that we use to verify the SOF driver on Broadwell, not an Intel-owned device.
On Thu, Mar 05, 2020 at 12:08:57PM -0600, Pierre-Louis Bossart wrote:
On 3/5/20 11:43 AM, Mark Brown wrote:
On Thu, Mar 05, 2020 at 08:51:03AM -0600, Pierre-Louis Bossart wrote:
I *think* it's due to the need to use the codec component->dev, which is only available with the dailink callbacks - not on platform device probe which ends with the card registration.
Why do you have this need? This is sounding a lot like the CODEC ought to be requesting it...
it's been that way since 2016 and the initial contribution. The Chrome folks might know more, I don't think anyone at Intel has worked on this code.
I'd have thought someone would've reviewed it on the way in?
Well, the devm uses the component device, not the card device, so when removing the machine driver nothing should happen. The problem seems to be in the removal of the codec and component drivers.
Right, it's always a bad idea to do allocations with devm_ on a device other than the one that you're currently working with - that clearly leads to lifetime issues.
that's precisely what I tried to correct.
In general the best (clearest, most robust) way to correct something like this would be to continue to use devm_ but clean up the allocation so that it's done by the device that is used.
b) do you have any objections if we remove this devm_ use without trying to dig further into the gpio management. This is a 2015 product that we use to verify the SOF driver on Broadwell, not an Intel-owned device.
The main thing I'm missing with this is a coherent explanation of the problem and how the changes proposed fix it.
On Thu, Mar 05, 2020 at 06:33:35PM +0000, Mark Brown wrote:
On Thu, Mar 05, 2020 at 12:08:57PM -0600, Pierre-Louis Bossart wrote:
b) do you have any objections if we remove this devm_ use without trying to dig further into the gpio management. This is a 2015 product that we use to verify the SOF driver on Broadwell, not an Intel-owned device.
The main thing I'm missing with this is a coherent explanation of the problem and how the changes proposed fix it.
Just to emphasize: the main concern here is that the issue is understood and that it's not just going to pop up again as soon as something changes.
+Ben Zhang benzh@google.com who wrote the original driver for our 3.14 tree.
On Thu, Mar 5, 2020 at 11:12 AM Mark Brown broonie@kernel.org wrote:
On Thu, Mar 05, 2020 at 06:33:35PM +0000, Mark Brown wrote:
On Thu, Mar 05, 2020 at 12:08:57PM -0600, Pierre-Louis Bossart wrote:
b) do you have any objections if we remove this devm_ use without
trying to
dig further into the gpio management. This is a 2015 product that we
use to
verify the SOF driver on Broadwell, not an Intel-owned device.
The main thing I'm missing with this is a coherent explanation of the problem and how the changes proposed fix it.
Just to emphasize: the main concern here is that the issue is understood and that it's not just going to pop up again as soon as something changes.
The main thing I'm missing with this is a coherent explanation of the problem and how the changes proposed fix it.
Just to emphasize: the main concern here is that the issue is understood and that it's not just going to pop up again as soon as something changes.
Here are more details Mark.
<1. finish machine driver probe > [ 115.253970] bdw-rt5677 bdw-rt5677: rt5677-dspbuffer <-> spi-RT5677AA:00 mapping ok
<2. execute BE dailink .init() and request a gpiod from the codec component device>
[ 115.254387] rt5677 i2c-RT5677CE:00: plb devm_gpiod_get [ 115.254390] rt5677 i2c-RT5677CE:00: GPIO lookup for consumer headphone-enable [ 115.254391] rt5677 i2c-RT5677CE:00: using ACPI for GPIO lookup [ 115.254395] acpi RT5677CE:00: GPIO: looking up headphone-enable-gpios [ 115.254399] acpi RT5677CE:00: GPIO: _DSD returned RT5677CE:00 4 0 0 [ 115.254724] rt5677 i2c-RT5677CE:00: GPIO lookup for consumer plug-det [ 115.254725] rt5677 i2c-RT5677CE:00: using ACPI for GPIO lookup [ 115.254727] acpi RT5677CE:00: GPIO: looking up plug-det-gpios [ 115.254729] acpi RT5677CE:00: GPIO: _DSD returned RT5677CE:00 0 0 0 [ 115.255451] rt5677 i2c-RT5677CE:00: GPIO lookup for consumer mic-present [ 115.255453] rt5677 i2c-RT5677CE:00: using ACPI for GPIO lookup [ 115.255455] acpi RT5677CE:00: GPIO: looking up mic-present-gpios [ 115.255458] acpi RT5677CE:00: GPIO: _DSD returned RT5677CE:00 1 0 0
<3. gpiod handling complete>
[ 115.256293] bdw-rt5677 bdw-rt5677: rt5677-aif1 <-> ssp0-port mapping ok
<4. jack handling complete> [ 115.262040] input: sof-bdw-rt5677 Headphone Jack as /devices/pci0000:00/INT3438:00/bdw-rt5677/sound/card1/input11 [ 115.262240] input: sof-bdw-rt5677 Mic Jack as /devices/pci0000:00/INT3438:00/bdw-rt5677/sound/card1/input12
<5. card fully functional>
<6. rmmod snd_sof-acpi>
<7. rmmod machine driver>
<8. rmmod codec driver> rmmod: ERROR: Module snd_soc_rt5677 is in use
<9. rmmod -f codec driver> [ 194.118221] gpio gpiochip0: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED [ 194.118440] rt5677 i2c-RT5677CE:00: plb devm_gpiod_release
So this is a self-inflicted deadlock - broken by design.
When the machine driver is removed, the gpiod is not freed. Only removing the codec driver can free the gpiod, but the gpio/module refcount prevents the codec driver from being removed.
I don't know how to move all the gpio handling in the codec driver, since there are platform-dependent ACPI mappings.
The only proposal I can make is to avoid using devm_ but we need a hook to call gpiod_put().
using the add_dai_link()/remove_dai_link as I suggested earlier is not possible since the runtime is created after this callback is involved.
The proposal suggested by Andy was to have a dual callback to the init(), or as in my initial version to call gpiod_put() in the machine driver .remove() function, which isn't very elegant but does work.
I also tested a different solution (attached) based on your input where the gpiod handing is performed in the machine driver probe, after the card registration, and the gpiod_put() called from remove. This is simple enough but there might be some issues left with the jack/input handling - not sure why the logs for jacks are missing.
Does this clarify the issue and options?
Thanks -Pierre
On Thu, Mar 05, 2020 at 03:48:42PM -0600, Pierre-Louis Bossart wrote:
I don't know how to move all the gpio handling in the codec driver, since there are platform-dependent ACPI mappings.
The idiomatic thing for ACPI is to have a DMI table in the driver that selects the behaviour needed on a given system.
I also tested a different solution (attached) based on your input where the gpiod handing is performed in the machine driver probe, after the card registration, and the gpiod_put() called from remove. This is simple enough but there might be some issues left with the jack/input handling - not sure why the logs for jacks are missing.
Does this clarify the issue and options?
I think I preferred the original version - this does mechanically move things to the device model probe but not really in an idiomatic fashion (we're still requesting a GPIO for the CODEC from the machine driver) so I'm not sure it really helps. The changelog is definitely a lot better though.
On 3/5/2020 2:47 PM, Pierre-Louis Bossart wrote:
On 3/5/20 7:36 AM, Mark Brown wrote:
On Thu, Mar 05, 2020 at 07:06:15AM -0600, Pierre-Louis Bossart wrote:
The use of devm_gpiod_get() in a dailink .init() callback generates issues when unloading modules. The dependencies between modules are not well handled and the snd_soc_rt5677 module cannot be removed:
In what way are the dependencies not well managed and why aren't we requesting the GPIO on device model probe anyway?
there are a couple of machine drivers where the gpios are requested from the machine driver. I have no idea what it is done this way, maybe the codec exposes a gpiochip that's used later? I was hoping that Andy can help, I don't fully get the acpi mapping and all.
see the code here for reference:
https://elixir.bootlin.com/linux/v5.6-rc4/source/sound/soc/intel/boards/bdw-...
The issue happens when running our test scripts, which do a set a rmmod in a specific order: rmmod of the machine driver, then doing an rmmod of the codec driver. Somehow the second fails with the 'module in use error'.
It's probably because the devm_ release does not happen when the card is unregistered and the machine driver resources released since we use the component device. There might very well be a bug somewhere in the devm_ handling, I just don't have a clue how to debug this - and the .exit() makes sense regardless in other cases unrelated to GPIOs.
This sounds related to issue I've seen related to fact that there is devm_snd_soc_register_component and devm_snd_soc_register_card and when cleanup happens, one of them seems to release memory before other one runs it cleanup functions. And then cleanup functions try to operate on already released memory.
I haven't debugged this in depth, and just made simple patch to replace problematic devm_kzalloc with kzalloc and kfree, but there seems something weird happening related to how dynamic memory management works in ASOC.
On Thu, Mar 05, 2020 at 03:06:17PM +0100, Amadeusz Sławiński wrote:
On 3/5/2020 2:47 PM, Pierre-Louis Bossart wrote:
It's probably because the devm_ release does not happen when the card is unregistered and the machine driver resources released since we use the component device. There might very well be a bug somewhere in the devm_ handling, I just don't have a clue how to debug this - and the .exit() makes sense regardless in other cases unrelated to GPIOs.
This sounds related to issue I've seen related to fact that there is devm_snd_soc_register_component and devm_snd_soc_register_card and when cleanup happens, one of them seems to release memory before other one runs it cleanup functions. And then cleanup functions try to operate on already released memory.
I haven't debugged this in depth, and just made simple patch to replace problematic devm_kzalloc with kzalloc and kfree, but there seems something weird happening related to how dynamic memory management works in ASOC.
There's definitely an issue if you mix devm and non-devm stuff (see also the frequent issues with devm_request_irq()). The devm stuff only gets unwound after the remove callback has executed so if you free stuff in the remove callback that you will rely on in the devm cleanup operations then there's issues. The devm stuff will also always get unwound in the opposite order to that which it was allocated which usually isn't a problem but can be worth paying attention to.
On Thu, Mar 05, 2020 at 07:47:47AM -0600, Pierre-Louis Bossart wrote:
On 3/5/20 7:36 AM, Mark Brown wrote:
On Thu, Mar 05, 2020 at 07:06:15AM -0600, Pierre-Louis Bossart wrote:
The use of devm_gpiod_get() in a dailink .init() callback generates issues when unloading modules. The dependencies between modules are not well handled and the snd_soc_rt5677 module cannot be removed:
In what way are the dependencies not well managed and why aren't we requesting the GPIO on device model probe anyway?
there are a couple of machine drivers where the gpios are requested from the machine driver. I have no idea what it is done this way, maybe the codec exposes a gpiochip that's used later? I was hoping that Andy can help,
I don't know the codebase, so, I was suggested this solution based on my experience. I can't answer right now why that had been done that way (especially that it had been done not by me or any my involvement at the time).
I don't fully get the acpi mapping and all.
This one is easy to explain. ACPI lacks of the proper labeling / mapping GPIO resources. _DSD() method helps there, but there are no Wintel firmware that supports it (Google basically is the first who utilizes it).
That's why the board code has mapping table to allow request GPIOs by label (connection ID in terms of GPIO suybsystem).
see the code here for reference:
https://elixir.bootlin.com/linux/v5.6-rc4/source/sound/soc/intel/boards/bdw-...
The issue happens when running our test scripts, which do a set a rmmod in a specific order: rmmod of the machine driver, then doing an rmmod of the codec driver. Somehow the second fails with the 'module in use error'.
It's probably because the devm_ release does not happen when the card is unregistered and the machine driver resources released since we use the component device. There might very well be a bug somewhere in the devm_ handling, I just don't have a clue how to debug this - and the .exit() makes sense regardless in other cases unrelated to GPIOs.
Yes.
On Thu, Mar 05, 2020 at 04:27:23PM +0200, Andy Shevchenko wrote:
On Thu, Mar 05, 2020 at 07:47:47AM -0600, Pierre-Louis Bossart wrote:
I don't fully get the acpi mapping and all.
This one is easy to explain. ACPI lacks of the proper labeling / mapping GPIO resources. _DSD() method helps there, but there are no Wintel firmware that supports it (Google basically is the first who utilizes it).
That's not entirely true - the _DSD stuff was also actively being used by the embedded x86 people since they needed firmware bindings for things and wanted to import all the work that's been done for DT, or as much as possible anyway given that there's bits of ACPI that actively conflict with DT. They were driving this much more actively and doing much more extensive work than the ChromeOS people. That all seems to have been abandoned though.
The gpiod handling is inspired from the bdw-rt5677 code. Apply same fix to avoid reference count issue while removing modules for consistency.
The SOF driver does not yet support this machine driver, and module load/unload with the SKL driver isn't well supported, so this was not tested on a device.
Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- sound/soc/intel/boards/kbl_rt5660.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/sound/soc/intel/boards/kbl_rt5660.c b/sound/soc/intel/boards/kbl_rt5660.c index e23dea9ab79a..3ff3afd36536 100644 --- a/sound/soc/intel/boards/kbl_rt5660.c +++ b/sound/soc/intel/boards/kbl_rt5660.c @@ -165,8 +165,8 @@ static int kabylake_rt5660_codec_init(struct snd_soc_pcm_runtime *rtd) dev_warn(component->dev, "Failed to add driver gpios\n");
/* Request rt5660 GPIO for lineout mute control, return if fails */ - ctx->gpio_lo_mute = devm_gpiod_get(component->dev, "lineout-mute", - GPIOD_OUT_HIGH); + ctx->gpio_lo_mute = gpiod_get(component->dev, "lineout-mute", + GPIOD_OUT_HIGH); if (IS_ERR(ctx->gpio_lo_mute)) { dev_err(component->dev, "Can't find GPIO_MUTE# gpio\n"); return PTR_ERR(ctx->gpio_lo_mute); @@ -207,6 +207,14 @@ static int kabylake_rt5660_codec_init(struct snd_soc_pcm_runtime *rtd) return 0; }
+static void kabylake_rt5660_codec_exit(struct snd_soc_pcm_runtime *rtd) +{ + struct kbl_codec_private *ctx = snd_soc_card_get_drvdata(rtd->card); + + if (!IS_ERR(ctx->gpio_lo_mute)) + gpiod_put(ctx->gpio_lo_mute)); +} + static int kabylake_hdmi_init(struct snd_soc_pcm_runtime *rtd, int device) { struct kbl_codec_private *ctx = snd_soc_card_get_drvdata(rtd->card); @@ -421,6 +429,7 @@ static struct snd_soc_dai_link kabylake_rt5660_dais[] = { .id = 0, .no_pcm = 1, .init = kabylake_rt5660_codec_init, + .exit = kabylake_rt5660_codec_exit, .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF | SND_SOC_DAIFMT_CBS_CFS,
On Thu, Mar 05, 2020 at 07:06:16AM -0600, Pierre-Louis Bossart wrote:
The gpiod handling is inspired from the bdw-rt5677 code. Apply same
gpiod -> GPIO descriptor
fix to avoid reference count issue while removing modules for consistency.
The SOF driver does not yet support this machine driver, and module load/unload with the SKL driver isn't well supported, so this was not tested on a device.
Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
sound/soc/intel/boards/kbl_rt5660.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/sound/soc/intel/boards/kbl_rt5660.c b/sound/soc/intel/boards/kbl_rt5660.c index e23dea9ab79a..3ff3afd36536 100644 --- a/sound/soc/intel/boards/kbl_rt5660.c +++ b/sound/soc/intel/boards/kbl_rt5660.c @@ -165,8 +165,8 @@ static int kabylake_rt5660_codec_init(struct snd_soc_pcm_runtime *rtd) dev_warn(component->dev, "Failed to add driver gpios\n");
/* Request rt5660 GPIO for lineout mute control, return if fails */
- ctx->gpio_lo_mute = devm_gpiod_get(component->dev, "lineout-mute",
GPIOD_OUT_HIGH);
- ctx->gpio_lo_mute = gpiod_get(component->dev, "lineout-mute",
if (IS_ERR(ctx->gpio_lo_mute)) { dev_err(component->dev, "Can't find GPIO_MUTE# gpio\n"); return PTR_ERR(ctx->gpio_lo_mute);GPIOD_OUT_HIGH);
@@ -207,6 +207,14 @@ static int kabylake_rt5660_codec_init(struct snd_soc_pcm_runtime *rtd) return 0; }
+static void kabylake_rt5660_codec_exit(struct snd_soc_pcm_runtime *rtd) +{
- struct kbl_codec_private *ctx = snd_soc_card_get_drvdata(rtd->card);
- if (!IS_ERR(ctx->gpio_lo_mute))
Same comment as per previous patch.
gpiod_put(ctx->gpio_lo_mute));
+}
static int kabylake_hdmi_init(struct snd_soc_pcm_runtime *rtd, int device) { struct kbl_codec_private *ctx = snd_soc_card_get_drvdata(rtd->card); @@ -421,6 +429,7 @@ static struct snd_soc_dai_link kabylake_rt5660_dais[] = { .id = 0, .no_pcm = 1, .init = kabylake_rt5660_codec_init,
.dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF | SND_SOC_DAIFMT_CBS_CFS,.exit = kabylake_rt5660_codec_exit,
-- 2.20.1
participants (5)
-
Amadeusz Sławiński
-
Andy Shevchenko
-
Curtis Malainey
-
Mark Brown
-
Pierre-Louis Bossart