[PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Signed-off-by: xiao jin jin.xiao@intel.com Signed-off-by: Zhang Yanmin yanmin.zhang@intel.com Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com --- sound/soc/soc-pcm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 4f60c0a83311..6ca1d02065ce 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1608,6 +1608,8 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream) if (be->dpcm[stream].users++ != 0) continue;
+ be_substream->runtime = be->dpcm[stream].runtime; + if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_NEW) && (be->dpcm[stream].state != SND_SOC_DPCM_STATE_CLOSE)) continue; @@ -1615,7 +1617,6 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream) dev_dbg(be->dev, "ASoC: open %s BE %s\n", stream ? "capture" : "playback", be->dai_link->name);
- be_substream->runtime = be->dpcm[stream].runtime; err = __soc_pcm_open(be, be_substream); if (err < 0) { be->dpcm[stream].users--;
On 9/26/2022 6:35 PM, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Well if it is being freed, maybe pointer should be set to NULL instead in place that happens?
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Signed-off-by: xiao jin jin.xiao@intel.com Signed-off-by: Zhang Yanmin yanmin.zhang@intel.com Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com
sound/soc/soc-pcm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 4f60c0a83311..6ca1d02065ce 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1608,6 +1608,8 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream) if (be->dpcm[stream].users++ != 0) continue;
be_substream->runtime = be->dpcm[stream].runtime;
- if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_NEW) && (be->dpcm[stream].state != SND_SOC_DPCM_STATE_CLOSE)) continue;
@@ -1615,7 +1617,6 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream) dev_dbg(be->dev, "ASoC: open %s BE %s\n", stream ? "capture" : "playback", be->dai_link->name);
err = __soc_pcm_open(be, be_substream); if (err < 0) { be->dpcm[stream].users--;be_substream->runtime = be->dpcm[stream].runtime;
Hello Amadeusz,
On Di, Sep 27, 2022 at 09:34:49 +0200, Amadeusz Sławiński wrote:
On 9/26/2022 6:35 PM, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Well if it is being freed, maybe pointer should be set to NULL instead in place that happens?
As soon as the agreement is reached that the patch is still relevant/useful on the latest vanilla, I think this review finding will receive proper attention.
Thanks, Eugeniu
On 2022-09-26 6:35 PM, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Signed-off-by: xiao jin jin.xiao@intel.com Signed-off-by: Zhang Yanmin yanmin.zhang@intel.com Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com
Hello,
The change seems to be driven by the skylake-driver problem. With all due respect, why not ping owners of the driver first? There are some crucial CCs missing.
I'd like to know more about the scenario you guys reproduced the problem in. Configuration details and kernel base would be good to know too. Since our CI did not detect problem of such sort, if the problem actually exists, we would like to append a test or two to cover it later on.
Regards, Czarek
Hello Czarek,
On Di, Sep 27, 2022 at 09:50:05 +0200, Cezary Rojewski wrote:
On 2022-09-26 6:35 PM, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Signed-off-by: xiao jin jin.xiao@intel.com Signed-off-by: Zhang Yanmin yanmin.zhang@intel.com Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com
Hello,
The change seems to be driven by the skylake-driver problem.
Agreed, based on the author/co-signer's e-mail and the call stack.
With all due respect, why not ping owners of the driver first? There are some crucial CCs missing.
Some feedback already provided by Pierre-Louis Bossart (many thanks). Cc-ing more Intel contributors in the sound subsystem.
I'd like to know more about the scenario you guys reproduced the problem in.
This patch was originally identified in the Intel Apollo Lake v4.1 KNLs. Given that the change itself is in the core sound subsystem, our internal assessment was that the patch might potentially be relevant/helpful on other HW platforms.
Our intention is to confirm or invalidate this assumption with the original developers of the patch, as well as with the audio maintainers and the members of the alsa-devel ML.
Configuration details and kernel base would be good to know too. Since our CI did not detect problem of such sort, if the problem actually exists, we would like to append a test or two to cover it later on.
If there is no evidence that the patch is fixing a real-life issue occurring in the latest vanilla, I agree to drop the patch.
So far, I do not possess this evidence myself.
Regards, Czarek
Best regards, Eugeniu
On 2022-09-27 1:00 PM, Eugeniu Rosca wrote:
Hello Czarek,
...
I'd like to know more about the scenario you guys reproduced the problem in.
This patch was originally identified in the Intel Apollo Lake v4.1 KNLs. Given that the change itself is in the core sound subsystem, our internal assessment was that the patch might potentially be relevant/helpful on other HW platforms.
Our intention is to confirm or invalidate this assumption with the original developers of the patch, as well as with the audio maintainers and the members of the alsa-devel ML.
Configuration details and kernel base would be good to know too. Since our CI did not detect problem of such sort, if the problem actually exists, we would like to append a test or two to cover it later on.
If there is no evidence that the patch is fixing a real-life issue occurring in the latest vanilla, I agree to drop the patch.
So far, I do not possess this evidence myself.
I've spent some time to locate the change. Found it and it seems obsolete. Some tags are missing in the revision of yours and the original date does not match either - it's Apr 2018 for the original. Won't be mentioning the tags as some engineers no longer bear @intel.com.
soc-pcm and skylake-driver valuable bits from those trees are already part of the upstream. Most of what is left was later proven obsolete or redundant by my or Pierre's engineers. There seems to be no patch missing except for few fixes from the recent SKL/KBL up-revs for our clients. Nothing APL specific.
Following kernels related to APL are maintained by the IPG team from software perspective: 4.1.42, 4.1.49, 4.4, 4.9, 4.14, 4.19
Multiple OSes. And then there are flavors for kernels/OS both. It's quite likely kernel base of yours fits into one of these buckets or at least have had changes ported from one of them.
TLDR: I agree here with my colleagues - if you believe the change is necessary, a proof e.g.: in form of reproduction steps, is needed. Otherwise it's no-go. Happy to hop on a call should you need any additional information.
Regards, Czarek
Hello Czarek,
Thank you for your friendly feedback.
On Mi, Sep 28, 2022 at 04:24:43 +0200, Cezary Rojewski wrote:
On 2022-09-27 1:00 PM, Eugeniu Rosca wrote:
Hello Czarek,
...
I'd like to know more about the scenario you guys reproduced the problem in.
This patch was originally identified in the Intel Apollo Lake v4.1 KNLs. Given that the change itself is in the core sound subsystem, our internal assessment was that the patch might potentially be relevant/helpful on other HW platforms.
Our intention is to confirm or invalidate this assumption with the original developers of the patch, as well as with the audio maintainers and the members of the alsa-devel ML.
Configuration details and kernel base would be good to know too. Since our CI did not detect problem of such sort, if the problem actually exists, we would like to append a test or two to cover it later on.
If there is no evidence that the patch is fixing a real-life issue occurring in the latest vanilla, I agree to drop the patch.
So far, I do not possess this evidence myself.
I've spent some time to locate the change. Found it and it seems obsolete. Some tags are missing in the revision of yours and the original date does not match either - it's Apr 2018 for the original. Won't be mentioning the tags as some engineers no longer bear @intel.com.
soc-pcm and skylake-driver valuable bits from those trees are already part of the upstream. Most of what is left was later proven obsolete or redundant by my or Pierre's engineers. There seems to be no patch missing except for few fixes from the recent SKL/KBL up-revs for our clients. Nothing APL specific.
Thanks for this thorough check. That also gives us enough confidence to drop the patch in some of our downstream kernels.
Following kernels related to APL are maintained by the IPG team from software perspective: 4.1.42, 4.1.49, 4.4, 4.9, 4.14, 4.19
Multiple OSes. And then there are flavors for kernels/OS both. It's quite likely kernel base of yours fits into one of these buckets or at least have had changes ported from one of them.
Good to know and yes, you are right w.r.t. the origin of the patch.
TLDR: I agree here with my colleagues - if you believe the change is necessary, a proof e.g.: in form of reproduction steps, is needed. Otherwise it's no-go. Happy to hop on a call should you need any additional information.
That's a very kind attitude and we will definitely share any empirical evidence if it turns out the patch is really contributing with healing of any future runtime issues.
Regards, Czarek
Best Regards Eugeniu
On 9/26/22 18:35, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Can I ask on which kernel this patch was validated and on what platform?
We've done a lot of work since last year on DPCM states, and I wonder the problem mentioned above actually exists on recent kernels.
Specifically, if the FE is closed, I don't get how the BE is not closed as well. And if this problem is found on a recent kernel, then it should be seen in the AVS driver as well, no?
Signed-off-by: xiao jin jin.xiao@intel.com Signed-off-by: Zhang Yanmin yanmin.zhang@intel.com Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com
sound/soc/soc-pcm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 4f60c0a83311..6ca1d02065ce 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1608,6 +1608,8 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream) if (be->dpcm[stream].users++ != 0) continue;
be_substream->runtime = be->dpcm[stream].runtime;
- if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_NEW) && (be->dpcm[stream].state != SND_SOC_DPCM_STATE_CLOSE)) continue;
@@ -1615,7 +1617,6 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream) dev_dbg(be->dev, "ASoC: open %s BE %s\n", stream ? "capture" : "playback", be->dai_link->name);
err = __soc_pcm_open(be, be_substream); if (err < 0) { be->dpcm[stream].users--;be_substream->runtime = be->dpcm[stream].runtime;
Hi Pierre,
On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
On 9/26/22 18:35, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Can I ask on which kernel this patch was validated and on what platform?
As shared with Czarek in https://lore.kernel.org/alsa-devel/20220927110022.GA3802@lxhi-065/ , this patch was originally extracted from the out-of-tree Intel Apollo Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.
No re-testing/re-validation has been performed on the latest vanilla.
One of the goals behind submitting the patch is getting in touch with the original authors, as well as the members of alsa-devel, to assess if the patch is still relevant.
We've done a lot of work since last year on DPCM states,
Could you please feedback if the work on the DPCM states is pre- or post-v5.10?
and I wonder the problem mentioned above actually exists on recent kernels.
Specifically, if the FE is closed, I don't get how the BE is not closed as well. And if this problem is found on a recent kernel, then it should be seen in the AVS driver as well, no?
It is totally conceivable (if not very likely) that the mainline advancements in the sound subsystem make this patch obsolete.
I would be happy if that's the final outcome of our discussion (since this will allow dropping the patch in our downstream kernel).
Best Regards, Eugeniu
On 9/27/22 14:30, Eugeniu Rosca wrote:
Hi Pierre,
On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
On 9/26/22 18:35, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Can I ask on which kernel this patch was validated and on what platform?
As shared with Czarek in https://lore.kernel.org/alsa-devel/20220927110022.GA3802@lxhi-065/ , this patch was originally extracted from the out-of-tree Intel Apollo Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.
No re-testing/re-validation has been performed on the latest vanilla.
There's no way to predict how a patch for a kernel 4.1 - released 7 years ago - would behave with a new kernel. If it's not tested it cannot be merged.
One of the goals behind submitting the patch is getting in touch with the original authors, as well as the members of alsa-devel, to assess if the patch is still relevant.
The only thing we could do is have more clarity on the test case and try to reproduce it.
We've done a lot of work since last year on DPCM states,
Could you please feedback if the work on the DPCM states is pre- or post-v5.10?
It doesn't matter for this discussion on the upstream kernel. But yes it's post v5.10.
and I wonder the problem mentioned above actually exists on recent kernels.
Specifically, if the FE is closed, I don't get how the BE is not closed as well. And if this problem is found on a recent kernel, then it should be seen in the AVS driver as well, no?
It is totally conceivable (if not very likely) that the mainline advancements in the sound subsystem make this patch obsolete.
I would be happy if that's the final outcome of our discussion (since this will allow dropping the patch in our downstream kernel).
Best Regards, Eugeniu
Hi Pierre,
On Mi, Sep 28, 2022 at 10:36:32 +0200, Pierre-Louis Bossart wrote:
On 9/27/22 14:30, Eugeniu Rosca wrote:
Hi Pierre,
On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
On 9/26/22 18:35, Eugeniu Rosca wrote:
From: xiao jin jin.xiao@intel.com
After start of fe and be, fe might go to close without triggering STOP, and substream->runtime is freed. However, be is still at START state and its substream->runtime still points to the freed runtime.
Later on, FE is opened/started again, and triggers STOP. snd_pcm_do_stop => dpcm_fe_dai_trigger => dpcm_fe_dai_do_trigger => dpcm_be_dai_trigger => dpcm_do_trigger => soc_pcm_trigger => skl_platform_pcm_trigger skl_platform_pcm_trigger accesses the freed old runtime data and kernel panic.
The patch fixes it by assigning be_substream->runtime in dpcm_be_dai_startup when be's state is START.
Can I ask on which kernel this patch was validated and on what platform?
As shared with Czarek, this patch was originally extracted from the out-of-tree Intel Apollo Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.
No re-testing/re-validation has been performed on the latest vanilla.
There's no way to predict how a patch for a kernel 4.1 - released 7 years ago - would behave with a new kernel. If it's not tested it cannot be merged.
No disagreement here :)
One of the goals behind submitting the patch is getting in touch with the original authors, as well as the members of alsa-devel, to assess if the patch is still relevant.
The only thing we could do is have more clarity on the test case and try to reproduce it.
Agreed. As soon as a test-case pops up where the patch makes a difference in the runtime behavior, you will hear back from us.
We've done a lot of work since last year on DPCM states,
Could you please feedback if the work on the DPCM states is pre- or post-v5.10?
It doesn't matter for this discussion on the upstream kernel. But yes it's post v5.10.
Thanks. This is helpful in the downstream context.
and I wonder the problem mentioned above actually exists on recent kernels.
Specifically, if the FE is closed, I don't get how the BE is not closed as well. And if this problem is found on a recent kernel, then it should be seen in the AVS driver as well, no?
It is totally conceivable (if not very likely) that the mainline advancements in the sound subsystem make this patch obsolete.
I would be happy if that's the final outcome of our discussion (since this will allow dropping the patch in our downstream kernel).
Best Regards Eugeniu
participants (4)
-
Amadeusz Sławiński
-
Cezary Rojewski
-
Eugeniu Rosca
-
Pierre-Louis Bossart