Does it happen from soc-topology.c :: remove_link ?
It seems to happen after the topology remove_link; see the traces below.
I can't test, but can this patch solve your issue?
No, the problem remains after applying your suggested fix.
I added a bunch of traces and it seems we have a nasty case of corrupted linked lists:
diff --git a/sound/soc/soc-component.c b/sound/soc/soc-component.c
index 98ef0666add2..5b0139ebe8f3 100644
--- a/sound/soc/soc-component.c
+++ b/sound/soc/soc-component.c
@@ -518,11 +518,39 @@ int snd_soc_pcm_component_new(struct snd_pcm *pcm)
 
 void snd_soc_pcm_component_free(struct snd_pcm *pcm)
 {
-	struct snd_soc_pcm_runtime *rtd = pcm->private_data;
+	struct snd_soc_pcm_runtime *rtd;
 	struct snd_soc_rtdcom_list *rtdcom;
 	struct snd_soc_component *component;
 
-	for_each_rtd_components(rtd, rtdcom, component)
-		if (component->driver->pcm_destruct)
+	pr_err("plb: %s start\n", __func__);
+
+	if (!pcm)
+		pr_err("plb: %s PCM is NULL\n", __func__);
+
+	pr_err("plb: %s accessing private data\n", __func__);
+	rtd = pcm->private_data;
+	pr_err("plb: %s accessed private data\n", __func__);
+
+	if (!rtd)
+		pr_err("plb: %s RTD is NULL\n", __func__);
+
+	pr_err("plb: %s accessing components\n", __func__);
+	for_each_rtd_components(rtd, rtdcom, component) {
+		pr_err("plb: %s processing component\n", __func__);
+		if (!component)
+			pr_err("plb: %s component is NULL\n", __func__);
+
+		if (!component->driver)
+			pr_err("plb: %s component driver is NULL\n", __func__);
+
+		pr_err("plb: %s pcm_destruct checks\n", __func__);
+		if (component->driver->pcm_destruct) {
+			pr_err("plb: %s pcm_destruct start\n", __func__);
 			component->driver->pcm_destruct(component, pcm);
+			pr_err("plb: %s pcm_destruct done\n", __func__);
+		}
+		pr_err("plb: %s processing component done\n", __func__);
+	}
+
+	pr_err("plb: %s done\n", __func__);
 }
And the results show the for_each_rtd_components loop goes in the weeds.
[ 82.069990] sof-audio-pci 0000:00:1f.3: plb: remove_link start
[ 82.069993] sof-audio-pci 0000:00:1f.3: plb: remove_link 2
[ 82.069996] sof-audio-pci 0000:00:1f.3: plb: remove_link before snd_soc_remove_dai_link
[ 82.069998] plb: snd_soc_remove_dai_link start
[ 82.070016] plb: snd_soc_remove_dai_link done
[ 82.070020] sof-audio-pci 0000:00:1f.3: plb: remove_link done
<removed DSP power down sequence>
[ 82.179021] plb: snd_soc_pcm_component_free start
[ 82.179023] plb: snd_soc_pcm_component_free accessing private data
[ 82.179024] plb: snd_soc_pcm_component_free accessed private data
[ 82.179025] plb: snd_soc_pcm_component_free accessing components
[ 82.179025] plb: snd_soc_pcm_component_free processing component
[ 82.179029] BUG: kernel NULL pointer dereference, address: 0000000000000064
[ 82.179030] #PF: supervisor read access in kernel mode
[ 82.179031] #PF: error_code(0x0000) - not-present page
[ 82.179032] PGD 0 P4D 0
[ 82.179034] Oops: 0000 [#1] SMP NOPTI
[ 82.179036] CPU: 3 PID: 768 Comm: pulseaudio Not tainted 5.4.0-rc5-test+ #31
[ 82.179036] Hardware name: Acer Swift SF314-55/MILLER_WL, BIOS V1.05 10/03/2018
[ 82.179042] RIP: 0010:snd_soc_pcm_component_free+0xc7/0x16a [snd_soc_core]
[ 82.179043] Code: 43 08 48 c7 c6 f0 24 6e c0 4c 39 e0 0f 84 a9 00 00 00 48 8b 2b 48 85 ed 0f 84 9d 00 00 00 48 c7 c7 00 51 6e c0 e8 d2 5d 5d f2 <48> 83 7d 60 00 75 13 48 c7 c6 f0 24 6e c0 48 c7 c7 20 51 6e c0 e8
[ 82.179044] RSP: 0018:ffffa70180bf3d78 EFLAGS: 00010246
[ 82.179046] RAX: 0000000000000034 RBX: ffffa00f7aaf3968 RCX: 0000000000000006
[ 82.179047] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffffa00fa5ad63d0
[ 82.179048] RBP: 0000000000000004 R08: ffffa70180bf3c3d R09: 0000000000001518
[ 82.179049] R10: ffffa70180bf3c38 R11: ffffa70180bf3c3d R12: ffffa00fa1be4eb0
[ 82.179050] R13: ffffa00fa27aa000 R14: dead000000000122 R15: dead000000000100
[ 82.179052] FS: 00007f4e7e5ebc80(0000) GS:ffffa00fa5ac0000(0000) knlGS:0000000000000000
[ 82.179054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.179055] CR2: 0000000000000064 CR3: 0000000253d68005 CR4: 00000000003606e0
[ 82.179056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 82.179057] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 82.179058] Call Trace:
[ 82.179064] snd_pcm_free+0x1a/0x50 [snd_pcm]
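A note on reading the oops: the faulting instruction bytes (<48> 83 7d 60 00) decode to cmpq $0x0,0x60(%rbp), and with RBP = 0x4 that read lands on address 0x64, which matches CR2. This suggests, without proving it, that 'component' itself held the bogus value 0x4, so the '!component' check passed and the 'component->driver' test is what faulted. Below is a small userspace sketch of that arithmetic. 'struct fake_component' is a hypothetical stand-in, and the 0x60 offset is inferred from the instruction above, not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct snd_soc_component. Only the offset of
 * 'driver' matters here; 0x60 is an assumption inferred from the faulting
 * instruction (cmpq $0x0,0x60(%rbp)), not from the real structure layout.
 */
struct fake_component {
	unsigned char pad[0x60];
	const void *driver;
};

/*
 * Address the CPU would read for component->driver, computed without
 * dereferencing the (invalid) pointer value.
 */
uintptr_t driver_field_address(uintptr_t component)
{
	return component + offsetof(struct fake_component, driver);
}

/* A "!component" style check only catches an all-zero pointer. */
int passes_null_check(uintptr_t component)
{
	return component != 0;
}
```

With component = 0x4, passes_null_check() reports the pointer as non-NULL and driver_field_address() gives 0x64, the address reported in the BUG line.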
I have absolutely no idea what all these data structures are, just reporting this.
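For context on the "corrupted linked lists" remark: the values in R15 and R14 (dead000000000100 and dead000000000122) match the kernel's LIST_POISON1 and LIST_POISON2 on x86_64, which list_del() writes into the next/prev pointers of an entry it removes. That would be consistent with something still holding a reference to a list entry that was already deleted, though the trace alone does not prove it. Here is a minimal userspace sketch of the mechanism. It re-implements a tiny list under the same names for illustration and is not the kernel's list.h; the poison constants assume the x86_64 values seen in the registers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Assumed x86_64 poison values, matching R15 and R14 in the oops. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000122ULL)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink the entry, then poison its pointers so stale users stand out. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Returns 1 if a stale reference to a removed entry sees both poisons. */
int stale_entry_is_poisoned(void)
{
	struct list_head head, a, b;
	struct list_head *stale = &a;	/* reference kept by another path */

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_del(&a);

	/* The list itself stays consistent for walkers starting at head. */
	assert(list_count(&head) == 1);

	return stale->next == LIST_POISON1 && stale->prev == LIST_POISON2;
}
```

Anything that walks the list starting from the stale entry, instead of from the head, would follow the poisoned pointers and end up reading garbage.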
Reverting "ASoC: soc-core: add soc_unbind_dai_link()" is the only workaround at this point. With the revert applied, I've tested module load/unload for hours without issues.
It's actually quite interesting, since snd_soc_pcm_component_free() calls a .pcm_destruct() callback that the SOF driver does not use. On Intel platforms it's only used by the Skylake/SST driver; I'm not sure why, or whether SOF is missing something.