[alsa-devel] CPU hook snd_hda_intel
Hello,
Please keep me in Cc when you answer as I am not subscribed to the list.
I use the most current kernel 5.4.1 (but I did have that problem since several 5.x versions) on gentoo on a X61s Lenovo laptop. What I see is that from time to time one CPU is constantly working and draining my battery with a process kworker/*-events.
Today I was able to identify the source of the problem. The module snd-hda-intel seems to produce that load. When I unload the module, the CPU hook is gone. Even if I load it again, I have a good change to solve the problem.
It also helps often to put the laptop to sleep and waking it up again.
My relevant dmesg part is: [ 7.186766] snd_hda_intel 0000:00:1b.0: probe_mask set to 0x1 for device 17aa:20ac [ 7.563277] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=1 (0x12/0x0/0x0/0x0/0x0) type:speaker [ 7.563281] snd_hda_codec_generic hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0) [ 7.563284] snd_hda_codec_generic hdaudioC0D0: hp_outs=1 (0x11/0x0/0x0/0x0/0x0) [ 7.563286] snd_hda_codec_generic hdaudioC0D0: mono: mono_out=0x0 [ 7.563288] snd_hda_codec_generic hdaudioC0D0: inputs: [ 7.563292] snd_hda_codec_generic hdaudioC0D0: Internal Mic=0x15 [ 7.563294] snd_hda_codec_generic hdaudioC0D0: Mic=0x14 [ 7.563297] snd_hda_codec_generic hdaudioC0D0: Dock Mic=0x1c
For reference I attach my kernel config to this mail. I am not sure if it is needed but better give as much as informations that I have.
So please, if you have any idea how to narrow that problem or even solve it, help would be appreciated.
Regards Klaus
On Mon, 02 Dec 2019 19:47:59 +0100, Klaus Ethgen wrote:
Hello,
Please keep me in Cc when you answer as I am not subscribed to the list.
I use the most current kernel 5.4.1 (but I did have that problem since several 5.x versions) on gentoo on a X61s Lenovo laptop. What I see is that from time to time one CPU is constantly working and draining my battery with a process kworker/*-events.
Today I was able to identify the source of the problem. The module snd-hda-intel seems to produce that load. When I unload the module, the CPU hook is gone. Even if I load it again, I have a good change to solve the problem.
It also helps often to put the laptop to sleep and waking it up again.
My relevant dmesg part is: [ 7.186766] snd_hda_intel 0000:00:1b.0: probe_mask set to 0x1 for device 17aa:20ac [ 7.563277] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=1 (0x12/0x0/0x0/0x0/0x0) type:speaker [ 7.563281] snd_hda_codec_generic hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0) [ 7.563284] snd_hda_codec_generic hdaudioC0D0: hp_outs=1 (0x11/0x0/0x0/0x0/0x0) [ 7.563286] snd_hda_codec_generic hdaudioC0D0: mono: mono_out=0x0 [ 7.563288] snd_hda_codec_generic hdaudioC0D0: inputs: [ 7.563292] snd_hda_codec_generic hdaudioC0D0: Internal Mic=0x15 [ 7.563294] snd_hda_codec_generic hdaudioC0D0: Mic=0x14 [ 7.563297] snd_hda_codec_generic hdaudioC0D0: Dock Mic=0x1c
For reference I attach my kernel config to this mail. I am not sure if it is needed but better give as much as informations that I have.
So please, if you have any idea how to narrow that problem or even solve it, help would be appreciated.
I guess checking the perf output would give you more clue, e.g. which function is hogging the CPU. Without such information, it's very hard to diagnose...
thanks,
Takashi
Hi Takashi,
Am Mi den 4. Dez 2019 um 17:31 schrieb Takashi Iwai:
I guess checking the perf output would give you more clue, e.g. which function is hogging the CPU. Without such information, it's very hard to diagnose...
No problem, I just had the problem today again.
Find attached the perf output of the relevant event handler.
Regards Klaus
On Sat, 07 Dec 2019 21:06:43 +0100, Klaus Ethgen wrote:
Hi Takashi,
Am Mi den 4. Dez 2019 um 17:31 schrieb Takashi Iwai:
I guess checking the perf output would give you more clue, e.g. which function is hogging the CPU. Without such information, it's very hard to diagnose...
No problem, I just had the problem today again.
Find attached the perf output of the relevant event handler.
Thanks. So it seems wasting in the busy loop for waiting for the response from the codec.
How about a patch below? It's untested, might break things, but at least avoiding a busy loop.
BTW, if your machine is Coffeelake or Cannonlake, this patch wont' work on the old branch because of forced polling mode. It was removed in 5.5-rc1, so please test with Linus tree at best.
Takashi
--- a/include/sound/hdaudio.h +++ b/include/sound/hdaudio.h @@ -317,6 +317,7 @@ struct hdac_bus { struct hdac_rb corb; struct hdac_rb rirb; unsigned int last_cmd[HDA_MAX_CODECS]; /* last sent command */ + wait_queue_head_t rirb_waitq;
/* CORB/RIRB and position buffers */ struct snd_dma_buffer rb; diff --git a/sound/hda/hdac_bus.c b/sound/hda/hdac_bus.c index 8f19876244eb..5eb58fce5353 100644 --- a/sound/hda/hdac_bus.c +++ b/sound/hda/hdac_bus.c @@ -43,6 +43,7 @@ int snd_hdac_bus_init(struct hdac_bus *bus, struct device *dev, mutex_init(&bus->cmd_mutex); mutex_init(&bus->lock); INIT_LIST_HEAD(&bus->hlink_list); + init_waitqueue_head(&bus->rirb_waitq); bus->irq = -1; return 0; } diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c index 7e7be8e4dcf9..cd8fa9aa599d 100644 --- a/sound/hda/hdac_controller.c +++ b/sound/hda/hdac_controller.c @@ -216,6 +216,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus) else if (bus->rirb.cmds[addr]) { bus->rirb.res[addr] = res; bus->rirb.cmds[addr]--; + if (!bus->rirb.cmds[addr] && + waitqueue_active(&bus->rirb_waitq)) + wake_up(&bus->rirb_waitq); } else { dev_err_ratelimited(bus->dev, "spurious response %#x:%#x, last cmd=%#08x\n", diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c index 2f3b7a35f2d9..56cafd12fa8a 100644 --- a/sound/pci/hda/hda_controller.c +++ b/sound/pci/hda/hda_controller.c @@ -792,21 +792,25 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr, struct hda_bus *hbus = &chip->bus; unsigned long timeout; unsigned long loopcounter; - int do_poll = 0; + wait_queue_entry_t wait; bool warned = false;
+ init_wait_entry(&wait, 0); again: timeout = jiffies + msecs_to_jiffies(1000);
for (loopcounter = 0;; loopcounter++) { spin_lock_irq(&bus->reg_lock); - if (bus->polling_mode || do_poll) + if (!bus->polling_mode) + prepare_to_wait(&bus->rirb_waitq, &wait, + TASK_UNINTERRUPTIBLE); + if (bus->polling_mode) snd_hdac_bus_update_rirb(bus); if (!bus->rirb.cmds[addr]) { - if (!do_poll) - bus->poll_count = 0; if (res) *res = bus->rirb.res[addr]; /* the last value */ + if (!bus->polling_mode) + finish_wait(&bus->rirb_waitq, &wait); spin_unlock_irq(&bus->reg_lock); return 0; } @@ -814,7 +818,9 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr, if (time_after(jiffies, timeout)) break; #define LOOP_COUNT_MAX 3000 - if (hbus->needs_damn_long_delay || + if (!bus->polling_mode) { + schedule_timeout(msecs_to_jiffies(2)); + } else if (hbus->needs_damn_long_delay || loopcounter > LOOP_COUNT_MAX) { if (loopcounter > LOOP_COUNT_MAX && !warned) { dev_dbg_ratelimited(chip->card->dev, @@ -829,19 +835,12 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr, } }
+ if (!bus->polling_mode) + finish_wait(&bus->rirb_waitq, &wait); + if (hbus->no_response_fallback) return -EIO;
- if (!bus->polling_mode && bus->poll_count < 2) { - dev_dbg(chip->card->dev, - "azx_get_response timeout, polling the codec once: last cmd=0x%08x\n", - bus->last_cmd[addr]); - do_poll = 1; - bus->poll_count++; - goto again; - } - - if (!bus->polling_mode) { dev_warn(chip->card->dev, "azx_get_response timeout, switching to polling mode: last cmd=0x%08x\n",
Hi Takashi,
Am So den 8. Dez 2019 um 9:45 schrieb Takashi Iwai:
On Sat, 07 Dec 2019 21:06:43 +0100,
Find attached the perf output of the relevant event handler.
Thanks. So it seems wasting in the busy loop for waiting for the response from the codec.
How about a patch below? It's untested, might break things, but at least avoiding a busy loop.
I'll give it a try. Kernel is compiling right now. It might be hard to see if the patch is working but I suppose that it do when I don't have that trouble within a week or so.
BTW, if your machine is Coffeelake or Cannonlake, this patch wont' work on the old branch because of forced polling mode. It was removed in 5.5-rc1, so please test with Linus tree at best.
I hardly have such new Architectures on that laptop. It is a lenovo x61s so it is from latest 2009. It has a Intel Core 2 Duo LV (Merom). :-)
I actually check it with gentoo kernel source 5.4.2.
Regards Klaus
On Sun, 08 Dec 2019 18:31:27 +0100, Klaus Ethgen wrote:
Hi Takashi,
Am So den 8. Dez 2019 um 9:45 schrieb Takashi Iwai:
On Sat, 07 Dec 2019 21:06:43 +0100,
Find attached the perf output of the relevant event handler.
Thanks. So it seems wasting in the busy loop for waiting for the response from the codec.
How about a patch below? It's untested, might break things, but at least avoiding a busy loop.
I'll give it a try. Kernel is compiling right now. It might be hard to see if the patch is working but I suppose that it do when I don't have that trouble within a week or so.
BTW, if your machine is Coffeelake or Cannonlake, this patch wont' work on the old branch because of forced polling mode. It was removed in 5.5-rc1, so please test with Linus tree at best.
I hardly have such new Architectures on that laptop. It is a lenovo x61s so it is from latest 2009. It has a Intel Core 2 Duo LV (Merom). :-)
OK, then could you give alsa-info.sh output, at least? Run the script with --no-upload option and attach the output.
Basically my patch should improve a lot for Nvidia codecs, not only about the CPU time reduction, and it'd be worth for merging to upstream. OTOH, it's a change at the very core part, so it needs more testing, so it's likely for 5.6.
thanks,
Takashi
Hi Takashi,
Am Mo den 9. Dez 2019 um 8:10 schrieb Takashi Iwai:
I hardly have such new Architectures on that laptop. It is a lenovo x61s so it is from latest 2009. It has a Intel Core 2 Duo LV (Merom). :-)
OK, then could you give alsa-info.sh output, at least? Run the script with --no-upload option and attach the output.
Sure. See attached output.
Regards Klaus
On Mon, 09 Dec 2019 09:57:16 +0100, Klaus Ethgen wrote:
Hi Takashi,
Am Mo den 9. Dez 2019 um 8:10 schrieb Takashi Iwai:
I hardly have such new Architectures on that laptop. It is a lenovo x61s so it is from latest 2009. It has a Intel Core 2 Duo LV (Merom). :-)
OK, then could you give alsa-info.sh output, at least? Run the script with --no-upload option and attach the output.
Sure. See attached output.
Thanks!
Maybe unrelated with this particular problem, but I wonder why you didn't enable CONFIG_SND_HDA_CODEC_ANALOG. You're using the generic codec driver and it misses a few vendor-specific fixes / workarounds.
Takashi
Hi Takashi,
Am Mo den 9. Dez 2019 um 10:27 schrieb Takashi Iwai:
Maybe unrelated with this particular problem, but I wonder why you didn't enable CONFIG_SND_HDA_CODEC_ANALOG. You're using the generic codec driver and it misses a few vendor-specific fixes / workarounds.
Lack of better knowledge. I'll add it to the next kernel build. I do not exactly know the concept of codecs.
I have to admit, that I just recently switched to alsa as I was using OSS4 before as I had many troubles with alsa in the past. (Kernel ops, localized libraries that didn't accept '.' as a decimal limiter, cracking sound when moving the mouse, ...)
So a compatibility issue on gentoo with the current oss4 package forced me to give alsa another try. And I have to say, it is matured now. Except this particular problem it seems to be stable.
Regards Klaus
Hi Takashi,
Unfortunately it is not successful. See the attached output of perf.
There is also a issue with a swapper thread in ath9k code but the one with the biggest CPU hook is still alsa.
Regards Klaus
On Tue, 10 Dec 2019 15:32:45 +0100, Klaus Ethgen wrote:
Hi Takashi,
Unfortunately it is not successful. See the attached output of perf.
There is also a issue with a swapper thread in ath9k code but the one with the biggest CPU hook is still alsa.
Do you see the relevant kernel messages, e.g. switch to polling mode? If that happens, it means that the communication with the chip got broken by some reason. And...
On Tue, 10 Dec 2019 15:35:44 +0100, Klaus Ethgen wrote:
Let me add that this time I encountered the issue after taking the laptop from the docking station.
... this can be the cause: the hardware doesn't react to the pin jack sense request for the dock that isn't present any longer. Or something goes wrong here, repeating, etc.
After sending it to sleep and waking it up again, the problem is gone (as it often can be solved).
So the problem is very likely some flaky hardware response.
Takashi
Hi Takashi,
Am Di den 10. Dez 2019 um 15:43 schrieb Takashi Iwai:
There is also a issue with a swapper thread in ath9k code but the one with the biggest CPU hook is still alsa.
Do you see the relevant kernel messages, e.g. switch to polling mode?
Nope.
Here is the relevant logs starting from undocking event up to the sleep that I mentioned fixing the issue: Dec 10 15:19:12 chua kernel: [14611.133847] ACPI: _SB_.GDCK: undocking Dec 10 15:19:12 chua kernel: [14611.137923] usb 1-4: USB disconnect, device number 5 Dec 10 15:19:13 chua kernel: [14611.360780] thinkpad_acpi: woke up due to a hot-unplug request... Dec 10 15:19:13 chua kernel: [14611.361034] thinkpad_acpi: undocked Dec 10 15:19:13 chua kernel: [14611.361509] ACPI: _SB_.GDCK: undocking Dec 10 15:19:16 chua kernel: [14614.688138] Monitor-Mwait will be used to enter C-3 state Dec 10 15:19:16 chua kernel: [14614.909277] sd 0:0:0:0: [sda] Synchronizing SCSI cache Dec 10 15:19:16 chua kernel: [14614.910426] sd 0:0:0:0: [sda] Stopping disk Dec 10 15:19:17 chua kernel: [14615.279585] e1000e: eth0 NIC Link is Down Dec 10 15:19:17 chua kernel: [14615.995311] sd 0:0:0:0: [sda] Starting disk Dec 10 15:19:18 chua kernel: [14616.840213] BTRFS warning (device dm-3): excessive commit interval 600 Dec 10 15:19:18 chua kernel: [14616.840217] BTRFS info (device dm-3): disk space caching is enabled [more of the last two lines] Dec 10 15:19:19 chua kernel: [14617.856624] e1000e: eth0 NIC Link is Down Dec 10 15:19:26 chua kernel: [14624.416815] wlan0: authenticate with 1a:e8:29:62:36:fc Dec 10 15:19:26 chua kernel: [14624.422409] wlan0: send auth to 1a:e8:29:62:36:fc (try 1/3) Dec 10 15:19:26 chua kernel: [14624.450142] wlan0: authenticated Dec 10 15:19:26 chua kernel: [14624.450480] wlan0: associate with 1a:e8:29:62:36:fc (try 1/3) Dec 10 15:19:26 chua kernel: [14624.455014] wlan0: RX AssocResp from 1a:e8:29:62:36:fc (capab=0x411 status=0 aid=3) Dec 10 15:19:26 chua kernel: [14624.455363] wlan0: associated Dec 10 15:19:26 chua kernel: [14624.460321] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready Dec 10 15:19:42 chua kernel: [14640.895731] e1000e: eth0 NIC Link is Down Dec 10 15:19:52 chua kernel: [14650.415684] e1000e: eth0 NIC Link is Down Dec 10 15:19:55 chua kernel: [14653.332765] wlan0: authenticate with 1a:e8:29:62:36:fc Dec 10 15:19:55 chua kernel: [14653.337337] wlan0: send auth to 1a:e8:29:62:36:fc (try 1/3) Dec 10 15:19:55 chua kernel: [14653.397026] wlan0: send auth to 1a:e8:29:62:36:fc (try 2/3) Dec 10 15:19:55 chua kernel: [14653.463283] wlan0: send auth to 1a:e8:29:62:36:fc (try 3/3) Dec 10 15:19:55 chua kernel: [14653.522603] wlan0: authentication with 1a:e8:29:62:36:fc timed out Dec 10 15:20:34 chua kernel: [14692.783662] e1000e: eth0 NIC Link is Down Dec 10 15:23:36 chua kernel: [14874.223723] perf: interrupt took too long (2597 > 2500), lowering kernel.perf_event_max_sample_rate to 77000 Dec 10 15:23:36 chua kernel: [14874.227937] perf: interrupt took too long (3335 > 3246), lowering kernel.perf_event_max_sample_rate to 59000 Dec 10 15:23:36 chua kernel: [14874.232928] perf: interrupt took too long (4169 > 4168), lowering kernel.perf_event_max_sample_rate to 47000 Dec 10 15:23:36 chua kernel: [14874.241582] perf: interrupt took too long (5233 > 5211), lowering kernel.perf_event_max_sample_rate to 38000 Dec 10 15:23:36 chua kernel: [14874.252231] perf: interrupt took too long (6548 > 6541), lowering kernel.perf_event_max_sample_rate to 30000 Dec 10 15:23:36 chua kernel: [14874.267034] perf: interrupt took too long (8211 > 8185), lowering kernel.perf_event_max_sample_rate to 24000 Dec 10 15:23:36 chua kernel: [14874.295426] perf: interrupt took too long (10276 > 10263), lowering kernel.perf_event_max_sample_rate to 19000 Dec 10 15:23:37 chua kernel: [14875.392080] perf: interrupt took too long (12851 > 12845), lowering kernel.perf_event_max_sample_rate to 15000 Dec 10 15:24:09 chua kernel: [14907.880874] e1000e: eth0 NIC Link is Down Dec 10 15:24:12 chua kernel: [14910.754897] wlan0: authenticate with 1a:e8:29:e1:1c:4b Dec 10 15:24:12 chua kernel: [14910.759170] wlan0: send auth to 1a:e8:29:e1:1c:4b (try 1/3) Dec 10 15:24:12 chua kernel: [14910.764932] wlan0: authenticated Dec 10 15:24:12 chua kernel: [14910.765477] wlan0: associate with 1a:e8:29:e1:1c:4b (try 1/3) Dec 10 15:24:12 chua kernel: [14910.778257] wlan0: RX AssocResp from 1a:e8:29:e1:1c:4b (capab=0x431 status=0 aid=8) Dec 10 15:24:12 chua kernel: [14910.778651] wlan0: associated Dec 10 15:24:12 chua kernel: [14910.785585] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready Dec 10 15:33:47 chua kernel: [15485.695970] wlan0: deauthenticating from 1a:e8:29:e1:1c:4b by local choice (Reason: 3=DEAUTH_LEAVING) Dec 10 15:33:48 chua kernel: [15486.359726] e1000e: eth0 NIC Link is Down Dec 10 15:33:49 chua kernel: [15487.208769] e1000e: eth0 NIC Link is Down Dec 10 15:33:49 chua kernel: [15487.717376] ath9k: ath9k: Driver unloaded Dec 10 15:33:50 chua kernel: [15488.357624] PM: suspend entry (deep) Dec 10 15:33:55 chua kernel: [15488.381517] Filesystems sync: 0.023 seconds Dec 10 15:33:55 chua kernel: [15488.381942] Freezing user space processes ... (elapsed 0.001 seconds) done. Dec 10 15:33:55 chua kernel: [15488.383328] OOM killer disabled. Dec 10 15:33:55 chua kernel: [15488.383329] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. Dec 10 15:33:55 chua kernel: [15488.384424] printk: Suspending console(s) (use no_console_suspend to debug) Dec 10 15:33:55 chua kernel: [15488.386563] sd 0:0:0:0: [sda] Synchronizing SCSI cache Dec 10 15:33:55 chua kernel: [15488.387436] sd 0:0:0:0: [sda] Stopping disk Dec 10 15:33:55 chua kernel: [15488.407871] serial 00:05: disabled Dec 10 15:33:55 chua kernel: [15488.409065] e1000e: EEE TX LPI TIMER: 00000000 Dec 10 15:33:55 chua kernel: [15488.497441] snd_hdac_bus_update_rirb: 58 callbacks suppressed Dec 10 15:33:55 chua kernel: [15488.497449] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497453] snd_hda_intel 0000:00:1b.0: spurious response 0x7fffffff:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497457] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497461] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497464] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497468] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497471] snd_hda_intel 0000:00:1b.0: spurious response 0x7fffffff:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497475] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497478] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497481] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.554013] ACPI: EC: interrupt blocked Dec 10 15:33:55 chua kernel: [15488.566977] ACPI: Preparing to enter system sleep state S3 Dec 10 15:33:55 chua kernel: [15488.583067] ACPI: EC: event blocked Dec 10 15:33:55 chua kernel: [15488.583069] ACPI: EC: EC stopped Dec 10 15:33:55 chua kernel: [15488.583070] PM: Saving platform NVS memory Dec 10 15:33:55 chua kernel: [15488.583426] Disabling non-boot CPUs ... Dec 10 15:33:55 chua kernel: [15488.584808] smpboot: CPU 1 is now offline
Let me add that this time I encountered the issue after taking the laptop from the docking station.
... this can be the cause: the hardware doesn't react to the pin jack sense request for the dock that isn't present any longer. Or something goes wrong here, repeating, etc.
I think, it might have to do with that as I often use the dock.
However, I think, I even seen the problem after I did a fresh boot far away from the dock.
After sending it to sleep and waking it up again, the problem is gone (as it often can be solved).
So the problem is very likely some flaky hardware response.
Anything you think can be done against that?
Regards Klaus
On Tue, 10 Dec 2019 16:10:37 +0100, Klaus Ethgen wrote:
Hi Takashi,
Am Di den 10. Dez 2019 um 15:43 schrieb Takashi Iwai:
There is also a issue with a swapper thread in ath9k code but the one with the biggest CPU hook is still alsa.
Do you see the relevant kernel messages, e.g. switch to polling mode?
Nope.
Here is the relevant logs starting from undocking event up to the sleep that I mentioned fixing the issue:
....
Dec 10 15:33:55 chua kernel: [15488.497441] snd_hdac_bus_update_rirb: 58 callbacks suppressed Dec 10 15:33:55 chua kernel: [15488.497449] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497453] snd_hda_intel 0000:00:1b.0: spurious response 0x7fffffff:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497457] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497461] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497464] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497468] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497471] snd_hda_intel 0000:00:1b.0: spurious response 0x7fffffff:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497475] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497478] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497481] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900
OK, these lines look already suspicious. The driver gets stuck at executing the same verb and repeated it until it gets some result. This verb is GET_PIN_SENSE to the pin 0x11, as I somehow expected. However, the pin 0x11 is the headphone pin, and this should work even after the undock, but by some reason, it screws up. Or it might be the read of the previous one (that doesn't appear here) that stalls the communication.
Let me add that this time I encountered the issue after taking the laptop from the docking station.
... this can be the cause: the hardware doesn't react to the pin jack sense request for the dock that isn't present any longer. Or something goes wrong here, repeating, etc.
I think, it might have to do with that as I often use the dock.
However, I think, I even seen the problem after I did a fresh boot far away from the dock.
Then it might be a problem of the headphone jack detection in general. It happens at undock because the jack detection is performed on all pins.
But, the same is true for the suspend/resume, even for runtime PM, so this might hit even more often. Hmm.
After sending it to sleep and waking it up again, the problem is gone (as it often can be solved).
So the problem is very likely some flaky hardware response.
Anything you think can be done against that?
Could you try the patch below in addition to the previous one? The similar trick was applied to the recent Intel chips for the unreliable communications, and the same might work for older chips, too.
thanks,
Takashi
--- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -282,7 +282,7 @@ enum {
/* quirks for old Intel chipsets */ #define AZX_DCAPS_INTEL_ICH \ - (AZX_DCAPS_OLD_SSYNC | AZX_DCAPS_NO_ALIGN_BUFSIZE) + (AZX_DCAPS_SYNC_WRITE | AZX_DCAPS_OLD_SSYNC | AZX_DCAPS_NO_ALIGN_BUFSIZE)
/* quirks for Intel PCH */ #define AZX_DCAPS_INTEL_PCH_BASE \
Hi Takashi,
Am Di den 10. Dez 2019 um 16:19 schrieb Takashi Iwai:
Dec 10 15:33:55 chua kernel: [15488.497441] snd_hdac_bus_update_rirb: 58 callbacks suppressed Dec 10 15:33:55 chua kernel: [15488.497449] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497453] snd_hda_intel 0000:00:1b.0: spurious response 0x7fffffff:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497457] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497461] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497464] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497468] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497471] snd_hda_intel 0000:00:1b.0: spurious response 0x7fffffff:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497475] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497478] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900 Dec 10 15:33:55 chua kernel: [15488.497481] snd_hda_intel 0000:00:1b.0: spurious response 0x0:0x0, last cmd=0x11f0900
OK, these lines look already suspicious.
These are logged after wakeup as at this position, most of the system seems already been sleeping. But it is really just before the sleep.
The driver gets stuck at executing the same verb and repeated it until it gets some result. This verb is GET_PIN_SENSE to the pin 0x11, as I somehow expected. However, the pin 0x11 is the headphone pin, and this should work even after the undock, but by some reason, it screws up. Or it might be the read of the previous one (that doesn't appear here) that stalls the communication.
[...]
Then it might be a problem of the headphone jack detection in general. It happens at undock because the jack detection is performed on all pins.
I think, I always seen the headphone pin note there. Is it possible to disable that detection? Maybe that is not working with that old hardware?
Anything you think can be done against that?
Could you try the patch below in addition to the previous one? The similar trick was applied to the recent Intel chips for the unreliable communications, and the same might work for older chips, too.
I'll do and by the same compilation, I'll enable CONFIG_SND_HDA_CODEC_ANALOG.
Regards Klaus
Hi Takashi,
Am Di den 10. Dez 2019 um 16:19 schrieb Takashi Iwai:
Could you try the patch below in addition to the previous one? The similar trick was applied to the recent Intel chips for the unreliable communications, and the same might work for older chips, too.
This patch seemed to have fixed the issue. I didn't see the problem again since.
Regards Klaus
On Mon, 23 Dec 2019 18:18:33 +0100, Klaus Ethgen wrote:
Hi Takashi,
Am Di den 10. Dez 2019 um 16:19 schrieb Takashi Iwai:
Could you try the patch below in addition to the previous one? The similar trick was applied to the recent Intel chips for the unreliable communications, and the same might work for older chips, too.
This patch seemed to have fixed the issue. I didn't see the problem again since.
OK, good to hear. I believe we should apply this workaround for the all Intel chips, then.
I'll submit the proper patch now.
thanks,
Takashi
Let me add that this time I encountered the issue after taking the laptop from the docking station.
After sending it to sleep and waking it up again, the problem is gone (as it often can be solved).
Regards Klaus
participants (2)
-
Klaus Ethgen
-
Takashi Iwai