Hi,
my desktop randomly experiences workqueue lockups on boot with openSUSE Tumbleweed kernels 4.9.x, installed around Christmas. Previously I had a (badly maintained) Gentoo installation with 4.4 IIRC, so I can't say if the kernel has regressed, or the major userspace changes exposed different timing of stuff.
This is how the workqueue lockup looks like:
kernel: BUG: workqueue lockup - pool cpus=3 node=0 flags=0x0 nice=0 stuck for 173s! kernel: Showing busy workqueues and worker pools: kernel: workqueue events: flags=0x0 kernel: pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=9/256 kernel: in-flight: 60:process_unsol_events [snd_hda_core] process_unsol_events [snd_hda_core] kernel: pending: dbs_work_handler, push_to_pool, vmstat_shepherd, cache_reap, console_callback, sysrq_reinject_alt_sysrq, check_corrup kernel: workqueue events_power_efficient: flags=0x80 kernel: pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 kernel: pending: neigh_periodic_work kernel: workqueue lru-add-drain: flags=0x8 kernel: pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 kernel: pending: lru_add_drain_per_cpu BAR(1161) kernel: workqueue vmstat: flags=0xc kernel: pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 kernel: pending: vmstat_update kernel: workqueue ipv6_addrconf: flags=0x40008 kernel: pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/1 kernel: pending: addrconf_verify_work kernel: pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=173s workers=4 idle: 134 373 30
This prevented me from getting a login prompt, as systemd is waiting for wicked. I have used sysrq to list the stuck tasks:
kernel: sysrq: SysRq : Show Blocked State kernel: task PC stack pid father kernel: wickedd-nanny D 0 1161 1 0x00000000 kernel: ffff9f88e3ebc000 0000000000000000 ffff9f88e3506040 ffff9f88efd59780 kernel: ffff9f883ee90100 ffffb2dc81117d28 ffffffffa9715ff6 ffff9f88efd59780 kernel: 00ffb2dc81117d20 ffff9f88efd59780 ffff9f88def44000 ffff9f88e3506040 kernel: Call Trace: kernel: [<ffffffffa9715ff6>] ? __schedule+0x236/0x700 kernel: [<ffffffffa97164fd>] schedule+0x3d/0x90 kernel: [<ffffffffa971984e>] schedule_timeout+0x22e/0x410 kernel: [<ffffffffa90ab6e8>] ? finish_task_switch+0x78/0x1f0 kernel: [<ffffffffa90af94a>] ? try_to_wake_up+0x4a/0x3d0 kernel: [<ffffffffa9717837>] wait_for_completion+0x97/0x100 kernel: [<ffffffffa90afd70>] ? wake_up_q+0x80/0x80 kernel: [<ffffffffa909d747>] flush_work+0xf7/0x190 kernel: [<ffffffffa909b370>] ? flush_workqueue_prep_pwqs+0x1a0/0x1a0 kernel: [<ffffffffa91b4191>] lru_add_drain_all+0x121/0x160 kernel: [<ffffffffa91ddc15>] do_mlock+0x45/0x230 kernel: [<ffffffffa91ddf33>] SyS_mlock+0x13/0x20 kernel: [<ffffffffa971b47b>] entry_SYSCALL_64_fastpath+0x1e/0xad
Yep, stuck waiting for work item completion, but workqueue processing on cpu 3 is stuck (and contains such item).
Then I listed backtraces of all active CPU's:
kernel: sysrq: SysRq : Show backtrace of all active CPUs kernel: NMI backtrace for cpu 3 kernel: CPU: 3 PID: 60 Comm: kworker/3:1 Not tainted 4.9.3-1-default #1 kernel: Hardware name: Gigabyte Technology Co., Ltd. GA-870A-UD3/GA-870A-UD3, BIOS F5 08/01/2011 kernel: Workqueue: events process_unsol_events [snd_hda_core] kernel: ffff9f88efcc3a90 ffffffffa93c4f70 0000000000000000 0000000000000003 kernel: ffff9f88efcc3ac0 ffffffffa93c8e44 ffffffffa9055c80 0000000000000003 kernel: 0000000000000001 ffffffffa9cc43c0 ffff9f88efcc3ae0 ffffffffa93c8f66 kernel: Call Trace: kernel: <IRQ> kernel: [<ffffffffa93c4f70>] dump_stack+0x63/0x83 kernel: [<ffffffffa93c8e44>] nmi_cpu_backtrace+0x94/0xa0 kernel: [<ffffffffa9055c80>] ? irq_force_complete_move+0x150/0x150 kernel: [<ffffffffa93c8f66>] nmi_trigger_cpumask_backtrace+0x116/0x150 kernel: [<ffffffffa9055cf9>] arch_trigger_cpumask_backtrace+0x19/0x20 kernel: [<ffffffffa94ce877>] sysrq_handle_showallcpus+0x17/0x20 kernel: [<ffffffffa94cef1b>] __handle_sysrq+0xfb/0x150 [ snip keyboard event processing ] kernel: [<ffffffffa903128d>] handle_irq+0x1d/0x30 kernel: [<ffffffffa971ddab>] do_IRQ+0x4b/0xd0 kernel: [<ffffffffa971be02>] common_interrupt+0x82/0x82 kernel: <EOI> kernel: [<ffffffffa93d1ef3>] ? delay_tsc+0x43/0x90 kernel: [<ffffffffa93d1e2d>] __const_udelay+0x2d/0x30 kernel: [<ffffffffc0c08403>] azx_rirb_get_response+0xb3/0x280 [snd_hda_codec] kernel: [<ffffffffc0c08b43>] azx_get_response+0x33/0x40 [snd_hda_codec] kernel: [<ffffffffc0b86303>] snd_hdac_bus_exec_verb_unlocked+0x83/0x170 [snd_hda_core] kernel: [<ffffffffa952e85c>] ? __pm_runtime_resume+0x4c/0x60 kernel: [<ffffffffc0bfe4fa>] codec_exec_verb+0x8a/0x110 [snd_hda_codec] kernel: [<ffffffffc0b86f97>] snd_hdac_exec_verb+0x17/0x40 [snd_hda_core] kernel: [<ffffffffc0b87944>] snd_hdac_codec_read+0x34/0x50 [snd_hda_core] kernel: [<ffffffffc0c20501>] dspio_read+0x51/0x70 [snd_hda_codec_ca0132] kernel: [<ffffffffc0c20566>] ca0132_process_dsp_response+0x46/0x160 [snd_hda_codec_ca0132] kernel: [<ffffffffc0c02fe5>] call_jack_callback.isra.1+0x25/0xa0 [snd_hda_codec] kernel: [<ffffffffc0c033c6>] snd_hda_jack_unsol_event+0x66/0x80 [snd_hda_codec] kernel: [<ffffffffc0bfd077>] hda_codec_unsol_event+0x17/0x20 [snd_hda_codec] kernel: [<ffffffffc0b86193>] process_unsol_events+0x63/0x70 [snd_hda_core] kernel: [<ffffffffa909e4d3>] process_one_work+0x1f3/0x4d0 kernel: [<ffffffffa909e7f8>] worker_thread+0x48/0x4e0 kernel: [<ffffffffa909e7b0>] ? process_one_work+0x4d0/0x4d0 kernel: [<ffffffffa909e7b0>] ? process_one_work+0x4d0/0x4d0 kernel: [<ffffffffa90a46da>] kthread+0xca/0xe0 kernel: [<ffffffffa90a4610>] ? kthread_park+0x60/0x60 kernel: [<ffffffffa971b6f5>] ret_from_fork+0x25/0x30
This backtrace looks stable on multiple retries. I've rebooted and added module_blacklist=snd_hda_core on the command line to work around this for now. Here's the relevant lspci. The card I use right now is an onboard audio with 3 output jacks for 5.1. The Creative Audigy is currently unused, as should be the HDMI stuff.
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40) Subsystem: Gigabyte Technology Co., Ltd Device a102 Flags: bus master, slow devsel, latency 32, IRQ 7, NUMA node 0 Memory at fe024000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Kernel modules: snd_hda_intel
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series] Subsystem: PC Partner Limited / Sapphire Technology Device aab0 Flags: bus master, fast devsel, latency 0, IRQ 10, NUMA node 0 Memory at fd4fc000 (64-bit, non-prefetchable) [size=16K] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Kernel modules: snd_hda_intel
02:00.0 Audio device: Creative Labs Sound Core3D [Sound Blaster Recon3D / Z-Series] (rev 01) Subsystem: Creative Labs SB1570 SB Audigy Fx Flags: bus master, fast devsel, latency 0, IRQ 10, NUMA node 0 Memory at fd3fc000 (64-bit, non-prefetchable) [size=16K] Memory at fd3f8000 (64-bit, non-prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [70] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Virtual Channel Capabilities: [170] Device Serial Number 00-00-00-00-00-00-00-00 Capabilities: [180] Power Budgeting <?> Kernel modules: snd_hda_intel
Let me know what else to dump or try. Thanks, Vlastimil