aconnect occasionally causes kernel oopses
[zo aug 1 20:24:51 2021] BUG: unable to handle page fault for address: 00003c366982f750 [zo aug 1 20:24:51 2021] #PF: supervisor read access in kernel mode [zo aug 1 20:24:51 2021] #PF: error_code(0x0000) - not-present page [zo aug 1 20:24:51 2021] PGD 0 P4D 0 [zo aug 1 20:24:51 2021] Oops: 0000 [#1] SMP NOPTI [zo aug 1 20:24:51 2021] CPU: 2 PID: 59575 Comm: aconnect Not tainted 5.11.0-25-generic #27-Ubuntu [zo aug 1 20:24:51 2021] Hardware name: HP HP Laptop 17-by2xxx/865A, BIOS F.58 06/16/2020 [zo aug 1 20:24:51 2021] RIP: 0010:check_and_subscribe_port+0x151/0x210 [snd_seq] [zo aug 1 20:24:51 2021] Code: 49 39 c5 0f 84 39 ff ff ff 41 0f b6 3e eb 0c 48 8b 00 49 39 c5 0f 84 27 ff ff ff 48 8d 70 b0 48 8d 48 a0 45 84 c0 48 0f 45 ce <40> 3a 39 75 e0 0f b6 51 01 41 38 56 01 75 d6 0f b6 51 02 41 38 56 [zo aug 1 20:24:51 2021] RSP: 0018:ffffa57dc9e03cf8 EFLAGS: 00010246 [zo aug 1 20:24:51 2021] RAX: 00003c366982f7b0 RBX: 0000000000000000 RCX: 00003c366982f750 [zo aug 1 20:24:51 2021] RDX: 0000000000000089 RSI: 00003c366982f760 RDI: 0000000000000080 [zo aug 1 20:24:51 2021] RBP: ffffa57dc9e03d38 R08: 0000000000000000 R09: ffff8945226312c0 [zo aug 1 20:24:51 2021] R10: ffff89458b7dda00 R11: ffff89460f08d270 R12: ffff894522631200 [zo aug 1 20:24:51 2021] R13: ffff8945226312c0 R14: ffff894506dd2080 R15: ffff8945226312d8 [zo aug 1 20:24:51 2021] FS: 00007f7fd9633480(0000) GS:ffff894756500000(0000) knlGS:0000000000000000 [zo aug 1 20:24:51 2021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [zo aug 1 20:24:51 2021] CR2: 00003c366982f750 CR3: 00000001d8318006 CR4: 00000000003706e0 [zo aug 1 20:24:51 2021] Call Trace: [zo aug 1 20:24:51 2021] snd_seq_port_connect+0x113/0x180 [snd_seq] [zo aug 1 20:24:51 2021] snd_seq_ioctl_subscribe_port+0xb9/0x180 [snd_seq] [zo aug 1 20:24:51 2021] ? snd_seq_port_get_subscription+0xbb/0xd0 [snd_seq] [zo aug 1 20:24:51 2021] ? __check_object_size.part.0+0x3a/0x150 [zo aug 1 20:24:51 2021] snd_seq_ioctl+0xe8/0x1b0 [snd_seq] [zo aug 1 20:24:51 2021] __x64_sys_ioctl+0x91/0xc0 [zo aug 1 20:24:51 2021] do_syscall_64+0x38/0x90 [zo aug 1 20:24:51 2021] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [zo aug 1 20:24:51 2021] RIP: 0033:0x7f7fd98b8ecb [zo aug 1 20:24:51 2021] Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 1f 0d 00 f7 d8 64 89 01 48 [zo aug 1 20:24:51 2021] RSP: 002b:00007ffdc3e3f038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [zo aug 1 20:24:51 2021] RAX: ffffffffffffffda RBX: 00007f7fd998b354 RCX: 00007f7fd98b8ecb [zo aug 1 20:24:51 2021] RDX: 00007ffdc3e3f050 RSI: 0000000040505330 RDI: 0000000000000003 [zo aug 1 20:24:51 2021] RBP: 00007ffdc3e3f120 R08: 00000000ffffffff R09: 0000000000000000 [zo aug 1 20:24:51 2021] R10: 00007f7fd993dac0 R11: 0000000000000246 R12: 00007ffdc3e3f050 [zo aug 1 20:24:51 2021] R13: 00007ffdc3e3f0de R14: 0000000000000000 R15: 00007ffdc3e3f0dc [zo aug 1 20:24:51 2021] Modules linked in: snd_hrtimer snd_usb_audio snd_usbmidi_lib snd_seq_dummy ch341 usbserial ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables libcrc32c nfnetlink bridge stp llc rfcomm cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi snd_sof_pci snd_sof_intel_hda_common snd_hda_codec_realtek snd_soc_hdac_hda snd_hda_codec_generic snd_sof_intel_hda snd_sof_intel_byt snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pcm mei_hdcp rtw88_8723de rtw88_8723d snd_seq_midi x86_pkg_temp_thermal intel_powerclamp snd_seq_midi_event coretemp intel_rapl_msr rtw88_pci [zo aug 1 20:24:51 2021] kvm_intel snd_rawmidi rtw88_core kvm uvcvideo videobuf2_vmalloc rapl intel_cstate mac80211 videobuf2_memops snd_seq joydev btusb btrtl videobuf2_v4l2 snd_seq_device snd_timer btbcm input_leds btintel videobuf2_common videodev hp_wmi efi_pstore mc sparse_keymap serio_raw snd wmi_bmof intel_wmi_thunderbolt cfg80211 mei_me bluetooth soundcore ee1004 libarc4 processor_thermal_device processor_thermal_rfim processor_thermal_mbox mei ecdh_generic processor_thermal_rapl intel_rapl_common ecc intel_pch_thermal int340x_thermal_zone intel_soc_dts_iosf acpi_pad hp_wireless int3400_thermal acpi_tad acpi_thermal_rel mac_hid sch_fq_codel msr parport_pc ppdev lp parport sunrpc ip_tables x_tables autofs4 dm_crypt i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd syscopyarea cryptd sysfillrect glue_helper sysimgblt fb_sys_fops cec psmouse nvme rc_core intel_lpss_pci intel_lpss nvme_core i2c_i801 r8169 idma64 i2c_smbus ahci realtek [zo aug 1 20:24:51 2021] libahci drm virt_dma xhci_pci xhci_pci_renesas wmi pinctrl_cannonlake video [zo aug 1 20:24:51 2021] CR2: 00003c366982f750 [zo aug 1 20:24:51 2021] ---[ end trace 000787e3cd09f39d ]--- [zo aug 1 20:24:52 2021] RIP: 0010:check_and_subscribe_port+0x151/0x210 [snd_seq] [zo aug 1 20:24:52 2021] Code: 49 39 c5 0f 84 39 ff ff ff 41 0f b6 3e eb 0c 48 8b 00 49 39 c5 0f 84 27 ff ff ff 48 8d 70 b0 48 8d 48 a0 45 84 c0 48 0f 45 ce <40> 3a 39 75 e0 0f b6 51 01 41 38 56 01 75 d6 0f b6 51 02 41 38 56 [zo aug 1 20:24:52 2021] RSP: 0018:ffffa57dc9e03cf8 EFLAGS: 00010246 [zo aug 1 20:24:52 2021] RAX: 00003c366982f7b0 RBX: 0000000000000000 RCX: 00003c366982f750 [zo aug 1 20:24:52 2021] RDX: 0000000000000089 RSI: 00003c366982f760 RDI: 0000000000000080 [zo aug 1 20:24:52 2021] RBP: ffffa57dc9e03d38 R08: 0000000000000000 R09: ffff8945226312c0 [zo aug 1 20:24:52 2021] R10: ffff89458b7dda00 R11: ffff89460f08d270 R12: ffff894522631200 [zo aug 1 20:24:52 2021] R13: ffff8945226312c0 R14: ffff894506dd2080 R15: ffff8945226312d8 [zo aug 1 20:24:52 2021] FS: 00007f7fd9633480(0000) GS:ffff894756500000(0000) knlGS:0000000000000000 [zo aug 1 20:24:52 2021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [zo aug 1 20:24:52 2021] CR2: 00003c366982f750 CR3: 00000001d8318006 CR4: 00000000003706e0
In which situation?
I was testing something that listens for alsa events and for that I ran a continuous loop that did:
while true do aconnect 128:1 14:0 aconnect -d 128:1 14:0 aconnect -d 128:2 128:1 aconnect 128:2 128:1 done
I ran 5 instances in parallel.
14 is midi through 128 is rtpmidi
On Mon, 02 Aug 2021 08:18:45 +0200, folkert wrote:
In which situation?
I was testing something that listens for alsa events and for that I ran a continuous loop that did:
while true do aconnect 128:1 14:0 aconnect -d 128:1 14:0 aconnect -d 128:2 128:1 aconnect 128:2 128:1 done
I ran 5 instances in parallel.
14 is midi through 128 is rtpmidi
So rtpmidi process keeps running during the loop, that is, it's only about connection and disconnection, right? Also, you're listening to an event during that -- but how?
In general when you report a bug, more detailed description about the test itself would be really helpful. At best, provide a PoC, which makes the debug much easier. Posting only the stack trace without any other information is, OTOH, almost pointless. One can't decode the address for a binary built for a different system easily, either.
thanks,
Takashi
In which situation?
I was testing something that listens for alsa events and for that I ran a continuous loop that did:
while true do aconnect 128:1 14:0 aconnect -d 128:1 14:0 aconnect -d 128:2 128:1 aconnect 128:2 128:1 done
I ran 5 instances in parallel.
14 is midi through 128 is rtpmidi
So rtpmidi process keeps running during the loop, that is, it's only about connection and disconnection, right? Also, you're listening to an event during that -- but how?
I tried it again but with a simpler setup:
I've got these devices:
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 130:0 FLUID Synth (17032) Synth input port (17032:0) 131:0 VMPK Input in root@lappiemctopface:~# arecordmidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 132:0 VMPK Output out
I run this in 3x parallel:
while true do aconnect 132:0 130:0 aconnect -d 132:0 130:0 done
and then in less than a minute I get a backtrace.
[ma aug 2 11:05:13 2021] Call Trace: [ma aug 2 11:05:13 2021] ? snd_seq_deliver_event+0x38/0x90 [snd_seq] [ma aug 2 11:05:13 2021] ? snd_seq_kernel_client_dispatch+0x72/0x90 [snd_seq] [ma aug 2 11:05:13 2021] kfree+0x3bc/0x3e0 [ma aug 2 11:05:13 2021] ? snd_seq_port_disconnect+0x10c/0x140 [snd_seq] [ma aug 2 11:05:13 2021] snd_seq_port_disconnect+0x10c/0x140 [snd_seq] [ma aug 2 11:05:13 2021] snd_seq_ioctl_unsubscribe_port+0xb9/0x180 [snd_seq] [ma aug 2 11:05:13 2021] ? snd_seq_port_get_subscription+0xbb/0xd0 [snd_seq] [ma aug 2 11:05:13 2021] ? __check_object_size.part.0+0x3a/0x150 [ma aug 2 11:05:13 2021] snd_seq_ioctl+0xe8/0x1b0 [snd_seq] [ma aug 2 11:05:13 2021] __x64_sys_ioctl+0x91/0xc0 [ma aug 2 11:05:13 2021] do_syscall_64+0x38/0x90 [ma aug 2 11:05:13 2021] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Using: fluidsynth 2.1.7-1 vmpk 0.7.2-1build1
On: 5.11.0-25-generic (ubuntu) on 2 cores, 2 threads/core cpu (64b).
On Mon, 02 Aug 2021 11:10:12 +0200, folkert wrote:
In which situation?
I was testing something that listens for alsa events and for that I ran a continuous loop that did:
while true do aconnect 128:1 14:0 aconnect -d 128:1 14:0 aconnect -d 128:2 128:1 aconnect 128:2 128:1 done
I ran 5 instances in parallel.
14 is midi through 128 is rtpmidi
So rtpmidi process keeps running during the loop, that is, it's only about connection and disconnection, right? Also, you're listening to an event during that -- but how?
I tried it again but with a simpler setup:
I've got these devices:
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 130:0 FLUID Synth (17032) Synth input port (17032:0) 131:0 VMPK Input in root@lappiemctopface:~# arecordmidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 132:0 VMPK Output out
I run this in 3x parallel:
while true do aconnect 132:0 130:0 aconnect -d 132:0 130:0 done
and then in less than a minute I get a backtrace.
OK, thanks. That's more promising.
Does this happen if you do reconnect of kernel sequencer client? You can use snd-virmidi as well as snd-dummy.
I'm asking it because it'll simplify the test a lot, which will be almost self-contained.
Takashi
[ kernel bug if repeatingly aconnect'ing midi devices ]
Does this happen if you do reconnect of kernel sequencer client? You can use snd-virmidi as well as snd-dummy. I'm asking it because it'll simplify the test a lot, which will be almost self-contained.
Like this?
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 20:0 Virtual Raw MIDI 1-0 VirMIDI 1-0 21:0 Virtual Raw MIDI 1-1 VirMIDI 1-1 22:0 Virtual Raw MIDI 1-2 VirMIDI 1-2 23:0 Virtual Raw MIDI 1-3 VirMIDI 1-3 128:0 rtpmidi lappiemctopface Network 128:1 rtpmidi lappiemctopface metronoom 128:2 rtpmidi lappiemctopface AppleMidi2IPMidiBridge 128:3 rtpmidi lappiemctopface oensoens 130:0 FLUID Synth (11462) Synth input port (11462:0)
and then:
root@lappiemctopface:~# cat test.sh while true do aconnect 20:0 21:0 aconnect -d 20:0 21:0 done
root@lappiemctopface:~# for i in `seq 0 3` ; do (./test.sh &) ; done
This hard locks-up my laptop: it doesn't even respond to capslock (led on/off) anymore nor the ctrl+prtscr+alt+b combination.
On Mon, 02 Aug 2021 17:21:17 +0200, folkert wrote:
[ kernel bug if repeatingly aconnect'ing midi devices ]
Does this happen if you do reconnect of kernel sequencer client? You can use snd-virmidi as well as snd-dummy. I'm asking it because it'll simplify the test a lot, which will be almost self-contained.
Like this?
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 20:0 Virtual Raw MIDI 1-0 VirMIDI 1-0 21:0 Virtual Raw MIDI 1-1 VirMIDI 1-1 22:0 Virtual Raw MIDI 1-2 VirMIDI 1-2 23:0 Virtual Raw MIDI 1-3 VirMIDI 1-3 128:0 rtpmidi lappiemctopface Network 128:1 rtpmidi lappiemctopface metronoom 128:2 rtpmidi lappiemctopface AppleMidi2IPMidiBridge 128:3 rtpmidi lappiemctopface oensoens 130:0 FLUID Synth (11462) Synth input port (11462:0)
and then:
root@lappiemctopface:~# cat test.sh while true do aconnect 20:0 21:0 aconnect -d 20:0 21:0 done
root@lappiemctopface:~# for i in `seq 0 3` ; do (./test.sh &) ; done
This hard locks-up my laptop: it doesn't even respond to capslock (led on/off) anymore nor the ctrl+prtscr+alt+b combination.
Thanks, that's really helpful! I see the possible race now.
Could you try the quick fix below?
Takashi
--- --- a/sound/core/seq/seq_ports.c +++ b/sound/core/seq/seq_ports.c @@ -200,6 +200,17 @@ get_subscriber(struct list_head *p, bool is_src) return list_entry(p, struct snd_seq_subscribers, dest_list); }
+static void subscriber_get(struct snd_seq_subscribers *subs) +{ + atomic_inc(&subs->ref_count); +} + +static void subscriber_put(struct snd_seq_subscribers *subs) +{ + if (atomic_dec_and_test(&subs->ref_count)) + kfree(subs); +} + /* * remove all subscribers on the list * this is called from port_delete, for each src and dest list. @@ -228,8 +239,7 @@ static void clear_subscriber_list(struct snd_seq_client *client, * we decrease the counter, and when both ports are deleted * remove the subscriber info */ - if (atomic_dec_and_test(&subs->ref_count)) - kfree(subs); + subscriber_put(subs); continue; }
@@ -489,6 +499,8 @@ static int check_and_subscribe_port(struct snd_seq_client *client, s = get_subscriber(p, is_src); if (match_subs_info(&subs->info, &s->info)) goto __error; + if (!subs->ready) + goto __error; } }
@@ -505,7 +517,7 @@ static int check_and_subscribe_port(struct snd_seq_client *client, else list_add_tail(&subs->dest_list, &grp->list_head); grp->exclusive = exclusive; - atomic_inc(&subs->ref_count); + subscriber_get(subs); write_unlock_irq(&grp->list_lock); err = 0;
@@ -536,6 +548,7 @@ static void delete_and_unsubscribe_port(struct snd_seq_client *client, if (!empty) unsubscribe_port(client, port, grp, &subs->info, ack); up_write(&grp->list_mutex); + subscriber_put(subs); }
/* connect two ports */ @@ -555,7 +568,7 @@ int snd_seq_port_connect(struct snd_seq_client *connector, return -ENOMEM;
subs->info = *info; - atomic_set(&subs->ref_count, 0); + atomic_set(&subs->ref_count, 1); INIT_LIST_HEAD(&subs->src_list); INIT_LIST_HEAD(&subs->dest_list);
@@ -572,13 +585,14 @@ int snd_seq_port_connect(struct snd_seq_client *connector, if (err < 0) goto error_dest;
+ subs->ready = true; return 0;
error_dest: delete_and_unsubscribe_port(src_client, src_port, subs, true, connector->number != src_client->number); error: - kfree(subs); + subscriber_put(subs); return err; }
@@ -597,8 +611,8 @@ int snd_seq_port_disconnect(struct snd_seq_client *connector, down_write(&src->list_mutex); /* look for the connection */ list_for_each_entry(subs, &src->list_head, src_list) { - if (match_subs_info(info, &subs->info)) { - atomic_dec(&subs->ref_count); /* mark as not ready */ + if (match_subs_info(info, &subs->info) && subs->ready) { + subs->ready = false; err = 0; break; } @@ -611,7 +625,7 @@ int snd_seq_port_disconnect(struct snd_seq_client *connector, connector->number != src_client->number); delete_and_unsubscribe_port(dest_client, dest_port, subs, false, connector->number != dest_client->number); - kfree(subs); + subscriber_put(subs); return 0; }
diff --git a/sound/core/seq/seq_ports.h b/sound/core/seq/seq_ports.h index b1f2c4943174..d9644d5f109e 100644 --- a/sound/core/seq/seq_ports.h +++ b/sound/core/seq/seq_ports.h @@ -30,6 +30,7 @@ struct snd_seq_subscribers { struct list_head src_list; /* link of sources */ struct list_head dest_list; /* link of destinations */ atomic_t ref_count; + bool ready; };
struct snd_seq_port_subs_info {
[ kernel bug if repeatingly aconnect'ing midi devices ]
Does this happen if you do reconnect of kernel sequencer client? You can use snd-virmidi as well as snd-dummy. I'm asking it because it'll simplify the test a lot, which will be almost self-contained.
Like this?
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 20:0 Virtual Raw MIDI 1-0 VirMIDI 1-0 21:0 Virtual Raw MIDI 1-1 VirMIDI 1-1 22:0 Virtual Raw MIDI 1-2 VirMIDI 1-2 23:0 Virtual Raw MIDI 1-3 VirMIDI 1-3 128:0 rtpmidi lappiemctopface Network 128:1 rtpmidi lappiemctopface metronoom 128:2 rtpmidi lappiemctopface AppleMidi2IPMidiBridge 128:3 rtpmidi lappiemctopface oensoens 130:0 FLUID Synth (11462) Synth input port (11462:0)
and then:
root@lappiemctopface:~# cat test.sh while true do aconnect 20:0 21:0 aconnect -d 20:0 21:0 done
root@lappiemctopface:~# for i in `seq 0 3` ; do (./test.sh &) ; done
This hard locks-up my laptop: it doesn't even respond to capslock (led on/off) anymore nor the ctrl+prtscr+alt+b combination.
Thanks, that's really helpful! I see the possible race now.
Could you try the quick fix below?
Something may have changed:
folkert@oensoens:~$ aplaymidi -l ALSA lib seq_hw.c:466:(snd_seq_hw_open) open /dev/snd/seq failed: Permission denied Cannot open sequencer - Permission denied
???
Otoh it now runs fine from what I've could test.
On Mon, 02 Aug 2021 21:53:49 +0200, folkert wrote:
[ kernel bug if repeatingly aconnect'ing midi devices ]
Does this happen if you do reconnect of kernel sequencer client? You can use snd-virmidi as well as snd-dummy. I'm asking it because it'll simplify the test a lot, which will be almost self-contained.
Like this?
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 20:0 Virtual Raw MIDI 1-0 VirMIDI 1-0 21:0 Virtual Raw MIDI 1-1 VirMIDI 1-1 22:0 Virtual Raw MIDI 1-2 VirMIDI 1-2 23:0 Virtual Raw MIDI 1-3 VirMIDI 1-3 128:0 rtpmidi lappiemctopface Network 128:1 rtpmidi lappiemctopface metronoom 128:2 rtpmidi lappiemctopface AppleMidi2IPMidiBridge 128:3 rtpmidi lappiemctopface oensoens 130:0 FLUID Synth (11462) Synth input port (11462:0)
and then:
root@lappiemctopface:~# cat test.sh while true do aconnect 20:0 21:0 aconnect -d 20:0 21:0 done
root@lappiemctopface:~# for i in `seq 0 3` ; do (./test.sh &) ; done
This hard locks-up my laptop: it doesn't even respond to capslock (led on/off) anymore nor the ctrl+prtscr+alt+b combination.
Thanks, that's really helpful! I see the possible race now.
Could you try the quick fix below?
Something may have changed:
folkert@oensoens:~$ aplaymidi -l ALSA lib seq_hw.c:466:(snd_seq_hw_open) open /dev/snd/seq failed: Permission denied Cannot open sequencer - Permission denied
???
Hrm, that's unexpected.
Meanwhile, I reconsidered the fix and a better idea came after the sleep. Below is the new fix patch. Could you give it a try instead of the previous one?
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: seq: Fix racy deletion of subscriber
It turned out that the current implementation of the port subscription is racy. The subscription contains two linked lists, and we have to add to or delete from both lists. Since both connection and disconnection procedures perform the same order for those two lists (i.e. src list, then dest list), when a deletion happens during a connection procedure, the src list may be deleted before the dest list addition completes, and this may lead to a use-after-free or an Oops, even though the access to both lists are protected via mutex.
The simple workaround for this race is to change the access order for the disconnection, namely, dest list, then src list. This assures that the connection has been established when disconnecting, and also the concurrent deletion can be avoided.
Reported-by: folkert folkert@vanheusden.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210801182754.GP890690@belle.intranet.vanheusden.... Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/core/seq/seq_ports.c | 39 ++++++++++++++++++++++++++------------ 1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/sound/core/seq/seq_ports.c b/sound/core/seq/seq_ports.c index b9c2ce2b8d5a..84d78630463e 100644 --- a/sound/core/seq/seq_ports.c +++ b/sound/core/seq/seq_ports.c @@ -514,10 +514,11 @@ static int check_and_subscribe_port(struct snd_seq_client *client, return err; }
-static void delete_and_unsubscribe_port(struct snd_seq_client *client, - struct snd_seq_client_port *port, - struct snd_seq_subscribers *subs, - bool is_src, bool ack) +/* called with grp->list_mutex held */ +static void __delete_and_unsubscribe_port(struct snd_seq_client *client, + struct snd_seq_client_port *port, + struct snd_seq_subscribers *subs, + bool is_src, bool ack) { struct snd_seq_port_subs_info *grp; struct list_head *list; @@ -525,7 +526,6 @@ static void delete_and_unsubscribe_port(struct snd_seq_client *client,
grp = is_src ? &port->c_src : &port->c_dest; list = is_src ? &subs->src_list : &subs->dest_list; - down_write(&grp->list_mutex); write_lock_irq(&grp->list_lock); empty = list_empty(list); if (!empty) @@ -535,6 +535,18 @@ static void delete_and_unsubscribe_port(struct snd_seq_client *client,
if (!empty) unsubscribe_port(client, port, grp, &subs->info, ack); +} + +static void delete_and_unsubscribe_port(struct snd_seq_client *client, + struct snd_seq_client_port *port, + struct snd_seq_subscribers *subs, + bool is_src, bool ack) +{ + struct snd_seq_port_subs_info *grp; + + grp = is_src ? &port->c_src : &port->c_dest; + down_write(&grp->list_mutex); + __delete_and_unsubscribe_port(client, port, subs, is_src, ack); up_write(&grp->list_mutex); }
@@ -590,27 +602,30 @@ int snd_seq_port_disconnect(struct snd_seq_client *connector, struct snd_seq_client_port *dest_port, struct snd_seq_port_subscribe *info) { - struct snd_seq_port_subs_info *src = &src_port->c_src; + struct snd_seq_port_subs_info *dest = &dest_port->c_dest; struct snd_seq_subscribers *subs; int err = -ENOENT;
- down_write(&src->list_mutex); + /* always start from deleting the dest port for avoiding concurrent + * deletions + */ + down_write(&dest->list_mutex); /* look for the connection */ - list_for_each_entry(subs, &src->list_head, src_list) { + list_for_each_entry(subs, &dest->list_head, dest_list) { if (match_subs_info(info, &subs->info)) { - atomic_dec(&subs->ref_count); /* mark as not ready */ + __delete_and_unsubscribe_port(dest_client, dest_port, + subs, false, + connector->number != dest_client->number); err = 0; break; } } - up_write(&src->list_mutex); + up_write(&dest->list_mutex); if (err < 0) return err;
delete_and_unsubscribe_port(src_client, src_port, subs, true, connector->number != src_client->number); - delete_and_unsubscribe_port(dest_client, dest_port, subs, false, - connector->number != dest_client->number); kfree(subs); return 0; }
Hi,
To which kernel version should I apply it? Because:
folkert@oensoens:~/linux-5.11.0$ patch -p1 < ~/alsa-patch.diff patching file sound/core/seq/seq_ports.c patching file sound/core/seq/seq_ports.h patching file sound/core/seq/seq_ports.c Hunk #1 succeeded at 526 (offset 12 lines). Hunk #2 succeeded at 538 (offset 12 lines). Hunk #3 succeeded at 547 with fuzz 2 (offset 12 lines). Hunk #4 FAILED at 602. 1 out of 4 hunks FAILED -- saving rejects to file sound/core/seq/seq_ports.c.rej
On Tue, Aug 03, 2021 at 08:55:36AM +0200, Takashi Iwai wrote:
On Mon, 02 Aug 2021 21:53:49 +0200, folkert wrote:
[ kernel bug if repeatingly aconnect'ing midi devices ]
Does this happen if you do reconnect of kernel sequencer client? You can use snd-virmidi as well as snd-dummy. I'm asking it because it'll simplify the test a lot, which will be almost self-contained.
Like this?
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 20:0 Virtual Raw MIDI 1-0 VirMIDI 1-0 21:0 Virtual Raw MIDI 1-1 VirMIDI 1-1 22:0 Virtual Raw MIDI 1-2 VirMIDI 1-2 23:0 Virtual Raw MIDI 1-3 VirMIDI 1-3 128:0 rtpmidi lappiemctopface Network 128:1 rtpmidi lappiemctopface metronoom 128:2 rtpmidi lappiemctopface AppleMidi2IPMidiBridge 128:3 rtpmidi lappiemctopface oensoens 130:0 FLUID Synth (11462) Synth input port (11462:0)
and then:
root@lappiemctopface:~# cat test.sh while true do aconnect 20:0 21:0 aconnect -d 20:0 21:0 done
root@lappiemctopface:~# for i in `seq 0 3` ; do (./test.sh &) ; done
This hard locks-up my laptop: it doesn't even respond to capslock (led on/off) anymore nor the ctrl+prtscr+alt+b combination.
Thanks, that's really helpful! I see the possible race now.
Could you try the quick fix below?
Something may have changed:
folkert@oensoens:~$ aplaymidi -l ALSA lib seq_hw.c:466:(snd_seq_hw_open) open /dev/snd/seq failed: Permission denied Cannot open sequencer - Permission denied
???
Hrm, that's unexpected.
Meanwhile, I reconsidered the fix and a better idea came after the sleep. Below is the new fix patch. Could you give it a try instead of the previous one?
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: seq: Fix racy deletion of subscriber
It turned out that the current implementation of the port subscription is racy. The subscription contains two linked lists, and we have to add to or delete from both lists. Since both connection and disconnection procedures perform the same order for those two lists (i.e. src list, then dest list), when a deletion happens during a connection procedure, the src list may be deleted before the dest list addition completes, and this may lead to a use-after-free or an Oops, even though the access to both lists are protected via mutex.
The simple workaround for this race is to change the access order for the disconnection, namely, dest list, then src list. This assures that the connection has been established when disconnecting, and also the concurrent deletion can be avoided.
Reported-by: folkert folkert@vanheusden.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210801182754.GP890690@belle.intranet.vanheusden.... Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/seq/seq_ports.c | 39 ++++++++++++++++++++++++++------------ 1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/sound/core/seq/seq_ports.c b/sound/core/seq/seq_ports.c index b9c2ce2b8d5a..84d78630463e 100644 --- a/sound/core/seq/seq_ports.c +++ b/sound/core/seq/seq_ports.c @@ -514,10 +514,11 @@ static int check_and_subscribe_port(struct snd_seq_client *client, return err; }
-static void delete_and_unsubscribe_port(struct snd_seq_client *client,
struct snd_seq_client_port *port,
struct snd_seq_subscribers *subs,
bool is_src, bool ack)
+/* called with grp->list_mutex held */ +static void __delete_and_unsubscribe_port(struct snd_seq_client *client,
struct snd_seq_client_port *port,
struct snd_seq_subscribers *subs,
bool is_src, bool ack)
{ struct snd_seq_port_subs_info *grp; struct list_head *list; @@ -525,7 +526,6 @@ static void delete_and_unsubscribe_port(struct snd_seq_client *client,
grp = is_src ? &port->c_src : &port->c_dest; list = is_src ? &subs->src_list : &subs->dest_list;
- down_write(&grp->list_mutex); write_lock_irq(&grp->list_lock); empty = list_empty(list); if (!empty)
@@ -535,6 +535,18 @@ static void delete_and_unsubscribe_port(struct snd_seq_client *client,
if (!empty) unsubscribe_port(client, port, grp, &subs->info, ack); +}
+static void delete_and_unsubscribe_port(struct snd_seq_client *client,
struct snd_seq_client_port *port,
struct snd_seq_subscribers *subs,
bool is_src, bool ack)
+{
- struct snd_seq_port_subs_info *grp;
- grp = is_src ? &port->c_src : &port->c_dest;
- down_write(&grp->list_mutex);
- __delete_and_unsubscribe_port(client, port, subs, is_src, ack); up_write(&grp->list_mutex);
}
@@ -590,27 +602,30 @@ int snd_seq_port_disconnect(struct snd_seq_client *connector, struct snd_seq_client_port *dest_port, struct snd_seq_port_subscribe *info) {
- struct snd_seq_port_subs_info *src = &src_port->c_src;
- struct snd_seq_port_subs_info *dest = &dest_port->c_dest; struct snd_seq_subscribers *subs; int err = -ENOENT;
- down_write(&src->list_mutex);
- /* always start from deleting the dest port for avoiding concurrent
* deletions
*/
- down_write(&dest->list_mutex); /* look for the connection */
- list_for_each_entry(subs, &src->list_head, src_list) {
- list_for_each_entry(subs, &dest->list_head, dest_list) { if (match_subs_info(info, &subs->info)) {
atomic_dec(&subs->ref_count); /* mark as not ready */
__delete_and_unsubscribe_port(dest_client, dest_port,
subs, false,
} }connector->number != dest_client->number); err = 0; break;
- up_write(&src->list_mutex);
up_write(&dest->list_mutex); if (err < 0) return err;
delete_and_unsubscribe_port(src_client, src_port, subs, true, connector->number != src_client->number);
- delete_and_unsubscribe_port(dest_client, dest_port, subs, false,
kfree(subs); return 0;connector->number != dest_client->number);
}
2.26.2
Folkert van Heusden
On Tue, 03 Aug 2021 09:40:50 +0200, folkert wrote:
Hi,
To which kernel version should I apply it? Because:
folkert@oensoens:~/linux-5.11.0$ patch -p1 < ~/alsa-patch.diff patching file sound/core/seq/seq_ports.c patching file sound/core/seq/seq_ports.h patching file sound/core/seq/seq_ports.c Hunk #1 succeeded at 526 (offset 12 lines). Hunk #2 succeeded at 538 (offset 12 lines). Hunk #3 succeeded at 547 with fuzz 2 (offset 12 lines). Hunk #4 FAILED at 602. 1 out of 4 hunks FAILED -- saving rejects to file sound/core/seq/seq_ports.c.rej
Did you scratch the previous patch? AFAIK, the latest patch should be applicable to 5.11 and older cleanly.
Takashi
On Tue, Aug 03, 2021 at 08:55:36AM +0200, Takashi Iwai wrote:
On Mon, 02 Aug 2021 21:53:49 +0200, folkert wrote:
[ kernel bug if repeatingly aconnect'ing midi devices ]
Does this happen if you do reconnect of kernel sequencer client? You can use snd-virmidi as well as snd-dummy. I'm asking it because it'll simplify the test a lot, which will be almost self-contained.
Like this?
root@lappiemctopface:~# aplaymidi -l Port Client name Port name 14:0 Midi Through Midi Through Port-0 20:0 Virtual Raw MIDI 1-0 VirMIDI 1-0 21:0 Virtual Raw MIDI 1-1 VirMIDI 1-1 22:0 Virtual Raw MIDI 1-2 VirMIDI 1-2 23:0 Virtual Raw MIDI 1-3 VirMIDI 1-3 128:0 rtpmidi lappiemctopface Network 128:1 rtpmidi lappiemctopface metronoom 128:2 rtpmidi lappiemctopface AppleMidi2IPMidiBridge 128:3 rtpmidi lappiemctopface oensoens 130:0 FLUID Synth (11462) Synth input port (11462:0)
and then:
root@lappiemctopface:~# cat test.sh while true do aconnect 20:0 21:0 aconnect -d 20:0 21:0 done
root@lappiemctopface:~# for i in `seq 0 3` ; do (./test.sh &) ; done
This hard locks-up my laptop: it doesn't even respond to capslock (led on/off) anymore nor the ctrl+prtscr+alt+b combination.
Thanks, that's really helpful! I see the possible race now.
Could you try the quick fix below?
Something may have changed:
folkert@oensoens:~$ aplaymidi -l ALSA lib seq_hw.c:466:(snd_seq_hw_open) open /dev/snd/seq failed: Permission denied Cannot open sequencer - Permission denied
???
Hrm, that's unexpected.
Meanwhile, I reconsidered the fix and a better idea came after the sleep. Below is the new fix patch. Could you give it a try instead of the previous one?
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: seq: Fix racy deletion of subscriber
It turned out that the current implementation of the port subscription is racy. The subscription contains two linked lists, and we have to add to or delete from both lists. Since both connection and disconnection procedures perform the same order for those two lists (i.e. src list, then dest list), when a deletion happens during a connection procedure, the src list may be deleted before the dest list addition completes, and this may lead to a use-after-free or an Oops, even though the access to both lists are protected via mutex.
The simple workaround for this race is to change the access order for the disconnection, namely, dest list, then src list. This assures that the connection has been established when disconnecting, and also the concurrent deletion can be avoided.
Reported-by: folkert folkert@vanheusden.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210801182754.GP890690@belle.intranet.vanheusden.... Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/seq/seq_ports.c | 39 ++++++++++++++++++++++++++------------ 1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/sound/core/seq/seq_ports.c b/sound/core/seq/seq_ports.c index b9c2ce2b8d5a..84d78630463e 100644 --- a/sound/core/seq/seq_ports.c +++ b/sound/core/seq/seq_ports.c @@ -514,10 +514,11 @@ static int check_and_subscribe_port(struct snd_seq_client *client, return err; }
-static void delete_and_unsubscribe_port(struct snd_seq_client *client,
struct snd_seq_client_port *port,
struct snd_seq_subscribers *subs,
bool is_src, bool ack)
+/* called with grp->list_mutex held */ +static void __delete_and_unsubscribe_port(struct snd_seq_client *client,
struct snd_seq_client_port *port,
struct snd_seq_subscribers *subs,
bool is_src, bool ack)
{ struct snd_seq_port_subs_info *grp; struct list_head *list; @@ -525,7 +526,6 @@ static void delete_and_unsubscribe_port(struct snd_seq_client *client,
grp = is_src ? &port->c_src : &port->c_dest; list = is_src ? &subs->src_list : &subs->dest_list;
- down_write(&grp->list_mutex); write_lock_irq(&grp->list_lock); empty = list_empty(list); if (!empty)
@@ -535,6 +535,18 @@ static void delete_and_unsubscribe_port(struct snd_seq_client *client,
if (!empty) unsubscribe_port(client, port, grp, &subs->info, ack); +}
+static void delete_and_unsubscribe_port(struct snd_seq_client *client,
struct snd_seq_client_port *port,
struct snd_seq_subscribers *subs,
bool is_src, bool ack)
+{
- struct snd_seq_port_subs_info *grp;
- grp = is_src ? &port->c_src : &port->c_dest;
- down_write(&grp->list_mutex);
- __delete_and_unsubscribe_port(client, port, subs, is_src, ack); up_write(&grp->list_mutex);
}
@@ -590,27 +602,30 @@ int snd_seq_port_disconnect(struct snd_seq_client *connector, struct snd_seq_client_port *dest_port, struct snd_seq_port_subscribe *info) {
- struct snd_seq_port_subs_info *src = &src_port->c_src;
- struct snd_seq_port_subs_info *dest = &dest_port->c_dest; struct snd_seq_subscribers *subs; int err = -ENOENT;
- down_write(&src->list_mutex);
- /* always start from deleting the dest port for avoiding concurrent
* deletions
*/
- down_write(&dest->list_mutex); /* look for the connection */
- list_for_each_entry(subs, &src->list_head, src_list) {
- list_for_each_entry(subs, &dest->list_head, dest_list) { if (match_subs_info(info, &subs->info)) {
atomic_dec(&subs->ref_count); /* mark as not ready */
__delete_and_unsubscribe_port(dest_client, dest_port,
subs, false,
} }connector->number != dest_client->number); err = 0; break;
- up_write(&src->list_mutex);
up_write(&dest->list_mutex); if (err < 0) return err;
delete_and_unsubscribe_port(src_client, src_port, subs, true, connector->number != src_client->number);
- delete_and_unsubscribe_port(dest_client, dest_port, subs, false,
kfree(subs); return 0;connector->number != dest_client->number);
}
2.26.2
Folkert van Heusden
-- MultiTail is a versatile tool for watching logfiles and output of commands. Filtering, coloring, merging, diff-view, etc. http://www.vanheusden.com/multitail/
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
To which kernel version should I apply it? Because:
folkert@oensoens:~/linux-5.11.0$ patch -p1 < ~/alsa-patch.diff patching file sound/core/seq/seq_ports.c patching file sound/core/seq/seq_ports.h patching file sound/core/seq/seq_ports.c Hunk #1 succeeded at 526 (offset 12 lines). Hunk #2 succeeded at 538 (offset 12 lines). Hunk #3 succeeded at 547 with fuzz 2 (offset 12 lines). Hunk #4 FAILED at 602. 1 out of 4 hunks FAILED -- saving rejects to file sound/core/seq/seq_ports.c.rej
Did you scratch the previous patch? AFAIK, the latest patch should be applicable to 5.11 and older cleanly.
I did. It also fails on 5.13.7.
On Tue, 03 Aug 2021 09:44:51 +0200, folkert wrote:
To which kernel version should I apply it? Because:
folkert@oensoens:~/linux-5.11.0$ patch -p1 < ~/alsa-patch.diff patching file sound/core/seq/seq_ports.c patching file sound/core/seq/seq_ports.h patching file sound/core/seq/seq_ports.c Hunk #1 succeeded at 526 (offset 12 lines). Hunk #2 succeeded at 538 (offset 12 lines). Hunk #3 succeeded at 547 with fuzz 2 (offset 12 lines). Hunk #4 FAILED at 602. 1 out of 4 hunks FAILED -- saving rejects to file sound/core/seq/seq_ports.c.rej
Did you scratch the previous patch? AFAIK, the latest patch should be applicable to 5.11 and older cleanly.
I did. It also fails on 5.13.7.
Weird, here it's applied cleanly on my machine with every kernel version. Double-check whether you really have any other changes. I attach the patch again but with an attachment just to be sure.
Takashi
I did. It also fails on 5.13.7.
Weird, here it's applied cleanly on my machine with every kernel version. Double-check whether you really have any other changes. I attach the patch again but with an attachment just to be sure.
It now applies fine! Strange. Most likely I made a mistake before. Anyway, it is now compiling.
folkert@oensoens:~$ aplaymidi -l ALSA lib seq_hw.c:466:(snd_seq_hw_open) open /dev/snd/seq failed: Permission denied Cannot open sequencer - Permission denied
???
Hrm, that's unexpected.
New system, account was not in the audio-group.
Meanwhile, I reconsidered the fix and a better idea came after the sleep. Below is the new fix patch. Could you give it a try instead of the previous one?
Works fine as well: 7 concurrent script-instances for half an hour and nothing logged in dmesg.
On Tue, 03 Aug 2021 12:12:34 +0200, folkert wrote:
folkert@oensoens:~$ aplaymidi -l ALSA lib seq_hw.c:466:(snd_seq_hw_open) open /dev/snd/seq failed: Permission denied Cannot open sequencer - Permission denied
???
Hrm, that's unexpected.
New system, account was not in the audio-group.
Meanwhile, I reconsidered the fix and a better idea came after the sleep. Below is the new fix patch. Could you give it a try instead of the previous one?
Works fine as well: 7 concurrent script-instances for half an hour and nothing logged in dmesg.
Great, I'm going to submit and merge the fix soon.
Thanks!
Takashi
participants (2)
-
folkert
-
Takashi Iwai