snd-cmipci oops during probe on arm64 (current mainline, pre-6.6-rc1)
Hi,
I'm using an arm64 workstation and wanted to add a sound card to it. I bought one that was pretty popular around where I live, and it is supported by the snd-cmipci driver.
It's this one:
0005:02:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
After building a mainline kernel (post-v6.5, pre-rc1) on Debian testing arm64 with localmodconfig + CONFIG_SND_CMIPCI=m, it crashes with "Unable to handle kernel paging request at virtual address fffffbfffe80000c", and the system never finishes booting. The login manager never shows up and the serial console never gets to a login prompt. I observed the same issue on a 6.3 Debian kernel, after rebuilding with CONFIG_SND_CMIPCI=m.
If I stop the module from being automatically loaded by adding `blacklist snd-cmipci` to /etc/modprobe.d/snd-cmipci.conf (or if I remove the card from the PCIe slot), I get the system to boot. But trying to load the module manually causes the same crash (I only tested this with the card on):
[ +4,501093] snd_cmipci 0005:02:00.0: stream 512 already in tree
[ +0,000155] Unable to handle kernel paging request at virtual address fffffbfffe80000c
[ +0,007927] Mem abort info:
[ +0,002793] ESR = 0x0000000096000006
[ +0,003743] EC = 0x25: DABT (current EL), IL = 32 bits
[ +0,005307] SET = 0, FnV = 0
[ +0,003049] EA = 0, S1PTW = 0
[ +0,003134] FSC = 0x06: level 2 translation fault
[ +0,004872] Data abort info:
[ +0,002873] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ +0,005479] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ +0,005047] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ +0,000003] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080519fe9000
[ +0,000004] [fffffbfffe80000c] pgd=000008051a979003, p4d=000008051a979003, pud=000008051a97a003, pmd=0000000000000000
[ +0,000009] Internal error: Oops: 0000000096000006 [#1] SMP
[ +0,028142] Modules linked in: snd_cmipci(+) snd_mpu401_uart snd_opl3_lib xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat br_netfilter nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_seq_dummy snd_hrtimer snd_seq qrtr rfkill overlay ftdi_sio usbserial snd_usb_audio snd_usbmidi_lib snd_pcm aes_ce_blk aes_ce_cipher snd_hwdep polyval_ce snd_rawmidi polyval_generic snd_seq_device joydev snd_timer ghash_ce hid_generic gf128mul snd usbhid sha2_ce ipmi_ssif soundcore hid mc sha256_arm64 ipmi_devintf arm_spe_pmu ipmi_msghandler sha1_ce sbsa_gwdt binfmt_misc nls_ascii nls_cp437 vfat fat xgene_hwmon cppc_cpufreq arm_cmn arm_dsu_pmu evdev nfsd auth_rpcgss nfs_acl lockd grace dm_mod fuse loop efi_pstore dax sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid456 async_raid6_recov async_memcpy
[ +0,000142] async_pq async_xor async_tx libcrc32c crc32c_generic xor xor_neon raid6_pq raid1 raid0 multipath linear md_mod nvme nvme_core ast t10_pi drm_shmem_helper xhci_pci drm_kms_helper xhci_hcd crc64_rocksoft crc64 drm crc_t10dif usbcore crct10dif_generic igb crct10dif_ce crct10dif_common usb_common i2c_algo_bit i2c_designware_platform i2c_designware_core
[ +0,121670] CPU: 0 PID: 442 Comm: kworker/0:4 Not tainted 6.5.0+ #2
[ +0,006259] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
[ +0,012506] Workqueue: events work_for_cpu_fn
[ +0,004353] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ +0,006953] pc : logic_inl+0xa0/0xd8
[ +0,003570] lr : snd_cmipci_probe+0x7a4/0x1140 [snd_cmipci]
[ +0,005578] sp : ffff80008287bc70
[ +0,003303] x29: ffff80008287bc70 x28: ffff08008af9d6a0 x27: 0000000000000000
[ +0,007128] x26: ffffc4818263c228 x25: 0000000000000000 x24: 0000000000000001
[ +0,007127] x23: ffff07ff81a9e000 x22: ffff07ff81a9e0c0 x21: ffff08008af9d080
[ +0,007127] x20: ffffc4818263c000 x19: 0000000000000000 x18: ffffffffffffffff
[ +0,007127] x17: 0000000000000000 x16: ffffc4819ac3cd38 x15: ffff80008287ba80
[ +0,007127] x14: 0000000000000001 x13: ffff80008287bbc4 x12: 0000000000000000
[ +0,007126] x11: ffff07ff834616d0 x10: ffffffffffffffc0 x9 : ffffc4819a61dd18
[ +0,007127] x8 : 0000000000000228 x7 : 0000000000000001 x6 : 00000000000000ff
[ +0,007127] x5 : ffffc4819adb7998 x4 : 0000000000000000 x3 : 00000000000000ff
[ +0,007127] x2 : 0000000000ffbffe x1 : 000000000000000c x0 : fffffbfffe80000c
[ +0,007126] Call trace:
[ +0,002436] logic_inl+0xa0/0xd8
[ +0,003221] local_pci_probe+0x48/0xb8
[ +0,003744] work_for_cpu_fn+0x24/0x40
[ +0,003741] process_one_work+0x170/0x3a8
[ +0,004002] worker_thread+0x23c/0x460
[ +0,003742] kthread+0xe8/0xf8
[ +0,003047] ret_from_fork+0x10/0x20
[ +0,003569] Code: d2bfd000 f2df7fe0 f2ffffe0 8b000020 (b9400000)
[ +0,006083] ---[ end trace 0000000000000000 ]---
Because this sound card chipset seems to be popular (pretty much all PCI cards I can find to buy locally use that), I'm thinking this might be specific to arm64, otherwise someone would have seen this before.
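For readers who want to cross-check the "Mem abort info" lines in the oops above, here is a minimal Python sketch that splits the reported ESR value into its fields. The function name `decode_esr` and the returned dictionary keys are illustrative and not taken from any kernel tool; the bit positions follow the ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0), and the WnR/FSC fields are only meaningful when EC indicates a data abort.

```python
def decode_esr(esr):
    """Split an arm64 ESR_ELx value into the fields the kernel prints in an oops."""
    ec = (esr >> 26) & 0x3F        # exception class
    il = (esr >> 25) & 0x1         # 1 means a 32-bit instruction trapped
    iss = esr & 0x1FFFFFF          # instruction-specific syndrome
    fsc = iss & 0x3F               # fault status code (data/instruction aborts only)

    # Fault status codes 0x04..0x07 are translation faults at levels 0..3.
    if 0x04 <= fsc <= 0x07:
        fault = "level %d translation fault" % (fsc & 0x3)
    else:
        fault = "other (see the architecture manual)"

    return {
        "ec": ec,
        "il_bits": 32 if il else 16,
        "iss": iss,
        "isv": (iss >> 24) & 0x1,  # syndrome valid bit
        "wnr": (iss >> 6) & 0x1,   # 0 = read, 1 = write
        "fsc": fsc,
        "fault": fault,
    }


# ESR value copied from the oops in this report.
fields = decode_esr(0x0000000096000006)
print(hex(fields["ec"]), fields["il_bits"], fields["wnr"], fields["fault"])
# -> 0x25 32 0 level 2 translation fault
```

The decoded values agree with what the kernel printed: a data abort taken at the current exception level (EC 0x25), caused by a read (WnR = 0) from an address with no level 2 page table entry.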
On Wed, 06 Sep 2023 00:01:01 +0200, Antonio Terceiro wrote:
Because this sound card chipset seems to be popular (pretty much all PCI cards I can find to buy locally use that), I'm thinking this might be specific to arm64, otherwise someone would have seen this before.
There is only one change in this driver code itself since 6.5 (commit b6ba0aa46138), and judging from the stack trace, it's unrelated to your problem. It's more likely a regression in the lower level code, e.g. PCI layer or arch/arm64 stuff.
Could you try git bisect?
thanks,
Takashi
On 2023-09-06 07:10, Takashi Iwai wrote:
There is only one change in this driver code itself since 6.5 (commit b6ba0aa46138), and judging from the stack trace, it's unrelated to your problem. It's more likely a regression in the lower level code, e.g. PCI layer or arch/arm64 stuff.
Could you try git bisect?
Hmm, but has this combination of card and machine *ever* actually worked?
It's blowing up trying to access PCI I/O space, which has apparently ended up in the indirect access mechanism without that being configured correctly. That is definitely an issue down somewhere between the PCI layer and the system firmware. Does the system even have an I/O space window? Some arm64 machines don't. I guess we might not have got as far as probing a driver if the I/O BAR couldn't be assigned at all, but either way something's not gone right.
Thanks, Robin.
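The register dump in the original report allows a rough cross-check of this reading. The sketch below is plain arithmetic on values copied from the oops; it assumes (this is not confirmed anywhere in the thread) that x1 holds the I/O port number handed to logic_inl and that x0 is the virtual address derived from it. The helper name `split_fault_address` is made up for illustration.

```python
from typing import Tuple

# Values copied from the oops register dump.
FAULT_ADDR = 0xFFFFFBFFFE80000C  # x0, identical to the reported faulting address
X1 = 0x000000000000000C          # x1; ASSUMED here to be the I/O port number


def split_fault_address(fault_addr: int, port: int) -> Tuple[int, int]:
    """Return (base, port), where base is the address the port was added to."""
    return fault_addr - port, port


base, port = split_fault_address(FAULT_ADDR, X1)
print(hex(base), hex(port))
# -> 0xfffffbfffe800000 0xc

# The base is page aligned, which is what one would expect from the start
# of a fixed virtual window rather than from a random stray pointer.
print(base % 4096 == 0)
# -> True
```

If the assumption about x1 holds, the driver was reading port 0xc, which would mean the I/O base it was given was zero. That would fit an I/O BAR that never received a usable address, but it is an inference from the dump and needs to be confirmed on the machine.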
On Wed, Sep 06, 2023 at 01:49:16PM +0100, Robin Murphy wrote:
Hmm, but has this combination of card and machine *ever* actually worked?
That could be it. In trying to find a starting point for the bisection, I tried 6.1.0, 5.15.130, and 5.10.19, and they all fail in exactly the same way. I didn't go further back.
It's blowing up trying to access PCI I/O space, which has apparently ended up in the indirect access mechanism without that being configured correctly. That is definitely an issue down somewhere between the PCI layer and the system firmware. Does the system even have an I/O space window? Some arm64 machines don't. I guess we might not have got as far as probing a driver if the I/O BAR couldn't be assigned at all, but either way something's not gone right.
I'm pretty sure I saw reports of people using PCI GPUs on this machine, but I would need to confirm.
What info would I need to gather from the machine in order to figure this out?
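One piece of information that bears directly on the I/O window question is what the kernel assigned to each BAR of the card. A possible way to look at that, sketched in Python: the sysfs `resource` file of a PCI device holds one "start end flags" line per resource, and the `IORESOURCE_IO` (0x100) and `IORESOURCE_MEM` (0x200) flag bits distinguish port I/O from memory regions. The function name `parse_pci_resources` and the sample text are hypothetical; the sample is NOT output from this machine.

```python
IORESOURCE_IO = 0x00000100   # resource type bits from include/linux/ioport.h
IORESOURCE_MEM = 0x00000200


def parse_pci_resources(text):
    """Parse the text of a sysfs PCI 'resource' file.

    Returns a list of (index, start, end, kind) tuples for the resources
    that are in use; all-zero lines (unused slots) are skipped.
    """
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0 and flags == 0:
            continue  # unused resource slot
        if flags & IORESOURCE_IO:
            kind = "io"
        elif flags & IORESOURCE_MEM:
            kind = "mem"
        else:
            kind = "other"
        regions.append((index, start, end, kind))
    return regions


# Hypothetical sample, for illustration only.
SAMPLE = """\
0x0000000000001000 0x00000000000010ff 0x0000000000040101
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000050000000 0x0000000050003fff 0x0000000000040200
"""

for index, start, end, kind in parse_pci_resources(SAMPLE):
    print(index, kind, hex(start), hex(end))
```

On the affected system the input would come from `/sys/bus/pci/devices/0005:02:00.0/resource`. An "io" entry with a zero or missing range, or no "io" entry at all, would support the idea that the I/O BAR was never assigned; `/proc/ioports` and the host bridge lines in dmesg are the other usual places to look for an I/O window.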
On Wed, Sep 06, 2023 at 03:36:40PM -0300, Antonio Terceiro wrote:
Hi,
Hi Antonio, my 2 cents:
I'm using an arm64 workstation and wanted to add a sound card to it. I bought one that was pretty popular around where I live, and it is supported by the snd-cmipci driver.
Specifically, which arm64 workstation? I'm guessing Compute Module 4 IO Board + Raspberry Pi CM4? This detail is important because the stack trace you provided only references generic PCI calls and there's a need to know exactly which PCIe driver could be failing. Is it pcie-brcmstb?
Thanks, Geraldo Nascimento
On 2023-09-06 20:03, Geraldo Nascimento wrote:
Specifically, which arm64 workstation? I'm guessing Compute Module 4 IO Board + Raspberry Pi CM4? This detail is important because the stack trace you provided only references generic PCI calls and there's a need to know exactly which PCIe driver could be failing. Is it pcie-brcmstb?
Bit bigger than a Pi... ;)
[ +0,006259] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
They look like pretty nice boxes - https://www.ipi.wiki/pages/com-hpc-altra
Robin.
On Wed, Sep 06, 2023 at 09:37:18PM +0100, Robin Murphy wrote:
Bit bigger than a Pi... ;)
Ohh, that's impressive indeed!
But looking around with Google, it turns out the Altra Ampere PCIe is definitely quirky, see:
https://lore.kernel.org/linux-acpi/20200806225525.GA706347@bjorn-Precision-5...
https://github.com/Tencent/TencentOS-kernel/commit/f454797b673c06c0eb1b77be2...
The first quirk should probably be activated on Antonio's kernel, but the second one, being a downstream Tencent patch, isn't. Alas, the second quirk comes with a performance hit, see:
https://gitlab.freedesktop.org/drm/amd/-/issues/2078
Thanks, Geraldo Nascimento
On 2023-09-06 19:36, Antonio Terceiro wrote:
I'm pretty sure I saw reports of people using PCI GPUs on this machine, but I would need to confirm.
GPUs and any other PCIe devices will be fine, since they will use memory BARs - I/O space is pretty much deprecated in PCIe, and as mentioned some systems don't even support it at all. I found a datasheet for CMI8738, and they seem to be right at the other end of the scale as legacy PCI chips with *only* an I/O BAR (and so I guess your card includes a PCIe-PCI bridge as well), so are definitely going to be hitting paths that are less well-exercised on arm64 in general.
What info would I need to gather from the machine in order to figure this out?
The first thing I'd try is rebuilding the kernel with CONFIG_INDIRECT_PIO disabled and see what difference that makes. I'm not too familiar with that area of the code, so the finer details of how to debug broken I/O space beyond that would be more of a linux-pci question.
Thanks, Robin.
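Before and after rebuilding, it can help to confirm what the kernel config actually says about CONFIG_INDIRECT_PIO. This is a minimal sketch that inspects the text of a kernel config file; the function name config_state and both sample strings are hypothetical.

```python
def config_state(config_text: str, symbol: str) -> str:
    """Return the value assigned to a Kconfig symbol, or 'unset'."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "unset"
        if line.startswith(f"{symbol}="):
            return line.split("=", 1)[1]
    return "unset"  # symbol not mentioned at all


# HYPOTHETICAL config fragments, for illustration only.
ENABLED = "CONFIG_SND_CMIPCI=m\nCONFIG_INDIRECT_PIO=y\n"
DISABLED = "CONFIG_SND_CMIPCI=m\n# CONFIG_INDIRECT_PIO is not set\n"

print(config_state(ENABLED, "CONFIG_INDIRECT_PIO"))
print(config_state(DISABLED, "CONFIG_INDIRECT_PIO"))
```

The text would typically be read from the `.config` in the build tree, or from the config file the distribution installs alongside the kernel image, if one is present.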
OK, that makes sense. So if I'm able to find a card that is genuinely PCIe¹, then it should work?
¹ This one has a connector that looks like a PCIe x1, but it's not really PCIe, as the chipset was designed for legacy PCI?
I tried rebuilding with CONFIG_INDIRECT_PIO disabled; it didn't help.
A genuinely PCIe card would probably work: native PCIe endpoints are still allowed to have I/O resources, but they are required to be accessible as equivalent memory resources as well, so most PCIe drivers are unlikely to care about I/O BARs at all.
OK, I managed to have a poke around on a full-fat Altra Mt.Jade system, and indeed, at least on this one, the firmware is not describing any I/O space windows at all:
[ 8.657752] pci_bus 0001:00: root bus resource [bus 00-ff]
[ 8.663235] pci_bus 0001:00: root bus resource [mem 0x30000000-0x37ffffff window]
[ 8.670715] pci_bus 0001:00: root bus resource [mem 0x380000000000-0x3bffdfffffff window]
[ 8.678926] pci 0001:00:00.0: [1def:e100] type 00 class 0x060000
[and so on for all 11(!) PCI segments...]
...which then leads to a lot of failing to configure I/O at the bridges:
[ 9.005653] pci 0000:00:01.0: BAR 13: no space for [io size 0x1000]
[ 9.012006] pci 0000:00:01.0: BAR 13: failed to assign [io size 0x1000]
...but unfortunately what I don't then have is any endpoint with an I/O BAR in that machine to see how that plays out. Either way, though, if your machine looks the same as this (i.e. does not report any "root bus resource [io ... window]" entries and fails to assign any I/O space), then there's no way that card can work, and it would seem to indicate a bug somewhere between the PCI layer and the driver that it's able to get as far as making an access to something it has no means of accessing.
If on the other hand your firmware is different and *does* claim to have I/O windows as well, then something else is going screwy and I don't know, sorry.
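The check described above can be run mechanically over a saved boot log. The sketch below counts "root bus resource [io ... window]" lines and I/O assignment failures; the regular expressions are modelled on the messages quoted in this thread and are an assumption about their general shape, not an exhaustive match for every kernel version. SAMPLE_DMESG reuses lines from the Mt.Jade log above.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumed shape).
IO_WINDOW = re.compile(r"root bus resource \[io\s")
IO_ASSIGN_FAIL = re.compile(r"BAR \d+: (?:no space for|failed to assign) \[io\s")


def summarize_io_space(dmesg_text: str) -> dict[str, int]:
    """Count I/O windows and I/O BAR assignment failures in a boot log."""
    lines = dmesg_text.splitlines()
    return {
        "io_windows": sum(1 for l in lines if IO_WINDOW.search(l)),
        "io_assign_failures": sum(1 for l in lines if IO_ASSIGN_FAIL.search(l)),
    }


SAMPLE_DMESG = """\
[ 8.657752] pci_bus 0001:00: root bus resource [bus 00-ff]
[ 8.663235] pci_bus 0001:00: root bus resource [mem 0x30000000-0x37ffffff window]
[ 9.005653] pci 0000:00:01.0: BAR 13: no space for [io size 0x1000]
[ 9.012006] pci 0000:00:01.0: BAR 13: failed to assign [io size 0x1000]
"""

summary = summarize_io_space(SAMPLE_DMESG)
print(summary)
```

A result with zero I/O windows and a non-zero failure count would match the Mt.Jade behaviour described above; any I/O window at all would point to the other case, where the firmware does claim I/O space and the cause lies elsewhere.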
Cheers, Robin.
Antonio, please see: https://community.amperecomputing.com/t/amd-gpus-on-the-altra-devkit-and-oth...
You have a quirky PCIe controller it seems. You'll have to go through the errata and then some.
Good Luck, Geraldo Nascimento
participants (4)
- Antonio Terceiro
- Geraldo Nascimento
- Robin Murphy
- Takashi Iwai