[6.2][regression] after commit ffcb754584603adf7039d7972564fbf6febdc542 all sound devices disappeared (due to BUG at mm/page_alloc.c:3592!)
Hi, The kernel 6.2 preparation cycle has begun and yesterday after the kernel was updated on my Fedora Rawhide all audio devices disappeared.
The backtrace of the issue looks like:
[ 133.033269] page:00000000e4a2c44b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x207490
[ 133.033353] head:00000000e4a2c44b order:2 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
[ 133.033360] flags: 0x17ffffc0010000(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 133.033369] raw: 0017ffffc0010000 0000000000000000 dead000000000122 0000000000000000
[ 133.033376] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 133.033381] page dumped because: VM_BUG_ON_PAGE(PageCompound(page))
[ 133.033392] ------------[ cut here ]------------
[ 133.033397] kernel BUG at mm/page_alloc.c:3592!
[ 133.033406] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 133.033410] CPU: 22 PID: 1673 Comm: wireplumber Tainted: G W L ------- --- 6.2.0-0.rc0.20221214gite2ca6ba6ba01.3.fc38.x86_64 #1
[ 133.033415] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022
[ 133.033417] RIP: 0010:split_page+0xa2/0x160
[ 133.033425] Code: 00 48 83 c7 40 48 39 d7 75 d7 0f 1f 44 00 00 89 ee 48 89 df 5b 5d e9 2d fe 06 00 48 c7 c6 d8 ca 9a 95 48 89 df e8 8e 77 fc ff <0f> 0b 48 89 f8 f7 c7 ff 0f 00 00 0f 85 7a ff ff ff 48 8b 17 f7 c2
[ 133.033428] RSP: 0018:ffff9f5645177b98 EFLAGS: 00010286
[ 133.033432] RAX: 0000000000000037 RBX: ffffeb89c81d2400 RCX: 0000000000000000
[ 133.033435] RDX: 0000000000000001 RSI: ffffffff959f0673 RDI: 00000000ffffffff
[ 133.033438] RBP: 0000000000000002 R08: 0000000000000000 R09: ffff9f5645177a08
[ 133.033440] R10: 0000000000000003 R11: ffff8d032e2fffe8 R12: 0000000000000007
[ 133.033442] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000001
[ 133.033445] FS: 00007f7e55702800(0000) GS:ffff8d02e8200000(0000) knlGS:0000000000000000
[ 133.033448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.033450] CR2: 00007f7e556cb000 CR3: 00000001f604e000 CR4: 0000000000350ee0
[ 133.033453] Call Trace:
[ 133.033455] <TASK>
[ 133.033458] __iommu_dma_alloc_noncontiguous.constprop.0+0x2de/0x3e0
[ 133.033468] ? rcu_read_lock_sched_held+0x3f/0x80
[ 133.033475] iommu_dma_alloc_noncontiguous+0x66/0xb0
[ 133.033481] dma_alloc_noncontiguous+0x54/0x1a0
[ 133.033489] snd_dma_noncontig_alloc+0x25/0x120 [snd_pcm]
[ 133.033505] snd_dma_sg_wc_alloc+0x13/0xb0 [snd_pcm]
[ 133.033519] snd_dma_alloc_dir_pages+0x50/0x90 [snd_pcm]
[ 133.033532] do_alloc_pages+0x49/0xa0 [snd_pcm]
[ 133.033546] snd_pcm_lib_malloc_pages+0xf1/0x1e0 [snd_pcm]
[ 133.033560] snd_pcm_hw_params+0x57f/0x620 [snd_pcm]
[ 133.033576] snd_pcm_common_ioctl+0x1e4/0x12a0 [snd_pcm]
[ 133.033595] snd_pcm_ioctl+0x23/0x40 [snd_pcm]
[ 133.033607] __x64_sys_ioctl+0x90/0xd0
[ 133.033613] do_syscall_64+0x5b/0x80
[ 133.033618] ? do_syscall_64+0x67/0x80
[ 133.033622] ? lockdep_hardirqs_on+0x7d/0x100
[ 133.033627] ? do_syscall_64+0x67/0x80
[ 133.033630] ? do_syscall_64+0x67/0x80
[ 133.033633] ? do_syscall_64+0x67/0x80
[ 133.033636] ? do_syscall_64+0x67/0x80
[ 133.033640] ? lockdep_hardirqs_on+0x7d/0x100
[ 133.033644] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 133.033648] RIP: 0033:0x7f7e55b5f65f
[ 133.033671] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 133.033674] RSP: 002b:00007ffd24c51ec0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 133.033678] RAX: ffffffffffffffda RBX: 00007ffd24c520f0 RCX: 00007f7e55b5f65f
[ 133.033681] RDX: 00007ffd24c520f0 RSI: 00000000c2604111 RDI: 0000000000000023
[ 133.033683] RBP: 0000556c04c4ff60 R08: 0000000000000000 R09: 0000000000000000
[ 133.033685] R10: 0000000000000004 R11: 0000000000000246 R12: 0000556c04c4fee0
[ 133.033688] R13: 00007ffd24c52360 R14: 00007ffd24c527b0 R15: 00007ffd24c520f0
[ 133.033696] </TASK>
[ 133.033698] Modules linked in: snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc iwlmvm hid_logitech_hidpp btusb btrtl btbcm snd_seq_midi snd_seq_midi_event btintel btmtk snd_usb_audio bluetooth snd_usbmidi_lib iwlwifi xpad snd_rawmidi ff_memless mc intel_rapl_msr joydev intel_rapl_common snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic snd_hda_codec_hdmi mt76x2u snd_hda_intel kvm_amd snd_intel_dspcfg mt76x2_common snd_intel_sdw_acpi mt76x02_usb snd_hda_codec asus_ec_sensors mt76_usb kvm vfat snd_hda_core fat mt76x02_lib snd_hwdep eeepc_wmi mt76 snd_seq asus_wmi ledtrig_audio snd_seq_device irqbypass sparse_keymap snd_pcm rapl platform_profile wmi_bmof pcspkr snd_timer mac80211 k10temp snd i2c_piix4 soundcore libarc4 acpi_cpufreq
[ 133.033777] cfg80211 hid_logitech_dj rfkill zram amdgpu drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul video crc32c_intel polyval_clmulni iommu_v2 gpu_sched polyval_generic drm_buddy nvme ucsi_ccg drm_display_helper typec_ucsi ghash_clmulni_intel ccp igb sha512_ssse3 typec nvme_core sp5100_tco cec dca nvme_common wmi ip6_tables ip_tables fuse
[ 133.033832] ---[ end trace 0000000000000000 ]---
I bisected the problematic commit and found this:
ffcb754584603adf7039d7972564fbf6febdc542 is the first bad commit
commit ffcb754584603adf7039d7972564fbf6febdc542
Author: Christoph Hellwig hch@lst.de
Date: Wed Nov 9 08:37:17 2022 +0100
dma-mapping: reject __GFP_COMP in dma_alloc_attrs
DMA allocations can never be turned back into a page pointer, so requesting compound pages doesn't make sense and it can't even be supported at all by various backends.
Reject __GFP_COMP with a warning in dma_alloc_attrs, and stop clearing the flag in the arm dma ops and dma-iommu.
Signed-off-by: Christoph Hellwig hch@lst.de
Acked-by: Marek Szyprowski m.szyprowski@samsung.com

 arch/arm/mm/dma-mapping.c | 17 -----------------
 drivers/iommu/dma-iommu.c | 3 ---
 kernel/dma/mapping.c | 8 ++++++++
 3 files changed, 8 insertions(+), 20 deletions(-)
Reverting this commit and rebuilding the kernel confirmed the bisection result.
I hope my report helps fix the problem quickly.
Full kernel log is here: https://pastebin.com/5hsuhifY
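For readers following the trace, here is a minimal user-space sketch of the failure it suggests: once the IOMMU DMA path stops clearing __GFP_COMP, the allocator can hand back a compound page (the dump above shows order:2 with the head flag set), and split_page() then trips VM_BUG_ON_PAGE(PageCompound(page)). Everything prefixed toy_ or TOY_ is hypothetical and invented for illustration; the flag values are not the kernel's, and this is a model, not the kernel code.

```c
#include <stdbool.h>

/* Toy flag bit; NOT the kernel's real __GFP_COMP value. */
#define TOY_GFP_COMP (1u << 0)

/* Toy page descriptor; 'compound' stands in for PageCompound(page). */
struct toy_page {
	unsigned int order;
	bool compound;
};

/* Stand-in for the page allocator: with the COMP flag set, a
 * higher-order allocation comes back as a compound page. */
static struct toy_page toy_alloc_pages(unsigned int gfp, unsigned int order)
{
	struct toy_page page = {
		.order = order,
		.compound = (gfp & TOY_GFP_COMP) && order > 0,
	};
	return page;
}

/* Stand-in for split_page(): returns false where the kernel would hit
 * VM_BUG_ON_PAGE(PageCompound(page)). */
static bool toy_split_page(const struct toy_page *page)
{
	return !page->compound;
}

/* Models the IOMMU page allocation with (clear_comp = true) or without
 * (clear_comp = false) the flag-clearing hunk the commit removed. */
static bool toy_iommu_alloc(unsigned int gfp, unsigned int order, bool clear_comp)
{
	if (clear_comp)
		gfp &= ~TOY_GFP_COMP;

	struct toy_page page = toy_alloc_pages(gfp, order);
	return toy_split_page(&page);
}
```

In this model, toy_iommu_alloc(TOY_GFP_COMP, 2, false) corresponds to the behaviour after the commit and returns false, while passing true for clear_comp models the removed hunk and returns true.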
Hi,
On Thu, 15 Dec 2022, Mikhail Gavrilov wrote:
> The kernel 6.2 preparation cycle has begun and yesterday after the kernel was updated on my Fedora Rawhide all audio devices disappeared.
I can confirm this breaks audio in our SOF tests if I cherry-pick the identified patch ffcb754584603a to sound tree. This affects audio on a very large number of x86 systems.
Br, Kai
Ok, it seems like the sound noncontig alloc code that I already commented on as potentially bogus GFP_COMP mapping trips this. I think for now the right thing would be to revert the hunk in dma-iommu.c (see patch below). The other thing to try would be to remove both uses of GFP_COMP in sound/core/memalloc.c, which should have the same effect.
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9297b741f5e80e..f798c44e090337 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -744,9 +744,6 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 	/* IOMMU can map any pages, so himem can also be used here */
 	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
 
-	/* It makes no sense to muck about with huge pages */
-	gfp &= ~__GFP_COMP;
-
 	while (count) {
 		struct page *page = NULL;
 		unsigned int order_size;
On 2022-12-16 06:46, Christoph Hellwig wrote:
> Ok, it seems like the sound noncontig alloc code that I already commented on as potentially bogus GFP_COMP mapping trips this. I think for now the right thing would be to revert the hunk in dma-iommu.c (see patch below). The other thing to try would be to remove both uses of GFP_COMP in sound/core/memalloc.c, which should have the same effect.
Or we explicitly strip the flag in dma_alloc_noncontiguous() (and maybe dma_alloc_pages() as well) for consistency with dma_alloc_attrs(). That seems like it might be the most robust option.
Robin.
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 9297b741f5e80e..f798c44e090337 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -744,9 +744,6 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
>  	/* IOMMU can map any pages, so himem can also be used here */
>  	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
>  
> -	/* It makes no sense to muck about with huge pages */
> -	gfp &= ~__GFP_COMP;
> -
>  	while (count) {
>  		struct page *page = NULL;
>  		unsigned int order_size;
On Fri, Dec 16, 2022 at 11:40:57AM +0000, Robin Murphy wrote:
> On 2022-12-16 06:46, Christoph Hellwig wrote:
>> Ok, it seems like the sound noncontig alloc code that I already commented on as potentially bogus GFP_COMP mapping trips this. I think for now the right thing would be to revert the hunk in dma-iommu.c (see patch below). The other thing to try would be to remove both uses of GFP_COMP in sound/core/memalloc.c, which should have the same effect.
> Or we explicitly strip the flag in dma_alloc_noncontiguous() (and maybe dma_alloc_pages() as well) for consistency with dma_alloc_attrs(). That seems like it might be the most robust option.
In the long run warning there and returning an error seems like the right thing to do, yes. I'm just a little worried doing this right now after the merge window.
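The two entry-point policies discussed here (warn and return an error, versus strip the flag and carry on) can be contrasted in a small user-space sketch. All toy_ and TOY_ names are hypothetical, the flag bits are made up, and the functions only model how the GFP mask is handled; nothing here is the kernel's dma_alloc_noncontiguous() or dma_alloc_attrs().

```c
#include <stdbool.h>

/* Toy flag bits; NOT the kernel's real values. */
#define TOY_GFP_COMP   (1u << 0)
#define TOY_GFP_KERNEL (1u << 1)

/* Counts how many times the "reject" policy would have warned. */
static unsigned int toy_warnings;

/* Policy A: warn and fail the allocation when the COMP flag is passed.
 * On success, *effective_gfp receives the mask handed to the backend. */
static bool toy_alloc_reject(unsigned int gfp, unsigned int *effective_gfp)
{
	if (gfp & TOY_GFP_COMP) {
		toy_warnings++;
		return false;
	}
	*effective_gfp = gfp;
	return true;
}

/* Policy B: quietly strip the COMP flag and continue, so existing
 * callers keep working. */
static bool toy_alloc_strip(unsigned int gfp, unsigned int *effective_gfp)
{
	*effective_gfp = gfp & ~TOY_GFP_COMP;
	return true;
}
```

Under policy A a caller that still passes the flag gets an allocation failure it has to handle; under policy B the same caller succeeds and the backend never sees the flag.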
On 2022-12-16 12:15, Christoph Hellwig wrote:
> On Fri, Dec 16, 2022 at 11:40:57AM +0000, Robin Murphy wrote:
>> On 2022-12-16 06:46, Christoph Hellwig wrote:
>>> Ok, it seems like the sound noncontig alloc code that I already commented on as potentially bogus GFP_COMP mapping trips this. I think for now the right thing would be to revert the hunk in dma-iommu.c (see patch below). The other thing to try would be to remove both uses of GFP_COMP in sound/core/memalloc.c, which should have the same effect.
>> Or we explicitly strip the flag in dma_alloc_noncontiguous() (and maybe dma_alloc_pages() as well) for consistency with dma_alloc_attrs(). That seems like it might be the most robust option.
> In the long run warning there and returning an error seems like the right thing to do, yes. I'm just a little worried doing this right now after the merge window.
Fair point, I guess nobody else actually implements dma_alloc_noncontiguous(), and dma_alloc_pages() seems a bit of a grey area since it is more of an explicit page allocator. So yeah, just restoring iommu-dma (perhaps with a mild VM_WARN_ON?) seems like a sufficiently safe and sensible fix for the short term. You can have my pre-emptive ack for that.
Cheers, Robin.
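The short-term option agreed on above (put the flag clearing back into the IOMMU DMA backend, perhaps with a mild warning) might look roughly like this in a user-space model. The names toy_backend_fixup_gfp and toy_warn_count and the flag bits are hypothetical, and warning only once is a choice of this sketch rather than something stated in the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy flag bits; NOT the kernel's real values. */
#define TOY_GFP_COMP    (1u << 0)
#define TOY_GFP_HIGHMEM (1u << 1)

/* Number of warnings emitted so far (at most one in this sketch). */
static unsigned int toy_warn_count;

/* Clears the COMP flag before pages are allocated, and complains once
 * if a caller passed it, instead of failing the allocation. */
static unsigned int toy_backend_fixup_gfp(unsigned int gfp)
{
	static bool warned;

	if ((gfp & TOY_GFP_COMP) && !warned) {
		warned = true;
		toy_warn_count++;
		fprintf(stderr, "toy: caller passed TOY_GFP_COMP, ignoring it\n");
	}

	return gfp & ~TOY_GFP_COMP;
}
```

The point of this shape is that callers such as the sound code keep working unchanged, while the warning still flags the questionable request for later cleanup.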
[Note: this mail contains only information for Linux kernel regression tracking. Mails like these contain '#forregzbot' in the subject to make them easy to spot and filter out. The author also tried to remove most or all individuals from the list of recipients to spare them the hassle.]
On 15.12.22 15:17, Mikhail Gavrilov wrote:
> Hi, The kernel 6.2 preparation cycle has begun and yesterday after the kernel was updated on my Fedora Rawhide all audio devices disappeared.
Thanks for the report. To be sure the issue below doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot:
#regzbot ^introduced ffcb754584603adf
#regzbot title dma-mapping: audio devices disappeared
#regzbot monitor: https://lore.kernel.org/all/20221220082009.569785-1-hch@lst.de/
#regzbot fix: dma-mapping: reject GFP_COMP for noncohernt allocaions
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On 22.12.22 13:17, Thorsten Leemhuis wrote:
> [Note: this mail contains only information for Linux kernel regression tracking. Mails like these contain '#forregzbot' in the subject to make them easy to spot and filter out. The author also tried to remove most or all individuals from the list of recipients to spare them the hassle.]
> On 15.12.22 15:17, Mikhail Gavrilov wrote:
>> Hi, The kernel 6.2 preparation cycle has begun and yesterday after the kernel was updated on my Fedora Rawhide all audio devices disappeared.
> Thanks for the report. To be sure the issue below doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot:
> #regzbot ^introduced ffcb754584603adf
> #regzbot title dma-mapping: audio devices disappeared
> #regzbot monitor: https://lore.kernel.org/all/20221220082009.569785-1-hch@lst.de/
> #regzbot fix: dma-mapping: reject GFP_COMP for noncohernt allocaions
The typo in the subject of the fix was fixed, hence this is needed:
#regzbot fix: 3622b86f49f8
participants (5)
- Christoph Hellwig
- Kai Vehmanen
- Mikhail Gavrilov
- Robin Murphy
- Thorsten Leemhuis