[alsa-devel] [PATCH] ALSA: core: fix unsigned int pages overflow when comapred
we see the below kernel panic on stress suspend resume test in snd_malloc_sgbuf_pages(), snd_dma_alloc_pages_fallback() alloc chunk maybe larger than the left pages due to the pages alignment, which will cause the pages overflow.
while (pages > 0) { ... pages -= chunk; }
the patch is change the pages from unsigned int to int to fix the issue.
BUG: unable to handle kernel paging request at ffff88000deb4000 IP: [<ffffffff81404fa9>] memset_erms+0x9/0x10 Call Trace: [<ffffffff818f222f>] snd_dma_alloc_pages+0xff/0x210 [<ffffffff818f23af>] snd_dma_alloc_pages_fallback+0x6f/0x90 [<ffffffff818f2b85>] snd_malloc_sgbuf_pages+0x145/0x370 [<ffffffff818f229e>] snd_dma_alloc_pages+0x16e/0x210 [<ffffffffc011930d>] hdac_ext_dma_alloc_pages+0x1d/0x40 [snd_hda_ext_core] [<ffffffffc010729a>] snd_hdac_dsp_prepare+0xca/0x1c0 [snd_hda_core] [<ffffffffc01880f9>] skl_dsp_prepare+0x99/0xf0 [snd_soc_skl] [<ffffffffc0162a7e>] bxt_load_base_firmware+0x9e/0x5c0 [snd_soc_skl_ipc] [<ffffffffc01630ec>] bxt_set_dsp_D0+0x14c/0x300 [snd_soc_skl_ipc] [<ffffffffc015f9c3>] skl_dsp_get_core+0x43/0xd0 [snd_soc_skl_ipc] [<ffffffffc015fa60>] skl_dsp_wake+0x10/0x20 [snd_soc_skl_ipc] [<ffffffffc0188e3e>] skl_resume_dsp+0x7e/0x140 [snd_soc_skl] [<ffffffffc0183c4a>] skl_resume+0xda/0x170 [snd_soc_skl] [<ffffffff81452726>] pci_pm_resume+0x76/0xe0 [<ffffffff816616da>] dpm_run_callback+0x5a/0x180 [<ffffffff81661e3c>] device_resume+0xdc/0x2c0 [<ffffffff81663818>] dpm_resume+0x118/0x310 [<ffffffff81663e11>] dpm_resume_end+0x11/0x20 [<ffffffff810f8bcc>] suspend_devices_and_enter+0x11c/0x2b0 [<ffffffff810f90bd>] pm_suspend+0x35d/0x3d0 [<ffffffff810f78a6>] state_store+0x66/0x90 [<ffffffff813f80e2>] kobj_attr_store+0x12/0x20 [<ffffffff812a37bc>] sysfs_kf_write+0x3c/0x50 [<ffffffff812a2cbd>] kernfs_fop_write+0x11d/0x1a0 [<ffffffff8121dfaa>] __vfs_write+0x3a/0x150 [<ffffffff8121f2b1>] vfs_write+0xb1/0x1a0 [<ffffffff81220898>] SyS_write+0x58/0xc0 [<ffffffff81001fca>] do_syscall_64+0x6a/0xe0 [<ffffffff81b06560>] entry_SYSCALL_64_after_swapgs+0x5d/0xd7
Signed-off-by: he, bo bo.he@intel.com Signed-off-by: zhang jun jun.zhang@intel.com --- sound/core/sgbuf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/core/sgbuf.c b/sound/core/sgbuf.c index 84fffab..33449ee 100644 --- a/sound/core/sgbuf.c +++ b/sound/core/sgbuf.c @@ -68,7 +68,8 @@ void *snd_malloc_sgbuf_pages(struct device *device, size_t *res_size) { struct snd_sg_buf *sgbuf; - unsigned int i, pages, chunk, maxpages; + unsigned int i, chunk, maxpages; + int pages; struct snd_dma_buffer tmpb; struct snd_sg_page *table; struct page **pgtable;
On Wed, 18 Jul 2018 13:52:45 +0200, He, Bo wrote:
we see the below kernel panic on stress suspend resume test in snd_malloc_sgbuf_pages(), snd_dma_alloc_pages_fallback() alloc chunk maybe larger than the left pages due to the pages alignment, which will cause the pages overflow.
while (pages > 0) { ... pages -= chunk; }
the patch is change the pages from unsigned int to int to fix the issue.
Thanks for the patch.
Although the analysis is correct, the fix doesn't look ideal. It's also possible that the returned size may over sgbuf->tblsize if we are more unlucky.
A change like below should work instead. Could you give it a try?
Takashi
-- 8< -- --- a/sound/core/sgbuf.c +++ b/sound/core/sgbuf.c @@ -108,7 +108,7 @@ void *snd_malloc_sgbuf_pages(struct device *device, break; } chunk = tmpb.bytes >> PAGE_SHIFT; - for (i = 0; i < chunk; i++) { + for (i = 0; i < chunk && pages > 0; i++) { table->buf = tmpb.area; table->addr = tmpb.addr; if (!i) @@ -117,9 +117,9 @@ void *snd_malloc_sgbuf_pages(struct device *device, *pgtable++ = virt_to_page(tmpb.area); tmpb.area += PAGE_SIZE; tmpb.addr += PAGE_SIZE; + sgbuf->pages++; + pages--; } - sgbuf->pages += chunk; - pages -= chunk; if (chunk < maxpages) maxpages = chunk; }
Thanks, we will run the test with your patch, will update the test results in 24 Hours.
Current status is: We can reproduce the issue in 3000 cycles stress S/R test, we can't reproduce the kernel panic with our patch in 6000 cycles.
-----Original Message----- From: Takashi Iwai tiwai@suse.de Sent: Wednesday, July 18, 2018 8:34 PM To: He, Bo bo.he@intel.com Cc: alsa-devel@alsa-project.org; perex@perex.cz; linux-kernel@vger.kernel.org; Zhang, Jun jun.zhang@intel.com; Zhang, Yanmin yanmin.zhang@intel.com Subject: Re: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred
On Wed, 18 Jul 2018 13:52:45 +0200, He, Bo wrote:
we see the below kernel panic on stress suspend resume test in snd_malloc_sgbuf_pages(), snd_dma_alloc_pages_fallback() alloc chunk maybe larger than the left pages due to the pages alignment, which will cause the pages overflow.
while (pages > 0) { ... pages -= chunk; }
the patch is change the pages from unsigned int to int to fix the issue.
Thanks for the patch.
Although the analysis is correct, the fix doesn't look ideal. It's also possible that the returned size may over sgbuf->tblsize if we are more unlucky.
A change like below should work instead. Could you give it a try?
Takashi
-- 8< -- --- a/sound/core/sgbuf.c +++ b/sound/core/sgbuf.c @@ -108,7 +108,7 @@ void *snd_malloc_sgbuf_pages(struct device *device, break; } chunk = tmpb.bytes >> PAGE_SHIFT; - for (i = 0; i < chunk; i++) { + for (i = 0; i < chunk && pages > 0; i++) { table->buf = tmpb.area; table->addr = tmpb.addr; if (!i) @@ -117,9 +117,9 @@ void *snd_malloc_sgbuf_pages(struct device *device, *pgtable++ = virt_to_page(tmpb.area); tmpb.area += PAGE_SIZE; tmpb.addr += PAGE_SIZE; + sgbuf->pages++; + pages--; } - sgbuf->pages += chunk; - pages -= chunk; if (chunk < maxpages) maxpages = chunk; }
Hello, Takashi
I think use our patch, it's NOT possible that the returned size is over sgbuf->tblsize.
In function snd_malloc_sgbuf_pages,
Pages is align page, sgbuf->tblsize is align 32*page, chunk is align 2^n*page,
in our panic case, pages = 123, tlbsize = 128, 1st loop trunk = 32 2nd loop trunk = 32 3rd loop trunk = 32 4th loop trunk = 16 5th loop trunk = 16 So in 5th loop pages-trunk = -5, which make dead loop.
Use our patch , in 5th loop, while is break. Returned size could NOT be over sgbuf->tblsize.
-----Original Message----- From: Takashi Iwai [mailto:tiwai@suse.de] Sent: Wednesday, July 18, 2018 20:34 To: He, Bo bo.he@intel.com Cc: alsa-devel@alsa-project.org; perex@perex.cz; linux-kernel@vger.kernel.org; Zhang, Jun jun.zhang@intel.com; Zhang, Yanmin yanmin.zhang@intel.com Subject: Re: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred
On Wed, 18 Jul 2018 13:52:45 +0200, He, Bo wrote:
we see the below kernel panic on stress suspend resume test in snd_malloc_sgbuf_pages(), snd_dma_alloc_pages_fallback() alloc chunk maybe larger than the left pages due to the pages alignment, which will cause the pages overflow.
while (pages > 0) { ... pages -= chunk; }
the patch is change the pages from unsigned int to int to fix the issue.
Thanks for the patch.
Although the analysis is correct, the fix doesn't look ideal. It's also possible that the returned size may over sgbuf->tblsize if we are more unlucky.
A change like below should work instead. Could you give it a try?
Takashi
-- 8< -- --- a/sound/core/sgbuf.c +++ b/sound/core/sgbuf.c @@ -108,7 +108,7 @@ void *snd_malloc_sgbuf_pages(struct device *device, break; } chunk = tmpb.bytes >> PAGE_SHIFT; - for (i = 0; i < chunk; i++) { + for (i = 0; i < chunk && pages > 0; i++) { table->buf = tmpb.area; table->addr = tmpb.addr; if (!i) @@ -117,9 +117,9 @@ void *snd_malloc_sgbuf_pages(struct device *device, *pgtable++ = virt_to_page(tmpb.area); tmpb.area += PAGE_SIZE; tmpb.addr += PAGE_SIZE; + sgbuf->pages++; + pages--; } - sgbuf->pages += chunk; - pages -= chunk; if (chunk < maxpages) maxpages = chunk; }
On Thu, 19 Jul 2018 08:08:06 +0200, Zhang, Jun wrote:
Hello, Takashi
I think use our patch, it's NOT possible that the returned size is over sgbuf->tblsize.
In function snd_malloc_sgbuf_pages,
Pages is align page, sgbuf->tblsize is align 32*page, chunk is align 2^n*page,
in our panic case, pages = 123, tlbsize = 128, 1st loop trunk = 32 2nd loop trunk = 32 3rd loop trunk = 32 4th loop trunk = 16 5th loop trunk = 16 So in 5th loop pages-trunk = -5, which make dead loop.
Looking at the code again, yeah, you are right, that won't happen.
And now it becomes clear: the fundamental problem is that snd_dma_alloc_pages_fallback() returns a larger size than requested. It would be acceptable if the internal allocator aligns a larger size, but it shouldn't appear in the returned size outside. I believe this was just a misunderstanding of get_order() usage there. (BTW, it's interesting that the allocation with a larger block worked while allocation with a smaller chunk failed; it must be a rare case and that's one of reasons this bug didn't hit frequently.)
That being said, what we should fix is rather the function snd_dma_alloc_pages_fallback() to behave as expected, and it'll be like the patch below.
thanks,
Takashi
--- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -247,11 +247,10 @@ int snd_dma_alloc_pages_fallback(int type, struct device *device, size_t size, return err; if (size <= PAGE_SIZE) return -ENOMEM; + size >>= 1; aligned_size = PAGE_SIZE << get_order(size); if (size != aligned_size) size = aligned_size; - else - size >>= 1; } if (! dmab->area) return -ENOMEM;
On Thu, 19 Jul 2018 08:42:14 +0200, Takashi Iwai wrote:
On Thu, 19 Jul 2018 08:08:06 +0200, Zhang, Jun wrote:
Hello, Takashi
I think use our patch, it's NOT possible that the returned size is over sgbuf->tblsize.
In function snd_malloc_sgbuf_pages,
Pages is align page, sgbuf->tblsize is align 32*page, chunk is align 2^n*page,
in our panic case, pages = 123, tlbsize = 128, 1st loop trunk = 32 2nd loop trunk = 32 3rd loop trunk = 32 4th loop trunk = 16 5th loop trunk = 16 So in 5th loop pages-trunk = -5, which make dead loop.
Looking at the code again, yeah, you are right, that won't happen.
And now it becomes clear: the fundamental problem is that snd_dma_alloc_pages_fallback() returns a larger size than requested. It would be acceptable if the internal allocator aligns a larger size, but it shouldn't appear in the returned size outside. I believe this was just a misunderstanding of get_order() usage there. (BTW, it's interesting that the allocation with a larger block worked while allocation with a smaller chunk failed; it must be a rare case and that's one of reasons this bug didn't hit frequently.)
That being said, what we should fix is rather the function snd_dma_alloc_pages_fallback() to behave as expected, and it'll be like the patch below.
And we can reduce even more lines. A proper patch is below.
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: memalloc: Don't exceed over the requested size
snd_dma_alloc_pages_fallback() tries to allocate pages again when the allocation fails with reduced size. But the first try actually *increases* the size to power-of-two, which may give back a larger chunk than the requested size. This confuses the callers, e.g. sgbuf assumes that the size is equal or less, and it may result in a bad loop due to the underflow and eventually lead to Oops.
The code of this function seems incorrectly assuming the usage of get_order(). We need to decrease at first, then align to power-of-two.
Reported-by: he, bo bo.he@intel.com Reported-by: zhang jun jun.zhang@intel.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/core/memalloc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c index 7f89d3c79a4b..753d5fc4b284 100644 --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -242,16 +242,12 @@ int snd_dma_alloc_pages_fallback(int type, struct device *device, size_t size, int err;
while ((err = snd_dma_alloc_pages(type, device, size, dmab)) < 0) { - size_t aligned_size; if (err != -ENOMEM) return err; if (size <= PAGE_SIZE) return -ENOMEM; - aligned_size = PAGE_SIZE << get_order(size); - if (size != aligned_size) - size = aligned_size; - else - size >>= 1; + size >>= 1; + size = PAGE_SIZE << get_order(size); } if (! dmab->area) return -ENOMEM;
Hi, Takashi: we tested for the whole weekend, your patch works, no panic issue seen. You can safe merge you patch.
-----Original Message----- From: Takashi Iwai tiwai@suse.de Sent: Thursday, July 19, 2018 5:11 PM To: Zhang, Jun jun.zhang@intel.com Cc: He, Bo bo.he@intel.com; alsa-devel@alsa-project.org; perex@perex.cz; linux-kernel@vger.kernel.org; Zhang, Yanmin yanmin.zhang@intel.com Subject: Re: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred
On Thu, 19 Jul 2018 08:42:14 +0200, Takashi Iwai wrote:
On Thu, 19 Jul 2018 08:08:06 +0200, Zhang, Jun wrote:
Hello, Takashi
I think use our patch, it's NOT possible that the returned size is over sgbuf->tblsize.
In function snd_malloc_sgbuf_pages,
Pages is align page, sgbuf->tblsize is align 32*page, chunk is align 2^n*page,
in our panic case, pages = 123, tlbsize = 128, 1st loop trunk = 32 2nd loop trunk = 32 3rd loop trunk = 32 4th loop trunk = 16 5th loop trunk = 16 So in 5th loop pages-trunk = -5, which make dead loop.
Looking at the code again, yeah, you are right, that won't happen.
And now it becomes clear: the fundamental problem is that snd_dma_alloc_pages_fallback() returns a larger size than requested. It would be acceptable if the internal allocator aligns a larger size, but it shouldn't appear in the returned size outside. I believe this was just a misunderstanding of get_order() usage there. (BTW, it's interesting that the allocation with a larger block worked while allocation with a smaller chunk failed; it must be a rare case and that's one of reasons this bug didn't hit frequently.)
That being said, what we should fix is rather the function snd_dma_alloc_pages_fallback() to behave as expected, and it'll be like the patch below.
And we can reduce even more lines. A proper patch is below.
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: memalloc: Don't exceed over the requested size
snd_dma_alloc_pages_fallback() tries to allocate pages again when the allocation fails with reduced size. But the first try actually *increases* the size to power-of-two, which may give back a larger chunk than the requested size. This confuses the callers, e.g. sgbuf assumes that the size is equal or less, and it may result in a bad loop due to the underflow and eventually lead to Oops.
The code of this function seems incorrectly assuming the usage of get_order(). We need to decrease at first, then align to power-of-two.
Reported-by: he, bo bo.he@intel.com Reported-by: zhang jun jun.zhang@intel.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/core/memalloc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c index 7f89d3c79a4b..753d5fc4b284 100644 --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -242,16 +242,12 @@ int snd_dma_alloc_pages_fallback(int type, struct device *device, size_t size, int err;
while ((err = snd_dma_alloc_pages(type, device, size, dmab)) < 0) { - size_t aligned_size; if (err != -ENOMEM) return err; if (size <= PAGE_SIZE) return -ENOMEM; - aligned_size = PAGE_SIZE << get_order(size); - if (size != aligned_size) - size = aligned_size; - else - size >>= 1; + size >>= 1; + size = PAGE_SIZE << get_order(size); } if (! dmab->area) return -ENOMEM; -- 2.18.0
On Mon, 23 Jul 2018 02:47:18 +0200, He, Bo wrote:
Hi, Takashi: we tested for the whole weekend, your patch works, no panic issue seen. You can safe merge you patch.
OK, thanks for testing! Now it's merged.
Takashi
-----Original Message----- From: Takashi Iwai tiwai@suse.de Sent: Thursday, July 19, 2018 5:11 PM To: Zhang, Jun jun.zhang@intel.com Cc: He, Bo bo.he@intel.com; alsa-devel@alsa-project.org; perex@perex.cz; linux-kernel@vger.kernel.org; Zhang, Yanmin yanmin.zhang@intel.com Subject: Re: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred
On Thu, 19 Jul 2018 08:42:14 +0200, Takashi Iwai wrote:
On Thu, 19 Jul 2018 08:08:06 +0200, Zhang, Jun wrote:
Hello, Takashi
I think use our patch, it's NOT possible that the returned size is over sgbuf->tblsize.
In function snd_malloc_sgbuf_pages,
Pages is align page, sgbuf->tblsize is align 32*page, chunk is align 2^n*page,
in our panic case, pages = 123, tlbsize = 128, 1st loop trunk = 32 2nd loop trunk = 32 3rd loop trunk = 32 4th loop trunk = 16 5th loop trunk = 16 So in 5th loop pages-trunk = -5, which make dead loop.
Looking at the code again, yeah, you are right, that won't happen.
And now it becomes clear: the fundamental problem is that snd_dma_alloc_pages_fallback() returns a larger size than requested. It would be acceptable if the internal allocator aligns a larger size, but it shouldn't appear in the returned size outside. I believe this was just a misunderstanding of get_order() usage there. (BTW, it's interesting that the allocation with a larger block worked while allocation with a smaller chunk failed; it must be a rare case and that's one of reasons this bug didn't hit frequently.)
That being said, what we should fix is rather the function snd_dma_alloc_pages_fallback() to behave as expected, and it'll be like the patch below.
And we can reduce even more lines. A proper patch is below.
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: memalloc: Don't exceed over the requested size
snd_dma_alloc_pages_fallback() tries to allocate pages again when the allocation fails with reduced size. But the first try actually *increases* the size to power-of-two, which may give back a larger chunk than the requested size. This confuses the callers, e.g. sgbuf assumes that the size is equal or less, and it may result in a bad loop due to the underflow and eventually lead to Oops.
The code of this function seems incorrectly assuming the usage of get_order(). We need to decrease at first, then align to power-of-two.
Reported-by: he, bo bo.he@intel.com Reported-by: zhang jun jun.zhang@intel.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/memalloc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c index 7f89d3c79a4b..753d5fc4b284 100644 --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -242,16 +242,12 @@ int snd_dma_alloc_pages_fallback(int type, struct device *device, size_t size, int err;
while ((err = snd_dma_alloc_pages(type, device, size, dmab)) < 0) {
if (err != -ENOMEM) return err; if (size <= PAGE_SIZE) return -ENOMEM;size_t aligned_size;
aligned_size = PAGE_SIZE << get_order(size);
if (size != aligned_size)
size = aligned_size;
else
size >>= 1;
size >>= 1;
} if (! dmab->area) return -ENOMEM;size = PAGE_SIZE << get_order(size);
-- 2.18.0
participants (3)
-
He, Bo
-
Takashi Iwai
-
Zhang, Jun