regression with SG DMA buf allocations and IOMMU in low-mem

Takashi Iwai tiwai at suse.de
Fri Nov 4 16:52:30 CET 2022


On Fri, 04 Nov 2022 16:42:29 +0100,
Kai Vehmanen wrote:
> 
> Hi,
> 
> [ and sorry, I had the bounces address in cc originally, routing
>   reply to list as well ]
> 
> On Fri, 4 Nov 2022, Takashi Iwai wrote:
> 
> > On Fri, 04 Nov 2022 15:56:50 +0100, Kai Vehmanen wrote:
> > > I've been debugging a SOF DSP load fail on VT-D/IOMMU systems [1] and 
> > > that brought me to these two commits from you:
> > > .. but in rare low-memory cases, the dma_alloc_noncontiguous()
> > > call will fail in core/memalloc.c and we go to the fallback path.
> > > 
> > > So we go to snd_dma_sg_fallback_alloc(), but here it seems something is 
> > > missing. The pages are not allocated with DMA mapping API anymore, so 
> > > IOMMU won't know about the memory and in our case the DSP load will fail 
> > > to a IOMMU fault. 
> [...]
> > 
> > Hrm, that's a tough issue.  Basically if dma_alloc_noncontiguous()
> > fails, it's really out of memory -- at least under the situation where
> > IOMMU is enabled.  The fallback path was introduced for the case where
> > there is no IOMMU and noncontiguous allocation fails from the
> > beginning.
> 
> ack. This matches with the bug reports. These happen rarely and on systems 
> with a lot of memory pressure. We have recently submitted a few 
> features&fixes that will further reduce these allocs, so it probably is 
> better outcome to have a plain out of memory error.
> 
> > And, what makes things more complicated is that snd_dma_dev_alloc()
> > cannot be used for SG pages when IOMMU is involved.  We can't get the
> > proper pages for mapping SG from the DMA address in that case; some
> > systems may work with virt_to_page() but it's not guranteed to work.
> 
> Aa, ok, so we need to use dma_alloc_noncontiguous() to do this properly 
> and if it returns ENOMEM, that's what it is then, right.
> 
> > So, I believe there are two issues:
> > 1. The firmware allocation fails with dma_alloc_noncontiguous() with
> >   IOMMU enabled
> > 
> > 2. The helper goes to the fallback path and occasionally it worked,
> >   although the pages can't be used with IOMMU.
> > 
> > Basically when 1 happens, we should just return an error without going
> > to fallback path.  But it won't "fix" your case?
> 
> I think an explicit error would be best. The problem now is that the 
> driver will think the allocation (and mapping to device) is fine and 
> proceeds to program the hardware to use the address. This will then create 
> an IOMMU fault down the line that is not so straighforward to recover from 
> (worst case is that a full device level reset needs to be done). And given 
> driver doesn't know it got a faulty mapping, it's hard to make the 
> decision why the fault happened.

OK, then what I posted in another mail (it went to nirvana) might
work.  Attached below again.


Takashi

-- 8< --
--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-map-ops.h>
 #include <linux/genalloc.h>
 #include <linux/highmem.h>
 #include <linux/vmalloc.h>
@@ -541,19 +542,20 @@ static void *snd_dma_noncontig_alloc(struct snd_dma_buffer *dmab, size_t size)
 	struct sg_table *sgt;
 	void *p;
 
-	sgt = dma_alloc_noncontiguous(dmab->dev.dev, size, dmab->dev.dir,
-				      DEFAULT_GFP, 0);
-	if (!sgt) {
 #ifdef CONFIG_SND_DMA_SGBUF
+	if (!get_dma_ops(dmab->dev.dev)) {
 		if (dmab->dev.type == SNDRV_DMA_TYPE_DEV_WC_SG)
 			dmab->dev.type = SNDRV_DMA_TYPE_DEV_WC_SG_FALLBACK;
 		else
 			dmab->dev.type = SNDRV_DMA_TYPE_DEV_SG_FALLBACK;
 		return snd_dma_sg_fallback_alloc(dmab, size);
-#else
-		return NULL;
-#endif
 	}
+#endif
+
+	sgt = dma_alloc_noncontiguous(dmab->dev.dev, size, dmab->dev.dir,
+				      DEFAULT_GFP, 0);
+	if (!sgt)
+		return NULL;
 
 	dmab->dev.need_sync = dma_need_sync(dmab->dev.dev,
 					    sg_dma_address(sgt->sgl));
@@ -857,7 +859,7 @@ static const struct snd_malloc_ops snd_dma_noncoherent_ops = {
 /*
  * Entry points
  */
-static const struct snd_malloc_ops *dma_ops[] = {
+static const struct snd_malloc_ops *snd_dma_ops[] = {
 	[SNDRV_DMA_TYPE_CONTINUOUS] = &snd_dma_continuous_ops,
 	[SNDRV_DMA_TYPE_VMALLOC] = &snd_dma_vmalloc_ops,
 #ifdef CONFIG_HAS_DMA
@@ -883,7 +885,7 @@ static const struct snd_malloc_ops *snd_dma_get_ops(struct snd_dma_buffer *dmab)
 	if (WARN_ON_ONCE(!dmab))
 		return NULL;
 	if (WARN_ON_ONCE(dmab->dev.type <= SNDRV_DMA_TYPE_UNKNOWN ||
-			 dmab->dev.type >= ARRAY_SIZE(dma_ops)))
+			 dmab->dev.type >= ARRAY_SIZE(snd_dma_ops)))
 		return NULL;
-	return dma_ops[dmab->dev.type];
+	return snd_dma_ops[dmab->dev.type];
 }


More information about the Alsa-devel mailing list