regression with SG DMA buf allocations and IOMMU in low-mem

Kai Vehmanen kai.vehmanen at linux.intel.com
Fri Nov 4 16:42:29 CET 2022


Hi,

[ and sorry, I had the bounces address in cc originally, routing
  reply to list as well ]

On Fri, 4 Nov 2022, Takashi Iwai wrote:

> On Fri, 04 Nov 2022 15:56:50 +0100, Kai Vehmanen wrote:
> > I've been debugging a SOF DSP load fail on VT-D/IOMMU systems [1] and 
> > that brought me to these two commits from you:
> > .. but in rare low-memory cases, the dma_alloc_noncontiguous()
> > call will fail in core/memalloc.c and we go to the fallback path.
> > 
> > So we go to snd_dma_sg_fallback_alloc(), but here it seems something is 
> > missing. The pages are not allocated with DMA mapping API anymore, so 
> > IOMMU won't know about the memory and in our case the DSP load will fail 
> > to a IOMMU fault. 
[...]
> 
> Hrm, that's a tough issue.  Basically if dma_alloc_noncontiguous()
> fails, it's really out of memory -- at least under the situation where
> IOMMU is enabled.  The fallback path was introduced for the case where
> there is no IOMMU and noncontiguous allocation fails from the
> beginning.

ack. This matches with the bug reports. These happen rarely and on systems 
with a lot of memory pressure. We have recently submitted a few 
features&fixes that will further reduce these allocs, so it probably is 
better outcome to have a plain out of memory error.

> And, what makes things more complicated is that snd_dma_dev_alloc()
> cannot be used for SG pages when IOMMU is involved.  We can't get the
> proper pages for mapping SG from the DMA address in that case; some
> systems may work with virt_to_page() but it's not guranteed to work.

Aa, ok, so we need to use dma_alloc_noncontiguous() to do this properly 
and if it returns ENOMEM, that's what it is then, right.

> So, I believe there are two issues:
> 1. The firmware allocation fails with dma_alloc_noncontiguous() with
>   IOMMU enabled
> 
> 2. The helper goes to the fallback path and occasionally it worked,
>   although the pages can't be used with IOMMU.
> 
> Basically when 1 happens, we should just return an error without going
> to fallback path.  But it won't "fix" your case?

I think an explicit error would be best. The problem now is that the 
driver will think the allocation (and mapping to device) is fine and 
proceeds to program the hardware to use the address. This will then create 
an IOMMU fault down the line that is not so straighforward to recover from 
(worst case is that a full device level reset needs to be done). And given 
driver doesn't know it got a faulty mapping, it's hard to make the 
decision why the fault happened.

Br, Kai


More information about the Alsa-devel mailing list