On Mon, 8 Jun 2020, Alex Xu (Hello71) wrote:
> Excerpts from Christoph Hellwig's message of June 8, 2020 2:19 am:
> > Can you do a listing using gdb where this happens?
> >
> > gdb vmlinux
> > l *(snd_pcm_hw_params+0x3f3)
> >
> > ?
>
> (gdb) l *(snd_pcm_hw_params+0x3f3)
> 0xffffffff817efc85 is in snd_pcm_hw_params (.../linux/sound/core/pcm_native.c:749).
> 744             while (runtime->boundary * 2 <= LONG_MAX - runtime->buffer_size)
> 745                     runtime->boundary *= 2;
> 746
> 747             /* clear the buffer for avoiding possible kernel info leaks */
> 748             if (runtime->dma_area && !substream->ops->copy_user)
> 749                     memset(runtime->dma_area, 0, runtime->dma_bytes);
> 750
> 751             snd_pcm_timer_resolution_change(substream);
> 752             snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP);
> 753
Working theory is that CONFIG_DMA_NONCOHERENT_MMAP getting set is what causes the error_code in the page fault path. Debugging with Alex off-thread, we found that dma_{alloc,free}_from_pool() are not getting called from the new code in dma_direct_{alloc,free}_pages(), and he has not enabled mem_encrypt.
So the issue is related to CONFIG_DMA_COHERENT_POOL being set, and not to anything else related to AMD SME. He has a patch to try out, but I wanted to update the thread in case there are other ideas to try, besides selecting CONFIG_DMA_NONCOHERENT_MMAP only when CONFIG_DMA_REMAP is set (and not when CONFIG_DMA_COHERENT_POOL alone is set).
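
For illustration only, the Kconfig relationship described above might look roughly like the fragment below. This is a sketch under assumptions, not the patch Alex is testing: it assumes the symbols live in kernel/dma/Kconfig, and the "depends on MMU" line and the exact layout of each entry are guesses rather than something taken from this thread.

```
# Hypothetical sketch, not the actual patch.
config DMA_COHERENT_POOL
	bool
	# assumption: no longer selects DMA_REMAP, so it does not
	# pull in DMA_NONCOHERENT_MMAP indirectly

config DMA_REMAP
	bool
	depends on MMU
	# DMA_NONCOHERENT_MMAP is selected only from here
	select DMA_NONCOHERENT_MMAP
```

With a split like this, a config that only needs the atomic pools would get CONFIG_DMA_COHERENT_POOL without CONFIG_DMA_NONCOHERENT_MMAP, which is the combination the working theory says should avoid the fault.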