On Tue, 07 Feb 2017 16:44:12 +0100, Takashi Iwai wrote:
On Tue, 07 Feb 2017 14:11:16 +0100, Takashi Iwai wrote:
Another day, another patchset. Here I put a largish cleanup patch to just shuffle the code, in addition to the new S16 and S32 formats support and the support for a single period with no period wakeup as we've discussed in the previous thread. One patch to remove BATCH flag is same and kept in the middle of this series.
I guess I can merge the first three patches now.
It turned out that S16 and S32 causes xrun with dmix by some reason. Still investigating, but we can postpone these, too.
And this gave some interesting result. At first I thought this being a bug in the kernel driver. Then I found out that this is no kernel driver bug but some performance problem of dmix code, appearing only when accessing uncached pages on Cherrytrail. (The same issue may happen likely on other Atom platforms, but need to check.)
By using a generic dmix code with semaphore, this performance problem is resolved. So, S16/S32 supports are OK in the end.
But this leads to another question wrt the kernel driver code: why the driver allocates / maps with uncached page, not with write- combined? Pierre, Jerome, any clue?
The WC is a better choice than UC in general (of course only if possible), and this seems working fine as long as I've tested.
Below is the patch to convert to WC.
BTW, Ian Morrison (Cc'ed) has tested my recent changes intensively, including the BATCH flag removal. His tests were positive, so far. So I'm likely going to merge the stuff into for-next branch in this week until any regression is found.
One odd thing Ian reported is, however, the occasional GPU error (e.g. "Atomic update failure on pipe A"). IIRC, I've seen such GPU issues in the past while debugging, too. This seemed happening with the old branch, and it doesn't look directly related with my changes. My guess is that a fast repeated sequence of audio operations may block the GPU. That is, we might need to add some delay to avoid the too quick operations. I'll check more about this in tomorrow.
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: x86: Use write-combine page attr instead of uncached
WC page attr seems working fine, and it's much better from performance POV.
Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/x86/intel_hdmi_audio.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/sound/x86/intel_hdmi_audio.c b/sound/x86/intel_hdmi_audio.c index ac07c67a41cf..643a302ae883 100644 --- a/sound/x86/intel_hdmi_audio.c +++ b/sound/x86/intel_hdmi_audio.c @@ -1126,9 +1126,9 @@ static int had_pcm_hw_params(struct snd_pcm_substream *substream, /* mark the pages as uncached region */ addr = (unsigned long) substream->runtime->dma_area; pages = (substream->runtime->dma_bytes + PAGE_SIZE - 1) / PAGE_SIZE; - retval = set_memory_uc(addr, pages); + retval = set_memory_wc(addr, pages); if (retval) { - dev_err(intelhaddata->dev, "set_memory_uc failed.Error:%d\n", + dev_err(intelhaddata->dev, "set_memory_wc failed.Error:%d\n", retval); return retval; } @@ -1296,7 +1296,7 @@ static snd_pcm_uframes_t had_pcm_pointer(struct snd_pcm_substream *substream) static int had_pcm_mmap(struct snd_pcm_substream *substream, struct vm_area_struct *vma) { - vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); return remap_pfn_range(vma, vma->vm_start, substream->dma_buffer.addr >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot);