Re: [alsa-devel] [PATCH 1/2] ALSA: firewire: process packets in 'struct snd_pcm_ops.ack' callback

13 Jun 2017

On Tue, 13 Jun 2017 00:49:37 +0200,
Takashi Sakamoto wrote:
...
Hi,
On Jun 8 2017 06:20, Takashi Iwai wrote:
...
From: Takashi Iwai tiwai@suse.de
Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is
  present
The drivers using PCM ack ops require the notification whenever
appl_ptr is updated in general.  But when the PCM status/control page
is mmapped, this notification doesn't happen, per design, thus it's
not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM
status/control mmap when ack ops is defined.  At least, for all
existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control
mmap even using ack ops, we may reconsider the check, e.g. introducing
a new flag.  But, so far, this should be good enough.
Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/pcm_native.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 2bde07a4a87f..b993b0420411 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -3189,6 +3189,10 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file
   		       struct vm_area_struct *area)
  {
   long size;


/* suppress status/control mmap when driver requires ack */
if (substream->ops->ack)
return -ENXIO;

if (!(area->vm_flags & VM_READ))
return -EINVAL;
size = area->vm_end - area->vm_start;

@@ -3225,6 +3229,10 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file
   			struct vm_area_struct *area)
  {
   long size;


/* suppress status/control mmap when driver requires ack */
if (substream->ops->ack)
return -ENXIO;

if (!(area->vm_flags & VM_READ))
return -EINVAL;
size = area->vm_end - area->vm_start;

Let me evaluate this patch in a point of overhead at kernel/userspace
interaction.
When considering about shape of ALSA PCM applications, I can show it by
below simple pseudo code:
while:
  poll(2)
  calculate how many PCM frames are available.
  memcpy(2)

To satisfy requirements of some drivers, we need to find a way to take
userspace applications to commit the number of handled PCM frames to
kernel space after the memcpy(2). For this purpose, in ALSA PCM
interface, SNDRV_PCM_IOCTL_SYNC_PTR is available.
while:
  poll(2)
  calculate how many PCM frames are available.
  memcpy(2)
  ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR

Well, as an actual solution, this patch disallows userspace applications
to map page frame for the status/control data of PCM substream, because
alsa-lib has fallback mode for this case to utilize
SNDRV_PCM_IOCTL_SYNC_PTR. This is a similar situation to ARM platform.
As I described in a commit fccf53881e9b ("ALSA: pcm: add 'applptr' event
of tracepoint"), ALSA PCM core doesn't perform page frame mapping for
the status/control data[1].
In the commit message, I compared tracing data of tracepoints and log of
strace. In the case, the number of calls for ioctl(2) with
SNDRV_PCM_IOCTL_SYNC_PTR in a loop with a call of poll(2) is seven. One
of the sevel calls is actually used to commit the number of PCM frames.
There's 6 extra system calls. These 6 calls are just used to check
current status of runtime of PCM substream, and on cache coherent
architecture, they can be suppressed by the page frame mapping. The poll
loop repeats during runtime, thus it's important to analyze overhead of
the extra system calls.
I suggest 3 points to evaluate this patch.

CPU time for the system call
disabling kernel preemption and local IRQ
backward compatibility

Here, I describe my opinion to this patch for each of these points.

CPU time for the system call

System calls cost expensive than usual API calls. As long as
investigated on x86_64 architecture on perf-stat, additional 6 call of
ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR increases 5,000-15,000
instructions in program execution. Recent hardware is enough fast. It
might not be so large disadvantage.
However, when drivers implement 'struct snd_pcm_ops.ack' callback, I
have different opinion; it's too frequent. In either drivers with
indirect layer and packet-oriented drivers, the callback is used to
process PCM frames in intermediate buffer for userspace. The frequent
call of the callback is not enough effective because the applications is
programmed with a manner to be accurate for actual time somehow, thus
they don't process more PCM frames in one poll loop.

disabling kernel preemption and local IRQ

In kernel land, service routine for SNDRV_PCM_IOCTL_SYNC_PTR partly runs
with acquired PCM stream lock, which disables kernel preemption and
local IRQ usually for running CPU. This lock is required because the PCM
stream is handled in any handler for hardware events, but this situation
can reduce realtime performance because no process or service routine for
interrupt can't use the CPU. If we have the other options, it's worth to
reconsider.
Intel SST-based drivers use nonatomic flag for the lock. In this case,
kernel preemption is enabled but IRQ is disabled. I think Intel
developers work is for their embedded solution. The care of realtime
performance might be good for their work, too.

backward compatibility

For backward compatibility, this patch is a simple solution. Even if
patched kernel runs with old version of relevant stuffs, tinyalsa or
alsa-lib, things work well as expected.
However, without this patch, things still work well. In either drivers
with indirect layer or packet-oriented drivers, queued PCM frames are
transferred according to any hardware events. For such drivers, commit
notification from applications just expands their probability to
improve performance; e.g. latency between queueing and transmission.
I think it better to back to your initial direction; adding info flag,
let stuffs in userspace parse it, perform the commit for the number of
handled PCM frames in 'snd_pcm_hw_mmap_commit()'.
Thanks for the analysis.  Yes, the cost by the explicit calls is
known, and it's what was mentioned in the commit log as a further
optimization.  I bought this cost as good enough for better appl_ptr
accuracy, but you thought differently on that.
One thing to be noted is that user-space doesn't have to call sync_ptr
at all, and even no period IRQ is triggered depending on the setup
(e.g. PA prefers it).  This is the case we want to solve.  That is,
the situation is worse than you thought -- things don't work as
expected unless we enforce the sync_ptr notification from user-space.
Then, the question is how to enforce it.  The easiest option is to
disable status/control mmap.  That's how the patch was developed.
As an option,
1) Selectively disable mmap by a new flag, or
2) Selectively disable mmap by the presence of ack callback.
And (2) seems too aggressively applied, from your opinion.
Now you might thing another option:
3) Add a new PCM info flag and let alsa-lib behaving differently
   according to it
But this is no-go, since it doesn't work with the old alsa-lib.
So, IMO, we need to go back to (1), which is my original patch, as a
start.  It affects only the driver that sets (in this case, it's SKL+
driver) and it works with the old user-space, at least.
Then we can improve the performance in alsa-lib.  alsa-lib has some
inefficient implementation regarding the hwptr/appl_ptr update.  In
may places, it calls hwsync, then do avail_update that again calls
sync_ptr, for example.  I guess we can halves the amount of sync_ptr
calls by optimizing that.
Since the sync_ptr is used for all non-x86 architectures (i.e.
nowadays majority of devices in the market), the optimization is a
good benefit for all.  Worth to try, in anyway.
And yet another obvious optimization would be to allow only the status
mmap and disallow the control mmap.  Currently, both are paired and
the control mmap can't fail if the status mmap succeeds.  But, for
the case like this, we want to suppress only the control mmap.
Unfortunately, changing this behavior requires both alsa-lib and
kernel codes.  And keeping the behavior compatibility, we'd need to
introduce something new, e.g. an ioctl to set the compatible protocol
version from alsa-lib.  For now, alsa-lib inquires the protocol
version the kernel supports (SNDRV_PCM_IOCTL_PVERSION).  The newly
required one is the other way round, alsa-lib telling the supported
protocol version to kernel.  Then the kernel can know whether this
alsa-lib version accepts the status-only mmap, and gracefully returns
-ENXIO for the control mmap.
thanks,
Takashi