[alsa-devel] [PATCH 1/2] ALSA: firewire: process packets in 'struct snd_pcm_ops.ack' callback

Tue Jun 13 14:03:53 CEST 2017

On Tue, 13 Jun 2017 00:49:37 +0200,
Takashi Sakamoto wrote:
> 
> Hi,
> 
> On Jun 8 2017 06:20, Takashi Iwai wrote:
> > From: Takashi Iwai <tiwai at suse.de>
> > Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is
> >   present
> >
> > The drivers using PCM ack ops require the notification whenever
> > appl_ptr is updated in general.  But when the PCM status/control page
> > is mmapped, this notification doesn't happen, per design, thus it's
> > not guaranteed to receive the fine-grained updates.
> >
> > For improving the situation, this patch simply suppresses the PCM
> > status/control mmap when ack ops is defined.  At least, for all
> > existing drivers with ack, this should give more benefit.
> >
> > Once when we really need the full optimization with status/control
> > mmap even using ack ops, we may reconsider the check, e.g. introducing
> > a new flag.  But, so far, this should be good enough.
> >
> > Signed-off-by: Takashi Iwai <tiwai at suse.de>
> > ---
> >   sound/core/pcm_native.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> > index 2bde07a4a87f..b993b0420411 100644
> > --- a/sound/core/pcm_native.c
> > +++ b/sound/core/pcm_native.c
> > @@ -3189,6 +3189,10 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file
> >   			       struct vm_area_struct *area)
> >   {
> >   	long size;
> > +
> > +	/* suppress status/control mmap when driver requires ack */
> > +	if (substream->ops->ack)
> > +		return -ENXIO;
> >   	if (!(area->vm_flags & VM_READ))
> >   		return -EINVAL;
> >   	size = area->vm_end - area->vm_start;
> > @@ -3225,6 +3229,10 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file
> >   				struct vm_area_struct *area)
> >   {
> >   	long size;
> > +
> > +	/* suppress status/control mmap when driver requires ack */
> > +	if (substream->ops->ack)
> > +		return -ENXIO;
> >   	if (!(area->vm_flags & VM_READ))
> >   		return -EINVAL;
> >   	size = area->vm_end - area->vm_start;
> 
> Let me evaluate this patch in a point of overhead at kernel/userspace
> interaction.
> 
> When considering about shape of ALSA PCM applications, I can show it by
> below simple pseudo code:
> 
> ```
> while:
>   poll(2)
>   calculate how many PCM frames are available.
>   memcpy(2)
> ```
> 
> To satisfy requirements of some drivers, we need to find a way to take
> userspace applications to commit the number of handled PCM frames to
> kernel space after the memcpy(2). For this purpose, in ALSA PCM
> interface, SNDRV_PCM_IOCTL_SYNC_PTR is available.
> 
> ```
> while:
>   poll(2)
>   calculate how many PCM frames are available.
>   memcpy(2)
>   ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR
> ```
> 
> Well, as an actual solution, this patch disallows userspace applications
> to map page frame for the status/control data of PCM substream, because
> alsa-lib has fallback mode for this case to utilize
> SNDRV_PCM_IOCTL_SYNC_PTR. This is a similar situation to ARM platform.
> As I described in a commit fccf53881e9b ("ALSA: pcm: add 'applptr' event
> of tracepoint"), ALSA PCM core doesn't perform page frame mapping for
> the status/control data[1].
> 
> In the commit message, I compared tracing data of tracepoints and log of
> strace. In the case, the number of calls for ioctl(2) with
> SNDRV_PCM_IOCTL_SYNC_PTR in a loop with a call of poll(2) is seven. One
> of the sevel calls is actually used to commit the number of PCM frames.
> There's 6 extra system calls. These 6 calls are just used to check
> current status of runtime of PCM substream, and on cache coherent
> architecture, they can be suppressed by the page frame mapping. The poll
> loop repeats during runtime, thus it's important to analyze overhead of
> the extra system calls.
> 
> I suggest 3 points to evaluate this patch.
> 
> 1. CPU time for the system call
> 2. disabling kernel preemption and local IRQ
> 3. backward compatibility
> 
> Here, I describe my opinion to this patch for each of these points.
> 
> 1. CPU time for the system call
> System calls cost expensive than usual API calls. As long as
> investigated on x86_64 architecture on perf-stat, additional 6 call of
> ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR increases 5,000-15,000
> instructions in program execution. Recent hardware is enough fast. It
> might not be so large disadvantage.
> 
> However, when drivers implement 'struct snd_pcm_ops.ack' callback, I
> have different opinion; it's too frequent. In either drivers with
> indirect layer and packet-oriented drivers, the callback is used to
> process PCM frames in intermediate buffer for userspace. The frequent
> call of the callback is not enough effective because the applications is
> programmed with a manner to be accurate for actual time somehow, thus
> they don't process more PCM frames in one poll loop.
> 
> 2. disabling kernel preemption and local IRQ
> In kernel land, service routine for SNDRV_PCM_IOCTL_SYNC_PTR partly runs
> with acquired PCM stream lock, which disables kernel preemption and
> local IRQ usually for running CPU. This lock is required because the PCM
> stream is handled in any handler for hardware events, but this situation
> can reduce realtime performance because no process or service routine for
> interrupt can't use the CPU. If we have the other options, it's worth to
> reconsider.
> 
> Intel SST-based drivers use nonatomic flag for the lock. In this case,
> kernel preemption is enabled but IRQ is disabled. I think Intel
> developers work is for their embedded solution. The care of realtime
> performance might be good for their work, too.
> 
> 3. backward compatibility
> For backward compatibility, this patch is a simple solution. Even if
> patched kernel runs with old version of relevant stuffs, tinyalsa or
> alsa-lib, things work well as expected.
> 
> However, without this patch, things still work well. In either drivers
> with indirect layer or packet-oriented drivers, queued PCM frames are
> transferred according to any hardware events. For such drivers, commit
> notification from applications just expands their probability to
> improve performance; e.g. latency between queueing and transmission.
> 
> I think it better to back to your initial direction; adding info flag,
> let stuffs in userspace parse it, perform the commit for the number of
> handled PCM frames in 'snd_pcm_hw_mmap_commit()'.

Thanks for the analysis.  Yes, the cost by the explicit calls is
known, and it's what was mentioned in the commit log as a further
optimization.  I bought this cost as good enough for better appl_ptr
accuracy, but you thought differently on that.

One thing to be noted is that user-space doesn't have to call sync_ptr
at all, and even no period IRQ is triggered depending on the setup
(e.g. PA prefers it).  This is the case we want to solve.  That is,
the situation is worse than you thought -- things don't work as
expected unless we enforce the sync_ptr notification from user-space.

Then, the question is how to enforce it.  The easiest option is to
disable status/control mmap.  That's how the patch was developed.
As an option,
1) Selectively disable mmap by a new flag, or
2) Selectively disable mmap by the presence of ack callback.

And (2) seems too aggressively applied, from your opinion.

Now you might thing another option:
3) Add a new PCM info flag and let alsa-lib behaving differently
   according to it

But this is no-go, since it doesn't work with the old alsa-lib.

So, IMO, we need to go back to (1), which is my original patch, as a
start.  It affects only the driver that sets (in this case, it's SKL+
driver) and it works with the old user-space, at least.

Then we can improve the performance in alsa-lib.  alsa-lib has some
inefficient implementation regarding the hwptr/appl_ptr update.  In
may places, it calls hwsync, then do avail_update that again calls
sync_ptr, for example.  I guess we can halves the amount of sync_ptr
calls by optimizing that.

Since the sync_ptr is used for all non-x86 architectures (i.e.
nowadays majority of devices in the market), the optimization is a
good benefit for all.  Worth to try, in anyway.

And yet another obvious optimization would be to allow only the status
mmap and disallow the control mmap.  Currently, both are paired and
the control mmap can't fail if the status mmap succeeds.  But, for
the case like this, we want to suppress only the control mmap.

Unfortunately, changing this behavior requires both alsa-lib and
kernel codes.  And keeping the behavior compatibility, we'd need to
introduce something new, e.g. an ioctl to set the compatible protocol
version from alsa-lib.  For now, alsa-lib inquires the protocol
version the kernel supports (SNDRV_PCM_IOCTL_PVERSION).  The newly
required one is the other way round, alsa-lib telling the supported
protocol version to kernel.  Then the kernel can know whether this
alsa-lib version accepts the status-only mmap, and gracefully returns
-ENXIO for the control mmap.

thanks,

Takashi