[alsa-devel] [PATCH 0/3] ALSA: firewire: use .ack callback of PCM operation for packet processing
Hi,
In recent work for ALSA PCM core, driver-side implementation for 'struct snd_pcm_ops.ack' is called when appl_ptr in intermediate PCM buffer is changed, except for a case that mapped page frame is used for the buffer and userspace applications doesn't voluntarily notify the change.
In packet-oriented drivers, this callback is useful for the timing to queue packets for data transmission with burstness. If things works well, reduction of latency between PCM frame queueing and actual data transmission. This patchset is for the purpose, additionally including minor code refactoring.
Major use case of ALSA PCM interface is memory mapped I/O and this patchset adds no advantage in this case due to design of the .ack callback, as I described.
Takashi Sakamoto (2): ALSA: firewire: process packets in 'struct snd_pcm_ops.ack' callback ALSA: fireface: constify ALSA specific operations
sound/firewire/amdtp-stream.c | 19 ++++++++++ sound/firewire/amdtp-stream.h | 1 + sound/firewire/bebob/bebob_pcm.c | 16 +++++++++ sound/firewire/dice/dice-pcm.c | 18 ++++++++++ sound/firewire/digi00x/digi00x-pcm.c | 16 +++++++++ sound/firewire/fireface/ff-midi.c | 22 ++++++------ sound/firewire/fireface/ff-pcm.c | 62 +++++++++++++++++++------------- sound/firewire/fireworks/fireworks_pcm.c | 16 +++++++++ sound/firewire/motu/motu-pcm.c | 16 +++++++++ sound/firewire/oxfw/oxfw-pcm.c | 16 +++++++++ sound/firewire/tascam/tascam-pcm.c | 16 +++++++++ 11 files changed, 182 insertions(+), 36 deletions(-)
In recent commit for ALSA PCM core, some arrangement is done for 'struct snd_pcm_ops.ack' callback. This is called when appl_ptr is explicitly moved in intermediate buffer for PCM frames, except for some cases described later.
For drivers in ALSA firewire stack, usage of this callback has a merit to reduce latency between time of PCM frame queueing and handling actual packets in recent isochronous cycle, because no need to wait for software IRQ context from isochronous context of OHCI 1394.
If this works well in a case that mapped page frame is used for the intermediate buffer, user process should execute some commands for ioctl(2) to tell the number of handled PCM frames in the intermediate buffer just after handling them. Therefore, at present, with a combination of below conditions, this doesn't work as expected and user process should wait for the software IRQ context as usual: - when ALSA PCM core judges page frame mapping is available for status data (struct snd_pcm_mmap_status) and control data (struct snd_pcm_mmap_control). - user process handles PCM frames by loop just with 'snd_pcm_mmap_begin()' and 'snd_pcm_mmap_commit()'. - user process uses PCM hw plugin in alsa-lib to operate I/O without 'sync_ptr_ioctl' option.
Unfortunately, major use case include these three conditions.
Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp --- sound/firewire/amdtp-stream.c | 19 +++++++++++++++++++ sound/firewire/amdtp-stream.h | 1 + sound/firewire/bebob/bebob_pcm.c | 16 ++++++++++++++++ sound/firewire/dice/dice-pcm.c | 18 ++++++++++++++++++ sound/firewire/digi00x/digi00x-pcm.c | 16 ++++++++++++++++ sound/firewire/fireface/ff-pcm.c | 16 ++++++++++++++++ sound/firewire/fireworks/fireworks_pcm.c | 16 ++++++++++++++++ sound/firewire/motu/motu-pcm.c | 16 ++++++++++++++++ sound/firewire/oxfw/oxfw-pcm.c | 16 ++++++++++++++++ sound/firewire/tascam/tascam-pcm.c | 16 ++++++++++++++++ 10 files changed, 150 insertions(+)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c index 9e6f54f8c45d..4316d9db404d 100644 --- a/sound/firewire/amdtp-stream.c +++ b/sound/firewire/amdtp-stream.c @@ -929,6 +929,25 @@ unsigned long amdtp_stream_pcm_pointer(struct amdtp_stream *s) EXPORT_SYMBOL(amdtp_stream_pcm_pointer);
/** + * amdtp_stream_pcm_ack - acknowledge queued PCM frames + * @s: the AMDTP stream that transfers the PCM frames + * + * Returns zero always. + */ +int amdtp_stream_pcm_ack(struct amdtp_stream *s) +{ + /* + * Process isochronous packets for recent isochronous cycle to handle + * queued PCM frames. + */ + if (amdtp_stream_running(s)) + fw_iso_context_flush_completions(s->context); + + return 0; +} +EXPORT_SYMBOL(amdtp_stream_pcm_ack); + +/** * amdtp_stream_update - update the stream after a bus reset * @s: the AMDTP stream */ diff --git a/sound/firewire/amdtp-stream.h b/sound/firewire/amdtp-stream.h index 7e8831722821..6d613f2eb612 100644 --- a/sound/firewire/amdtp-stream.h +++ b/sound/firewire/amdtp-stream.h @@ -168,6 +168,7 @@ int amdtp_stream_add_pcm_hw_constraints(struct amdtp_stream *s,
void amdtp_stream_pcm_prepare(struct amdtp_stream *s); unsigned long amdtp_stream_pcm_pointer(struct amdtp_stream *s); +int amdtp_stream_pcm_ack(struct amdtp_stream *s); void amdtp_stream_pcm_abort(struct amdtp_stream *s);
extern const unsigned int amdtp_syt_intervals[CIP_SFC_COUNT]; diff --git a/sound/firewire/bebob/bebob_pcm.c b/sound/firewire/bebob/bebob_pcm.c index e2f023f3cd9f..657e15a27e5c 100644 --- a/sound/firewire/bebob/bebob_pcm.c +++ b/sound/firewire/bebob/bebob_pcm.c @@ -355,6 +355,20 @@ pcm_playback_pointer(struct snd_pcm_substream *sbstrm) return amdtp_stream_pcm_pointer(&bebob->rx_stream); }
+static int pcm_capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_bebob *bebob = substream->private_data; + + return amdtp_stream_pcm_ack(&bebob->tx_stream); +} + +static int pcm_playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_bebob *bebob = substream->private_data; + + return amdtp_stream_pcm_ack(&bebob->rx_stream); +} + int snd_bebob_create_pcm_devices(struct snd_bebob *bebob) { static const struct snd_pcm_ops capture_ops = { @@ -366,6 +380,7 @@ int snd_bebob_create_pcm_devices(struct snd_bebob *bebob) .prepare = pcm_capture_prepare, .trigger = pcm_capture_trigger, .pointer = pcm_capture_pointer, + .ack = pcm_capture_ack, .page = snd_pcm_lib_get_vmalloc_page, }; static const struct snd_pcm_ops playback_ops = { @@ -377,6 +392,7 @@ int snd_bebob_create_pcm_devices(struct snd_bebob *bebob) .prepare = pcm_playback_prepare, .trigger = pcm_playback_trigger, .pointer = pcm_playback_pointer, + .ack = pcm_playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; diff --git a/sound/firewire/dice/dice-pcm.c b/sound/firewire/dice/dice-pcm.c index b633805a885f..2dda74695069 100644 --- a/sound/firewire/dice/dice-pcm.c +++ b/sound/firewire/dice/dice-pcm.c @@ -294,6 +294,22 @@ static snd_pcm_uframes_t playback_pointer(struct snd_pcm_substream *substream) return amdtp_stream_pcm_pointer(stream); }
+static int capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_dice *dice = substream->private_data; + struct amdtp_stream *stream = &dice->tx_stream[substream->pcm->device]; + + return amdtp_stream_pcm_ack(stream); +} + +static int playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_dice *dice = substream->private_data; + struct amdtp_stream *stream = &dice->rx_stream[substream->pcm->device]; + + return amdtp_stream_pcm_ack(stream); +} + int snd_dice_create_pcm(struct snd_dice *dice) { static const struct snd_pcm_ops capture_ops = { @@ -305,6 +321,7 @@ int snd_dice_create_pcm(struct snd_dice *dice) .prepare = capture_prepare, .trigger = capture_trigger, .pointer = capture_pointer, + .ack = capture_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; @@ -317,6 +334,7 @@ int snd_dice_create_pcm(struct snd_dice *dice) .prepare = playback_prepare, .trigger = playback_trigger, .pointer = playback_pointer, + .ack = playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; diff --git a/sound/firewire/digi00x/digi00x-pcm.c b/sound/firewire/digi00x/digi00x-pcm.c index 6379f67416d7..f76cf5e383db 100644 --- a/sound/firewire/digi00x/digi00x-pcm.c +++ b/sound/firewire/digi00x/digi00x-pcm.c @@ -324,6 +324,20 @@ static snd_pcm_uframes_t pcm_playback_pointer(struct snd_pcm_substream *sbstrm) return amdtp_stream_pcm_pointer(&dg00x->rx_stream); }
+static int pcm_capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_dg00x *dg00x = substream->private_data; + + return amdtp_stream_pcm_ack(&dg00x->tx_stream); +} + +static int pcm_playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_dg00x *dg00x = substream->private_data; + + return amdtp_stream_pcm_ack(&dg00x->rx_stream); +} + int snd_dg00x_create_pcm_devices(struct snd_dg00x *dg00x) { static const struct snd_pcm_ops capture_ops = { @@ -335,6 +349,7 @@ int snd_dg00x_create_pcm_devices(struct snd_dg00x *dg00x) .prepare = pcm_capture_prepare, .trigger = pcm_capture_trigger, .pointer = pcm_capture_pointer, + .ack = pcm_capture_ack, .page = snd_pcm_lib_get_vmalloc_page, }; static const struct snd_pcm_ops playback_ops = { @@ -346,6 +361,7 @@ int snd_dg00x_create_pcm_devices(struct snd_dg00x *dg00x) .prepare = pcm_playback_prepare, .trigger = pcm_playback_trigger, .pointer = pcm_playback_pointer, + .ack = pcm_playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; diff --git a/sound/firewire/fireface/ff-pcm.c b/sound/firewire/fireface/ff-pcm.c index 93cee1978e8e..adb5c87f492f 100644 --- a/sound/firewire/fireface/ff-pcm.c +++ b/sound/firewire/fireface/ff-pcm.c @@ -365,6 +365,20 @@ static snd_pcm_uframes_t pcm_playback_pointer(struct snd_pcm_substream *sbstrm) return amdtp_stream_pcm_pointer(&ff->rx_stream); }
+static int pcm_capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_ff *ff = substream->private_data; + + return amdtp_stream_pcm_ack(&ff->tx_stream); +} + +static int pcm_playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_ff *ff = substream->private_data; + + return amdtp_stream_pcm_ack(&ff->rx_stream); +} + static struct snd_pcm_ops pcm_capture_ops = { .open = pcm_open, .close = pcm_close, @@ -374,6 +388,7 @@ static struct snd_pcm_ops pcm_capture_ops = { .prepare = pcm_capture_prepare, .trigger = pcm_capture_trigger, .pointer = pcm_capture_pointer, + .ack = pcm_capture_ack, .page = snd_pcm_lib_get_vmalloc_page, };
@@ -386,6 +401,7 @@ static struct snd_pcm_ops pcm_playback_ops = { .prepare = pcm_playback_prepare, .trigger = pcm_playback_trigger, .pointer = pcm_playback_pointer, + .ack = pcm_playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; diff --git a/sound/firewire/fireworks/fireworks_pcm.c b/sound/firewire/fireworks/fireworks_pcm.c index f10aec117998..346e2647ed1f 100644 --- a/sound/firewire/fireworks/fireworks_pcm.c +++ b/sound/firewire/fireworks/fireworks_pcm.c @@ -379,6 +379,20 @@ static snd_pcm_uframes_t pcm_playback_pointer(struct snd_pcm_substream *sbstrm) return amdtp_stream_pcm_pointer(&efw->rx_stream); }
+static int pcm_capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_efw *efw = substream->private_data; + + return amdtp_stream_pcm_ack(&efw->tx_stream); +} + +static int pcm_playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_efw *efw = substream->private_data; + + return amdtp_stream_pcm_ack(&efw->rx_stream); +} + int snd_efw_create_pcm_devices(struct snd_efw *efw) { static const struct snd_pcm_ops capture_ops = { @@ -390,6 +404,7 @@ int snd_efw_create_pcm_devices(struct snd_efw *efw) .prepare = pcm_capture_prepare, .trigger = pcm_capture_trigger, .pointer = pcm_capture_pointer, + .ack = pcm_capture_ack, .page = snd_pcm_lib_get_vmalloc_page, }; static const struct snd_pcm_ops playback_ops = { @@ -401,6 +416,7 @@ int snd_efw_create_pcm_devices(struct snd_efw *efw) .prepare = pcm_playback_prepare, .trigger = pcm_playback_trigger, .pointer = pcm_playback_pointer, + .ack = pcm_playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; diff --git a/sound/firewire/motu/motu-pcm.c b/sound/firewire/motu/motu-pcm.c index 94558f3d218b..e3ef89cee565 100644 --- a/sound/firewire/motu/motu-pcm.c +++ b/sound/firewire/motu/motu-pcm.c @@ -356,6 +356,20 @@ static snd_pcm_uframes_t playback_pointer(struct snd_pcm_substream *substream) return amdtp_stream_pcm_pointer(&motu->rx_stream); }
+static int capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_motu *motu = substream->private_data; + + return amdtp_stream_pcm_ack(&motu->tx_stream); +} + +static int playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_motu *motu = substream->private_data; + + return amdtp_stream_pcm_ack(&motu->rx_stream); +} + int snd_motu_create_pcm_devices(struct snd_motu *motu) { static struct snd_pcm_ops capture_ops = { @@ -367,6 +381,7 @@ int snd_motu_create_pcm_devices(struct snd_motu *motu) .prepare = capture_prepare, .trigger = capture_trigger, .pointer = capture_pointer, + .ack = capture_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; @@ -379,6 +394,7 @@ int snd_motu_create_pcm_devices(struct snd_motu *motu) .prepare = playback_prepare, .trigger = playback_trigger, .pointer = playback_pointer, + .ack = playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; diff --git a/sound/firewire/oxfw/oxfw-pcm.c b/sound/firewire/oxfw/oxfw-pcm.c index d4594f7115ae..bc1a3a36ab06 100644 --- a/sound/firewire/oxfw/oxfw-pcm.c +++ b/sound/firewire/oxfw/oxfw-pcm.c @@ -382,6 +382,20 @@ static snd_pcm_uframes_t pcm_playback_pointer(struct snd_pcm_substream *sbstm) return amdtp_stream_pcm_pointer(&oxfw->rx_stream); }
+static int pcm_capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_oxfw *oxfw = substream->private_data; + + return amdtp_stream_pcm_ack(&oxfw->tx_stream); +} + +static int pcm_playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_oxfw *oxfw = substream->private_data; + + return amdtp_stream_pcm_ack(&oxfw->rx_stream); +} + int snd_oxfw_create_pcm(struct snd_oxfw *oxfw) { static const struct snd_pcm_ops capture_ops = { @@ -393,6 +407,7 @@ int snd_oxfw_create_pcm(struct snd_oxfw *oxfw) .prepare = pcm_capture_prepare, .trigger = pcm_capture_trigger, .pointer = pcm_capture_pointer, + .ack = pcm_capture_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; @@ -405,6 +420,7 @@ int snd_oxfw_create_pcm(struct snd_oxfw *oxfw) .prepare = pcm_playback_prepare, .trigger = pcm_playback_trigger, .pointer = pcm_playback_pointer, + .ack = pcm_playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, }; diff --git a/sound/firewire/tascam/tascam-pcm.c b/sound/firewire/tascam/tascam-pcm.c index 6207588d7c73..3c4482aa7231 100644 --- a/sound/firewire/tascam/tascam-pcm.c +++ b/sound/firewire/tascam/tascam-pcm.c @@ -263,6 +263,20 @@ static snd_pcm_uframes_t pcm_playback_pointer(struct snd_pcm_substream *sbstrm) return amdtp_stream_pcm_pointer(&tscm->rx_stream); }
+static int pcm_capture_ack(struct snd_pcm_substream *substream) +{ + struct snd_tscm *tscm = substream->private_data; + + return amdtp_stream_pcm_ack(&tscm->tx_stream); +} + +static int pcm_playback_ack(struct snd_pcm_substream *substream) +{ + struct snd_tscm *tscm = substream->private_data; + + return amdtp_stream_pcm_ack(&tscm->rx_stream); +} + int snd_tscm_create_pcm_devices(struct snd_tscm *tscm) { static const struct snd_pcm_ops capture_ops = { @@ -274,6 +288,7 @@ int snd_tscm_create_pcm_devices(struct snd_tscm *tscm) .prepare = pcm_capture_prepare, .trigger = pcm_capture_trigger, .pointer = pcm_capture_pointer, + .ack = pcm_capture_ack, .page = snd_pcm_lib_get_vmalloc_page, }; static const struct snd_pcm_ops playback_ops = { @@ -285,6 +300,7 @@ int snd_tscm_create_pcm_devices(struct snd_tscm *tscm) .prepare = pcm_playback_prepare, .trigger = pcm_playback_trigger, .pointer = pcm_playback_pointer, + .ack = pcm_playback_ack, .page = snd_pcm_lib_get_vmalloc_page, .mmap = snd_pcm_lib_mmap_vmalloc, };
On Wed, 07 Jun 2017 02:38:05 +0200, Takashi Sakamoto wrote:
In recent commit for ALSA PCM core, some arrangement is done for 'struct snd_pcm_ops.ack' callback. This is called when appl_ptr is explicitly moved in intermediate buffer for PCM frames, except for some cases described later.
For drivers in ALSA firewire stack, usage of this callback has a merit to reduce latency between time of PCM frame queueing and handling actual packets in recent isochronous cycle, because no need to wait for software IRQ context from isochronous context of OHCI 1394.
If this works well in a case that mapped page frame is used for the intermediate buffer, user process should execute some commands for ioctl(2) to tell the number of handled PCM frames in the intermediate buffer just after handling them.
This is one thing that was raised in the discussion with Intel people, and my suggestion was to add a new flag to suppress the status/control mmap like pcm_file->no_compat_mmap. Then alsa-lib falls back to the sync_ptr ioctl, and the driver can catch each appl_ptr update.
In anyway, I applied the patch now as is. Thanks.
Takashi
On Wed, 07 Jun 2017 07:59:20 +0200, Takashi Iwai wrote:
On Wed, 07 Jun 2017 02:38:05 +0200, Takashi Sakamoto wrote:
In recent commit for ALSA PCM core, some arrangement is done for 'struct snd_pcm_ops.ack' callback. This is called when appl_ptr is explicitly moved in intermediate buffer for PCM frames, except for some cases described later.
For drivers in ALSA firewire stack, usage of this callback has a merit to reduce latency between time of PCM frame queueing and handling actual packets in recent isochronous cycle, because no need to wait for software IRQ context from isochronous context of OHCI 1394.
If this works well in a case that mapped page frame is used for the intermediate buffer, user process should execute some commands for ioctl(2) to tell the number of handled PCM frames in the intermediate buffer just after handling them.
This is one thing that was raised in the discussion with Intel people, and my suggestion was to add a new flag to suppress the status/control mmap like pcm_file->no_compat_mmap. Then alsa-lib falls back to the sync_ptr ioctl, and the driver can catch each appl_ptr update.
Now I considered this again, and concluded that a simple patch like below should suffice.
Adding Intel people to Cc, who raised the issue originally.
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is present
The drivers using PCM ack ops require the notification whenever appl_ptr is updated in general. But when the PCM status/control page is mmapped, this notification doesn't happen, per design, thus it's not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM status/control mmap when ack ops is defined. At least, for all existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control mmap even using ack ops, we may reconsider the check, e.g. introducing a new flag. But, so far, this should be good enough.
Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/core/pcm_native.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 2bde07a4a87f..b993b0420411 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3189,6 +3189,10 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size; + + /* suppress status/control mmap when driver requires ack */ + if (substream->ops->ack) + return -ENXIO; if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start; @@ -3225,6 +3229,10 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size; + + /* suppress status/control mmap when driver requires ack */ + if (substream->ops->ack) + return -ENXIO; if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;
On Wed, Jun 07, 2017 at 11:20:03PM +0200, Takashi Iwai wrote:
On Wed, 07 Jun 2017 07:59:20 +0200, Takashi Iwai wrote:
On Wed, 07 Jun 2017 02:38:05 +0200, Takashi Sakamoto wrote:
In recent commit for ALSA PCM core, some arrangement is done for 'struct snd_pcm_ops.ack' callback. This is called when appl_ptr is explicitly moved in intermediate buffer for PCM frames, except for some cases described later.
For drivers in ALSA firewire stack, usage of this callback has a merit to reduce latency between time of PCM frame queueing and handling actual packets in recent isochronous cycle, because no need to wait for software IRQ context from isochronous context of OHCI 1394.
If this works well in a case that mapped page frame is used for the intermediate buffer, user process should execute some commands for ioctl(2) to tell the number of handled PCM frames in the intermediate buffer just after handling them.
This is one thing that was raised in the discussion with Intel people, and my suggestion was to add a new flag to suppress the status/control mmap like pcm_file->no_compat_mmap. Then alsa-lib falls back to the sync_ptr ioctl, and the driver can catch each appl_ptr update.
Now I considered this again, and concluded that a simple patch like below should suffice.
Adding Intel people to Cc, who raised the issue originally.
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is present
The drivers using PCM ack ops require the notification whenever appl_ptr is updated in general. But when the PCM status/control page is mmapped, this notification doesn't happen, per design, thus it's not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM status/control mmap when ack ops is defined. At least, for all existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control mmap even using ack ops, we may reconsider the check, e.g. introducing a new flag. But, so far, this should be good enough.
Yes this makes sense and we tested it for us, looks good
Reveiwed-by: Vinod Koul vinod.koul@intel.com Tested-by: Subhransu S. Prusty subhransu.s.prusty@intel.com
Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/pcm_native.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 2bde07a4a87f..b993b0420411 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3189,6 +3189,10 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
@@ -3225,6 +3229,10 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
-- 2.13.0
On Fri, 09 Jun 2017 05:42:31 +0200, Vinod Koul wrote:
On Wed, Jun 07, 2017 at 11:20:03PM +0200, Takashi Iwai wrote:
On Wed, 07 Jun 2017 07:59:20 +0200, Takashi Iwai wrote:
On Wed, 07 Jun 2017 02:38:05 +0200, Takashi Sakamoto wrote:
In recent commit for ALSA PCM core, some arrangement is done for 'struct snd_pcm_ops.ack' callback. This is called when appl_ptr is explicitly moved in intermediate buffer for PCM frames, except for some cases described later.
For drivers in ALSA firewire stack, usage of this callback has a merit to reduce latency between time of PCM frame queueing and handling actual packets in recent isochronous cycle, because no need to wait for software IRQ context from isochronous context of OHCI 1394.
If this works well in a case that mapped page frame is used for the intermediate buffer, user process should execute some commands for ioctl(2) to tell the number of handled PCM frames in the intermediate buffer just after handling them.
This is one thing that was raised in the discussion with Intel people, and my suggestion was to add a new flag to suppress the status/control mmap like pcm_file->no_compat_mmap. Then alsa-lib falls back to the sync_ptr ioctl, and the driver can catch each appl_ptr update.
Now I considered this again, and concluded that a simple patch like below should suffice.
Adding Intel people to Cc, who raised the issue originally.
thanks,
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is present
The drivers using PCM ack ops require the notification whenever appl_ptr is updated in general. But when the PCM status/control page is mmapped, this notification doesn't happen, per design, thus it's not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM status/control mmap when ack ops is defined. At least, for all existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control mmap even using ack ops, we may reconsider the check, e.g. introducing a new flag. But, so far, this should be good enough.
Yes this makes sense and we tested it for us, looks good
Reveiwed-by: Vinod Koul vinod.koul@intel.com Tested-by: Subhransu S. Prusty subhransu.s.prusty@intel.com
OK, thanks.
If Sakamato-san is happy with this change, I'm going to merge it for 4.13.
Takashi
Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/pcm_native.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 2bde07a4a87f..b993b0420411 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3189,6 +3189,10 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
@@ -3225,6 +3229,10 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
-- 2.13.0
-- ~Vinod
On Jun 9 2017 15:53, Takashi Iwai wrote:
From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is present
The drivers using PCM ack ops require the notification whenever appl_ptr is updated in general. But when the PCM status/control page is mmapped, this notification doesn't happen, per design, thus it's not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM status/control mmap when ack ops is defined. At least, for all existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control mmap even using ack ops, we may reconsider the check, e.g. introducing a new flag. But, so far, this should be good enough.
Yes this makes sense and we tested it for us, looks good
Reveiwed-by: Vinod Koul vinod.koul@intel.com Tested-by: Subhransu S. Prusty subhransu.s.prusty@intel.com
OK, thanks.
If Sakamato-san is happy with this change, I'm going to merge it for 4.13.
I'm writing a long long message about my concern for this patch. I'm happy if you postpone application to your for-next branch, till next week.
Thanks
Takashi Sakamoto
On Fri, 09 Jun 2017 09:01:56 +0200, Takashi Sakamoto wrote:
On Jun 9 2017 15:53, Takashi Iwai wrote:
From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is present
The drivers using PCM ack ops require the notification whenever appl_ptr is updated in general. But when the PCM status/control page is mmapped, this notification doesn't happen, per design, thus it's not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM status/control mmap when ack ops is defined. At least, for all existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control mmap even using ack ops, we may reconsider the check, e.g. introducing a new flag. But, so far, this should be good enough.
Yes this makes sense and we tested it for us, looks good
Reveiwed-by: Vinod Koul vinod.koul@intel.com Tested-by: Subhransu S. Prusty subhransu.s.prusty@intel.com
OK, thanks.
If Sakamato-san is happy with this change, I'm going to merge it for 4.13.
I'm writing a long long message about my concern for this patch.
Better to shorten, if you want to convince someone :)
I'm happy if you postpone application to your for-next branch, till next week.
OK.
Takashi
Hi,
On Jun 8 2017 06:20, Takashi Iwai wrote:
From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is present
The drivers using PCM ack ops require the notification whenever appl_ptr is updated in general. But when the PCM status/control page is mmapped, this notification doesn't happen, per design, thus it's not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM status/control mmap when ack ops is defined. At least, for all existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control mmap even using ack ops, we may reconsider the check, e.g. introducing a new flag. But, so far, this should be good enough.
Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/pcm_native.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 2bde07a4a87f..b993b0420411 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3189,6 +3189,10 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
@@ -3225,6 +3229,10 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
Let me evaluate this patch in a point of overhead at kernel/userspace interaction.
When considering about shape of ALSA PCM applications, I can show it by below simple pseudo code:
``` while: poll(2) calculate how many PCM frames are available. memcpy(2) ```
To satisfy requirements of some drivers, we need to find a way to take userspace applications to commit the number of handled PCM frames to kernel space after the memcpy(2). For this purpose, in ALSA PCM interface, SNDRV_PCM_IOCTL_SYNC_PTR is available.
``` while: poll(2) calculate how many PCM frames are available. memcpy(2) ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR ```
Well, as an actual solution, this patch disallows userspace applications to map page frame for the status/control data of PCM substream, because alsa-lib has fallback mode for this case to utilize SNDRV_PCM_IOCTL_SYNC_PTR. This is a similar situation to ARM platform. As I described in a commit fccf53881e9b ("ALSA: pcm: add 'applptr' event of tracepoint"), ALSA PCM core doesn't perform page frame mapping for the status/control data[1].
In the commit message, I compared tracing data of tracepoints and log of strace. In the case, the number of calls for ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR in a loop with a call of poll(2) is seven. One of the sevel calls is actually used to commit the number of PCM frames. There's 6 extra system calls. These 6 calls are just used to check current status of runtime of PCM substream, and on cache coherent architecture, they can be suppressed by the page frame mapping. The poll loop repeats during runtime, thus it's important to analyze overhead of the extra system calls.
I suggest 3 points to evaluate this patch.
1. CPU time for the system call 2. disabling kernel preemption and local IRQ 3. backward compatibility
Here, I describe my opinion to this patch for each of these points.
1. CPU time for the system call System calls cost expensive than usual API calls. As long as investigated on x86_64 architecture on perf-stat, additional 6 call of ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR increases 5,000-15,000 instructions in program execution. Recent hardware is enough fast. It might not be so large disadvantage.
However, when drivers implement 'struct snd_pcm_ops.ack' callback, I have different opinion; it's too frequent. In either drivers with indirect layer and packet-oriented drivers, the callback is used to process PCM frames in intermediate buffer for userspace. The frequent call of the callback is not enough effective because the applications is programmed with a manner to be accurate for actual time somehow, thus they don't process more PCM frames in one poll loop.
2. disabling kernel preemption and local IRQ In kernel land, service routine for SNDRV_PCM_IOCTL_SYNC_PTR partly runs with acquired PCM stream lock, which disables kernel preemption and local IRQ usually for running CPU. This lock is required because the PCM stream is handled in any handler for hardware events, but this situation can reduce realtime performance because no process or service routine for interrupt can't use the CPU. If we have the other options, it's worth to reconsider.
Intel SST-based drivers use nonatomic flag for the lock. In this case, kernel preemption is enabled but IRQ is disabled. I think Intel developers work is for their embedded solution. The care of realtime performance might be good for their work, too.
3. backward compatibility For backward compatibility, this patch is a simple solution. Even if patched kernel runs with old version of relevant stuffs, tinyalsa or alsa-lib, things work well as expected.
However, without this patch, things still work well. In either drivers with indirect layer or packet-oriented drivers, queued PCM frames are transferred according to any hardware events. For such drivers, commit notification from applications just expands their probability to improve performance; e.g. latency between queueing and transmission.
I think it better to back to your initial direction; adding info flag, let stuffs in userspace parse it, perform the commit for the number of handled PCM frames in 'snd_pcm_hw_mmap_commit()'.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=f...
Regards
Takashi Sakamoto
On Tue, 13 Jun 2017 00:49:37 +0200, Takashi Sakamoto wrote:
Hi,
On Jun 8 2017 06:20, Takashi Iwai wrote:
From: Takashi Iwai tiwai@suse.de Subject: [PATCH] ALSA: pcm: Suppress status/control mmap when ack ops is present
The drivers using PCM ack ops require the notification whenever appl_ptr is updated in general. But when the PCM status/control page is mmapped, this notification doesn't happen, per design, thus it's not guaranteed to receive the fine-grained updates.
For improving the situation, this patch simply suppresses the PCM status/control mmap when ack ops is defined. At least, for all existing drivers with ack, this should give more benefit.
Once when we really need the full optimization with status/control mmap even using ack ops, we may reconsider the check, e.g. introducing a new flag. But, so far, this should be good enough.
Signed-off-by: Takashi Iwai tiwai@suse.de
sound/core/pcm_native.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 2bde07a4a87f..b993b0420411 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3189,6 +3189,10 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
@@ -3225,6 +3229,10 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file struct vm_area_struct *area) { long size;
- /* suppress status/control mmap when driver requires ack */
- if (substream->ops->ack)
if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start;return -ENXIO;
Let me evaluate this patch in a point of overhead at kernel/userspace interaction.
When considering about shape of ALSA PCM applications, I can show it by below simple pseudo code:
while: poll(2) calculate how many PCM frames are available. memcpy(2)
To satisfy requirements of some drivers, we need to find a way to take userspace applications to commit the number of handled PCM frames to kernel space after the memcpy(2). For this purpose, in ALSA PCM interface, SNDRV_PCM_IOCTL_SYNC_PTR is available.
while: poll(2) calculate how many PCM frames are available. memcpy(2) ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR
Well, as an actual solution, this patch disallows userspace applications to map page frame for the status/control data of PCM substream, because alsa-lib has fallback mode for this case to utilize SNDRV_PCM_IOCTL_SYNC_PTR. This is a similar situation to ARM platform. As I described in a commit fccf53881e9b ("ALSA: pcm: add 'applptr' event of tracepoint"), ALSA PCM core doesn't perform page frame mapping for the status/control data[1].
In the commit message, I compared tracing data of tracepoints and log of strace. In the case, the number of calls for ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR in a loop with a call of poll(2) is seven. One of the sevel calls is actually used to commit the number of PCM frames. There's 6 extra system calls. These 6 calls are just used to check current status of runtime of PCM substream, and on cache coherent architecture, they can be suppressed by the page frame mapping. The poll loop repeats during runtime, thus it's important to analyze overhead of the extra system calls.
I suggest 3 points to evaluate this patch.
- CPU time for the system call
- disabling kernel preemption and local IRQ
- backward compatibility
Here, I describe my opinion to this patch for each of these points.
- CPU time for the system call
System calls cost expensive than usual API calls. As long as investigated on x86_64 architecture on perf-stat, additional 6 call of ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR increases 5,000-15,000 instructions in program execution. Recent hardware is enough fast. It might not be so large disadvantage.
However, when drivers implement 'struct snd_pcm_ops.ack' callback, I have different opinion; it's too frequent. In either drivers with indirect layer and packet-oriented drivers, the callback is used to process PCM frames in intermediate buffer for userspace. The frequent call of the callback is not enough effective because the applications is programmed with a manner to be accurate for actual time somehow, thus they don't process more PCM frames in one poll loop.
- disabling kernel preemption and local IRQ
In kernel land, service routine for SNDRV_PCM_IOCTL_SYNC_PTR partly runs with acquired PCM stream lock, which disables kernel preemption and local IRQ usually for running CPU. This lock is required because the PCM stream is handled in any handler for hardware events, but this situation can reduce realtime performance because no process or service routine for interrupt can't use the CPU. If we have the other options, it's worth to reconsider.
Intel SST-based drivers use nonatomic flag for the lock. In this case, kernel preemption is enabled but IRQ is disabled. I think Intel developers work is for their embedded solution. The care of realtime performance might be good for their work, too.
- backward compatibility
For backward compatibility, this patch is a simple solution. Even if patched kernel runs with old version of relevant stuffs, tinyalsa or alsa-lib, things work well as expected.
However, without this patch, things still work well. In either drivers with indirect layer or packet-oriented drivers, queued PCM frames are transferred according to any hardware events. For such drivers, commit notification from applications just expands their probability to improve performance; e.g. latency between queueing and transmission.
I think it better to back to your initial direction; adding info flag, let stuffs in userspace parse it, perform the commit for the number of handled PCM frames in 'snd_pcm_hw_mmap_commit()'.
Thanks for the analysis. Yes, the cost by the explicit calls is known, and it's what was mentioned in the commit log as a further optimization. I bought this cost as good enough for better appl_ptr accuracy, but you thought differently on that.
One thing to be noted is that user-space doesn't have to call sync_ptr at all, and even no period IRQ is triggered depending on the setup (e.g. PA prefers it). This is the case we want to solve. That is, the situation is worse than you thought -- things don't work as expected unless we enforce the sync_ptr notification from user-space.
Then, the question is how to enforce it. The easiest option is to disable status/control mmap. That's how the patch was developed. As an option, 1) Selectively disable mmap by a new flag, or 2) Selectively disable mmap by the presence of ack callback.
And (2) seems too aggressively applied, from your opinion.
Now you might thing another option: 3) Add a new PCM info flag and let alsa-lib behaving differently according to it
But this is no-go, since it doesn't work with the old alsa-lib.
So, IMO, we need to go back to (1), which is my original patch, as a start. It affects only the driver that sets (in this case, it's SKL+ driver) and it works with the old user-space, at least.
Then we can improve the performance in alsa-lib. alsa-lib has some inefficient implementation regarding the hwptr/appl_ptr update. In may places, it calls hwsync, then do avail_update that again calls sync_ptr, for example. I guess we can halves the amount of sync_ptr calls by optimizing that.
Since the sync_ptr is used for all non-x86 architectures (i.e. nowadays majority of devices in the market), the optimization is a good benefit for all. Worth to try, in anyway.
And yet another obvious optimization would be to allow only the status mmap and disallow the control mmap. Currently, both are paired and the control mmap can't fail if the status mmap succeeds. But, for the case like this, we want to suppress only the control mmap.
Unfortunately, changing this behavior requires both alsa-lib and kernel codes. And keeping the behavior compatibility, we'd need to introduce something new, e.g. an ioctl to set the compatible protocol version from alsa-lib. For now, alsa-lib inquires the protocol version the kernel supports (SNDRV_PCM_IOCTL_PVERSION). The newly required one is the other way round, alsa-lib telling the supported protocol version to kernel. Then the kernel can know whether this alsa-lib version accepts the status-only mmap, and gracefully returns -ENXIO for the control mmap.
thanks,
Takashi
On Jun 13 2017 21:03, Takashi Iwai wrote:
Thanks for the analysis. Yes, the cost by the explicit calls is known, and it's what was mentioned in the commit log as a further optimization. I bought this cost as good enough for better appl_ptr accuracy, but you thought differently on that.
One thing to be noted is that user-space doesn't have to call sync_ptr at all, and even no period IRQ is triggered depending on the setup (e.g. PA prefers it). This is the case we want to solve. That is, the situation is worse than you thought -- things don't work as expected unless we enforce the sync_ptr notification from user-space.
Then, the question is how to enforce it. The easiest option is to disable status/control mmap. That's how the patch was developed. As an option,
- Selectively disable mmap by a new flag, or
- Selectively disable mmap by the presence of ack callback.
And (2) seems too aggressively applied, from your opinion.
Now you might thing another option: 3) Add a new PCM info flag and let alsa-lib behaving differently according to it
But this is no-go, since it doesn't work with the old alsa-lib.
So, IMO, we need to go back to (1), which is my original patch, as a start. It affects only the driver that sets (in this case, it's SKL+ driver) and it works with the old user-space, at least.
Then we can improve the performance in alsa-lib. alsa-lib has some inefficient implementation regarding the hwptr/appl_ptr update. In may places, it calls hwsync, then do avail_update that again calls sync_ptr, for example. I guess we can halves the amount of sync_ptr calls by optimizing that.
Since the sync_ptr is used for all non-x86 architectures (i.e. nowadays majority of devices in the market), the optimization is a good benefit for all. Worth to try, in anyway.
And yet another obvious optimization would be to allow only the status mmap and disallow the control mmap. Currently, both are paired and the control mmap can't fail if the status mmap succeeds. But, for the case like this, we want to suppress only the control mmap.
Unfortunately, changing this behavior requires both alsa-lib and kernel codes. And keeping the behavior compatibility, we'd need to introduce something new, e.g. an ioctl to set the compatible protocol version from alsa-lib. For now, alsa-lib inquires the protocol version the kernel supports (SNDRV_PCM_IOCTL_PVERSION). The newly required one is the other way round, alsa-lib telling the supported protocol version to kernel. Then the kernel can know whether this alsa-lib version accepts the status-only mmap, and gracefully returns -ENXIO for the control mmap.
Hm, I have no objections to the changes of both kernel/userspace, but from different reasons. Therefore I have different solution for this issue.
I recognize this issue as a change of programming model for new design of devices. (Advantages for drivers for audio and music units on IEEE 1394 bus and the others is its sub-effect, a minor bonus.)
Current ALSA PCM interface is designed based on an idea for data transmission; hardware is unaware of how many PCM frames are already queued or dequeued on PCM buffer in system memory. Hardware can transfer PCM frames to a part of the buffer with un-dequeued PCM frames (at capture) or from a part of the buffer without enough queued PCM frames (at playback). This design doesn't require kernel stuffs to maintain the application-side position on PCM buffer.
If my understanding is correct, Intel's recent hardware can aware of the application-side pointer and maintain the data transmission, according to relative position of the application-side and the hardware-side on the PCM buffer. As Pierre-Louis said, this could satisfy Intel's convenience; e.g. reduce power comsumption. I guess that it can decrease frequency to drive hardware for the data transmission, or do the data transmission as better timing.
This model of data transmission is new in this subsystem. I think it reasonable to add enough stuffs in both update kernel/userspace and update version of the interface for this design.
A several years ago, no-period-wakeup programming model was introduced, and this subsystem got large changes for both kernel/user land. I believe this issue also has the similar impact. In my taste, in this case, it's better to give up to keep full backward compatibility, and renew related stuffs. When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large changes.
Regards
Takashi Sakamoto
On Wed, 14 Jun 2017 16:34:32 +0200, Takashi Sakamoto wrote:
On Jun 13 2017 21:03, Takashi Iwai wrote:
Thanks for the analysis. Yes, the cost by the explicit calls is known, and it's what was mentioned in the commit log as a further optimization. I bought this cost as good enough for better appl_ptr accuracy, but you thought differently on that.
One thing to be noted is that user-space doesn't have to call sync_ptr at all, and even no period IRQ is triggered depending on the setup (e.g. PA prefers it). This is the case we want to solve. That is, the situation is worse than you thought -- things don't work as expected unless we enforce the sync_ptr notification from user-space.
Then, the question is how to enforce it. The easiest option is to disable status/control mmap. That's how the patch was developed. As an option,
- Selectively disable mmap by a new flag, or
- Selectively disable mmap by the presence of ack callback.
And (2) seems too aggressively applied, from your opinion.
Now you might thing another option: 3) Add a new PCM info flag and let alsa-lib behaving differently according to it
But this is no-go, since it doesn't work with the old alsa-lib.
So, IMO, we need to go back to (1), which is my original patch, as a start. It affects only the driver that sets (in this case, it's SKL+ driver) and it works with the old user-space, at least.
Then we can improve the performance in alsa-lib. alsa-lib has some inefficient implementation regarding the hwptr/appl_ptr update. In may places, it calls hwsync, then do avail_update that again calls sync_ptr, for example. I guess we can halves the amount of sync_ptr calls by optimizing that.
Since the sync_ptr is used for all non-x86 architectures (i.e. nowadays majority of devices in the market), the optimization is a good benefit for all. Worth to try, in anyway.
And yet another obvious optimization would be to allow only the status mmap and disallow the control mmap. Currently, both are paired and the control mmap can't fail if the status mmap succeeds. But, for the case like this, we want to suppress only the control mmap.
Unfortunately, changing this behavior requires both alsa-lib and kernel codes. And keeping the behavior compatibility, we'd need to introduce something new, e.g. an ioctl to set the compatible protocol version from alsa-lib. For now, alsa-lib inquires the protocol version the kernel supports (SNDRV_PCM_IOCTL_PVERSION). The newly required one is the other way round, alsa-lib telling the supported protocol version to kernel. Then the kernel can know whether this alsa-lib version accepts the status-only mmap, and gracefully returns -ENXIO for the control mmap.
Hm, I have no objections to the changes of both kernel/userspace, but from different reasons. Therefore I have different solution for this issue.
I recognize this issue as a change of programming model for new design of devices. (Advantages for drivers for audio and music units on IEEE 1394 bus and the others is its sub-effect, a minor bonus.)
Current ALSA PCM interface is designed based on an idea for data transmission; hardware is unaware of how many PCM frames are already queued or dequeued on PCM buffer in system memory. Hardware can transfer PCM frames to a part of the buffer with un-dequeued PCM frames (at capture) or from a part of the buffer without enough queued PCM frames (at playback). This design doesn't require kernel stuffs to maintain the application-side position on PCM buffer.
If my understanding is correct, Intel's recent hardware can aware of the application-side pointer and maintain the data transmission, according to relative position of the application-side and the hardware-side on the PCM buffer. As Pierre-Louis said, this could satisfy Intel's convenience; e.g. reduce power comsumption. I guess that it can decrease frequency to drive hardware for the data transmission, or do the data transmission as better timing.
This model of data transmission is new in this subsystem. I think it reasonable to add enough stuffs in both update kernel/userspace and update version of the interface for this design.
A several years ago, no-period-wakeup programming model was introduced, and this subsystem got large changes for both kernel/user land. I believe this issue also has the similar impact. In my taste, in this case, it's better to give up to keep full backward compatibility, and renew related stuffs. When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large changes.
Well... I guess you're a bit overreacting to it. There is no new model in a big picture. The appl_ptr has been always present, and it should be referred at each moment it's updated. That said, the problem is purely in the kernel side implementation -- or more exactly to say, it's about the kernel / user-space ABI.
The required change would break the ABI the current alsa-lib expects, thus we need to update and enable the new ABI, conditionally for the newer alsa-lib for an optimized path. For the older alsa-lib, we keep the old ABI, a bit less optimized, but still works well enough. That's all.
thanks,
Takashi
On 2017年06月14日 23:52, Takashi Iwai wrote:
On Wed, 14 Jun 2017 16:34:32 +0200, Takashi Sakamoto wrote:
On Jun 13 2017 21:03, Takashi Iwai wrote:
Thanks for the analysis. Yes, the cost by the explicit calls is known, and it's what was mentioned in the commit log as a further optimization. I bought this cost as good enough for better appl_ptr accuracy, but you thought differently on that.
One thing to be noted is that user-space doesn't have to call sync_ptr at all, and even no period IRQ is triggered depending on the setup (e.g. PA prefers it). This is the case we want to solve. That is, the situation is worse than you thought -- things don't work as expected unless we enforce the sync_ptr notification from user-space.
Then, the question is how to enforce it. The easiest option is to disable status/control mmap. That's how the patch was developed. As an option,
- Selectively disable mmap by a new flag, or
- Selectively disable mmap by the presence of ack callback.
And (2) seems too aggressively applied, from your opinion.
Now you might thing another option: 3) Add a new PCM info flag and let alsa-lib behaving differently according to it
But this is no-go, since it doesn't work with the old alsa-lib.
So, IMO, we need to go back to (1), which is my original patch, as a start. It affects only the driver that sets (in this case, it's SKL+ driver) and it works with the old user-space, at least.
Then we can improve the performance in alsa-lib. alsa-lib has some inefficient implementation regarding the hwptr/appl_ptr update. In may places, it calls hwsync, then do avail_update that again calls sync_ptr, for example. I guess we can halves the amount of sync_ptr calls by optimizing that.
Since the sync_ptr is used for all non-x86 architectures (i.e. nowadays majority of devices in the market), the optimization is a good benefit for all. Worth to try, in anyway.
And yet another obvious optimization would be to allow only the status mmap and disallow the control mmap. Currently, both are paired and the control mmap can't fail if the status mmap succeeds. But, for the case like this, we want to suppress only the control mmap.
Unfortunately, changing this behavior requires both alsa-lib and kernel codes. And keeping the behavior compatibility, we'd need to introduce something new, e.g. an ioctl to set the compatible protocol version from alsa-lib. For now, alsa-lib inquires the protocol version the kernel supports (SNDRV_PCM_IOCTL_PVERSION). The newly required one is the other way round, alsa-lib telling the supported protocol version to kernel. Then the kernel can know whether this alsa-lib version accepts the status-only mmap, and gracefully returns -ENXIO for the control mmap.
Hm, I have no objections to the changes of both kernel/userspace, but from different reasons. Therefore I have different solution for this issue.
I recognize this issue as a change of programming model for new design of devices. (Advantages for drivers for audio and music units on IEEE 1394 bus and the others is its sub-effect, a minor bonus.)
Current ALSA PCM interface is designed based on an idea for data transmission; hardware is unaware of how many PCM frames are already queued or dequeued on PCM buffer in system memory. Hardware can transfer PCM frames to a part of the buffer with un-dequeued PCM frames (at capture) or from a part of the buffer without enough queued PCM frames (at playback). This design doesn't require kernel stuffs to maintain the application-side position on PCM buffer.
If my understanding is correct, Intel's recent hardware can aware of the application-side pointer and maintain the data transmission, according to relative position of the application-side and the hardware-side on the PCM buffer. As Pierre-Louis said, this could satisfy Intel's convenience; e.g. reduce power comsumption. I guess that it can decrease frequency to drive hardware for the data transmission, or do the data transmission as better timing.
This model of data transmission is new in this subsystem. I think it reasonable to add enough stuffs in both update kernel/userspace and update version of the interface for this design.
A several years ago, no-period-wakeup programming model was introduced, and this subsystem got large changes for both kernel/user land. I believe this issue also has the similar impact. In my taste, in this case, it's better to give up to keep full backward compatibility, and renew related stuffs. When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large changes.
Well... I guess you're a bit overreacting to it. There is no new model in a big picture.
I suggested to the idea to change kernel/userspace interface itself, not to your raw idea. In short, I'm still against the idea to disable mmap(2) for status/control data of PCM runtime.
As I declare in my former message[1], there's an apparent change about programming model, independent on architecture support for cache coherency. As I declare in my previous message[2], the way for data transmission on the new hardware is not expected in current ALSA implementation. This should be described in new version of interface (2.0.14).
The appl_ptr has been always present, and it should be referred at each moment it's updated.
Actually it's not maintained in kernel space. Userspace applications do it, at least for operations under mmap(2) of PCM buffer. Clearly, current application-side position on the buffer is not cared in any service routine to handle hardware event.
As of 2017, we have some userspace implementation as consumers of ALSA PCM interface. If we add any change into the interface, we're responsible for notification of it via version change.
That said, the problem is purely in the kernel side implementation -- or more exactly to say, it's about the kernel / user-space ABI.
Yes. We need to add changes into the way to use kernel/userspace interface for applications in this case.
The required change would break the ABI the current alsa-lib expects, thus we need to update and enable the new ABI, conditionally for the newer alsa-lib for an optimized path. For the older alsa-lib, we keep the old ABI, a bit less optimized, but still works well enough. That's all.
As I described, I have few interests in your raw idea. It's too early in this discussion.
[1] [alsa-devel] [PATCH 1/2] ALSA: firewire: process packets in 'struct snd_pcm_ops.ack' callback http://mailman.alsa-project.org/pipermail/alsa-devel/2017-June/121699.html [2] [alsa-devel] [PATCH 1/2] ALSA: firewire: process packets in 'struct snd_pcm_ops.ack' callback http://mailman.alsa-project.org/pipermail/alsa-devel/2017-June/121794.html
Regards
Takashi Sakamoto
On Thu, 15 Jun 2017 04:32:28 +0200, Takashi Sakamoto wrote:
On 2017年06月14日 23:52, Takashi Iwai wrote:
On Wed, 14 Jun 2017 16:34:32 +0200, Takashi Sakamoto wrote:
On Jun 13 2017 21:03, Takashi Iwai wrote:
Thanks for the analysis. Yes, the cost by the explicit calls is known, and it's what was mentioned in the commit log as a further optimization. I bought this cost as good enough for better appl_ptr accuracy, but you thought differently on that.
One thing to be noted is that user-space doesn't have to call sync_ptr at all, and even no period IRQ is triggered depending on the setup (e.g. PA prefers it). This is the case we want to solve. That is, the situation is worse than you thought -- things don't work as expected unless we enforce the sync_ptr notification from user-space.
Then, the question is how to enforce it. The easiest option is to disable status/control mmap. That's how the patch was developed. As an option,
- Selectively disable mmap by a new flag, or
- Selectively disable mmap by the presence of ack callback.
And (2) seems too aggressively applied, from your opinion.
Now you might thing another option: 3) Add a new PCM info flag and let alsa-lib behaving differently according to it
But this is no-go, since it doesn't work with the old alsa-lib.
So, IMO, we need to go back to (1), which is my original patch, as a start. It affects only the driver that sets (in this case, it's SKL+ driver) and it works with the old user-space, at least.
Then we can improve the performance in alsa-lib. alsa-lib has some inefficient implementation regarding the hwptr/appl_ptr update. In may places, it calls hwsync, then do avail_update that again calls sync_ptr, for example. I guess we can halves the amount of sync_ptr calls by optimizing that.
Since the sync_ptr is used for all non-x86 architectures (i.e. nowadays majority of devices in the market), the optimization is a good benefit for all. Worth to try, in anyway.
And yet another obvious optimization would be to allow only the status mmap and disallow the control mmap. Currently, both are paired and the control mmap can't fail if the status mmap succeeds. But, for the case like this, we want to suppress only the control mmap.
Unfortunately, changing this behavior requires both alsa-lib and kernel codes. And keeping the behavior compatibility, we'd need to introduce something new, e.g. an ioctl to set the compatible protocol version from alsa-lib. For now, alsa-lib inquires the protocol version the kernel supports (SNDRV_PCM_IOCTL_PVERSION). The newly required one is the other way round, alsa-lib telling the supported protocol version to kernel. Then the kernel can know whether this alsa-lib version accepts the status-only mmap, and gracefully returns -ENXIO for the control mmap.
Hm, I have no objections to the changes of both kernel/userspace, but from different reasons. Therefore I have different solution for this issue.
I recognize this issue as a change of programming model for new design of devices. (Advantages for drivers for audio and music units on IEEE 1394 bus and the others is its sub-effect, a minor bonus.)
Current ALSA PCM interface is designed based on an idea for data transmission; hardware is unaware of how many PCM frames are already queued or dequeued on PCM buffer in system memory. Hardware can transfer PCM frames to a part of the buffer with un-dequeued PCM frames (at capture) or from a part of the buffer without enough queued PCM frames (at playback). This design doesn't require kernel stuffs to maintain the application-side position on PCM buffer.
If my understanding is correct, Intel's recent hardware can aware of the application-side pointer and maintain the data transmission, according to relative position of the application-side and the hardware-side on the PCM buffer. As Pierre-Louis said, this could satisfy Intel's convenience; e.g. reduce power comsumption. I guess that it can decrease frequency to drive hardware for the data transmission, or do the data transmission as better timing.
This model of data transmission is new in this subsystem. I think it reasonable to add enough stuffs in both update kernel/userspace and update version of the interface for this design.
A several years ago, no-period-wakeup programming model was introduced, and this subsystem got large changes for both kernel/user land. I believe this issue also has the similar impact. In my taste, in this case, it's better to give up to keep full backward compatibility, and renew related stuffs. When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large changes.
Well... I guess you're a bit overreacting to it. There is no new model in a big picture.
I suggested to the idea to change kernel/userspace interface itself, not to your raw idea. In short, I'm still against the idea to disable mmap(2) for status/control data of PCM runtime.
As I declare in my former message[1], there's an apparent change about programming model, independent on architecture support for cache coherency. As I declare in my previous message[2], the way for data transmission on the new hardware is not expected in current ALSA implementation. This should be described in new version of interface (2.0.14).
The appl_ptr has been always present, and it should be referred at each moment it's updated.
Actually it's not maintained in kernel space. Userspace applications do it, at least for operations under mmap(2) of PCM buffer. Clearly, current application-side position on the buffer is not cared in any service routine to handle hardware event.
As of 2017, we have some userspace implementation as consumers of ALSA PCM interface. If we add any change into the interface, we're responsible for notification of it via version change.
That said, the problem is purely in the kernel side implementation -- or more exactly to say, it's about the kernel / user-space ABI.
Yes. We need to add changes into the way to use kernel/userspace interface for applications in this case.
The required change would break the ABI the current alsa-lib expects, thus we need to update and enable the new ABI, conditionally for the newer alsa-lib for an optimized path. For the older alsa-lib, we keep the old ABI, a bit less optimized, but still works well enough. That's all.
As I described, I have few interests in your raw idea. It's too early in this discussion.
Well, I still don't understand (anyone else can?) what you mean as "a change of programming model" at all. It's becoming again like a typical situation you falling often into while discussing with others; your new term is too ambiguous and unique to parse without a clarification. PLEASE, clarify your idea in a more understandable way, at best with a (pseudo-) code snippet.
And, I think you miss a few points, thus the argument was twisted.
- The primary goal is to achieve the notification of appl_ptr change to kernel.
- The kernel needs to work with the existing user-space. It's a MUST.
- appl_ptr or whatever a status *is* maintained in kernel -- or better to say, it's kept in both kernel and user-space. (Otherwise why it becomes a problem now?)
So, if you have a better idea for achieving the goal without changing the current ABI, please tell us. It'd be really appreciated!
For the future development with the ABI extension, we may do implement in a different level, yes. Basically we can change everything. But this is not the thing we need to fix right now.
I'm open for any changes with the ABI extension. It's certainly exciting. But, don't mix up this with the solution for the current implementation.
thanks,
Takashi
[1] [alsa-devel] [PATCH 1/2] ALSA: firewire: process packets in 'struct snd_pcm_ops.ack' callback http://mailman.alsa-project.org/pipermail/alsa-devel/2017-June/121699.html [2] [alsa-devel] [PATCH 1/2] ALSA: firewire: process packets in 'struct snd_pcm_ops.ack' callback http://mailman.alsa-project.org/pipermail/alsa-devel/2017-June/121794.html
Regards
Takashi Sakamoto
Hi,
On Jun 15 2017 17:48, Takashi Iwai wrote:
Well, I still don't understand (anyone else can?) what you mean as "a change of programming model" at all.
Here, I refer to my former message.
On Jun 13 2017 07:49, Takashi Sakamoto wrote:
Let me evaluate this patch in a point of overhead at kernel/userspace interaction.
When considering about shape of ALSA PCM applications, I can show it by below simple pseudo code:
while: poll(2) calculate how many PCM frames are available. memcpy(2)
To satisfy requirements of some drivers, we need to find a way to take userspace applications to commit the number of handled PCM frames to kernel space after the memcpy(2). For this purpose, in ALSA PCM interface, SNDRV_PCM_IOCTL_SYNC_PTR is available.
while: poll(2) calculate how many PCM frames are available. memcpy(2) ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR
Some devices/drivers request applications to perform this, due to their design of hardware for data transmission.
It's becoming again like a typical situation you falling often into while discussing with others; your new term is too ambiguous and unique to parse without a clarification. PLEASE, clarify your idea in a more understandable way, at best with a (pseudo-) code snippet.
The above.
And, I think you miss a few points, thus the argument was twisted.
- The primary goal is to achieve the notification of appl_ptr change to kernel.
It's for the purpose to support devices which have the new design for data transmission. This is the reason that I agree with upgrading version of PCM interface. I suggest adding new info flags for the specific purpose.
Disabling mmap for status/control data of PCM substream is just to support architectures to which ALSA PCM core judge non cache coherency. It's not good to utilize it as a solution of this issue because of abuse of interfaces.
- The kernel needs to work with the existing user-space. It's a MUST.
I described in a previous message.
On Jun 14 2017 23:34, Takashi Sakamoto wrote:
When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large change
- appl_ptr or whatever a status *is* maintained in kernel -- or better to say, it's kept in both kernel and user-space. (Otherwise why it becomes a problem now?)
This is not achieved in current implementation of ALSA PCM core yet. Even if hardware generates any event to get current application-side position on PCM buffer, it can't be achieved. There's no way for hardwares. Therefore, it's unavoided to request userspace to have care of the above pseudo code, when supporting devices with the new design.
So, if you have a better idea for achieving the goal without changing the current ABI, please tell us. It'd be really appreciated!
For the future development with the ABI extension, we may do implement in a different level, yes. Basically we can change everything. But this is not the thing we need to fix right now.
I'm open for any changes with the ABI extension. It's certainly exciting. But, don't mix up this with the solution for the current implementation.
Your idea to extend current ABI includes some jumps from current position of discussion. At least, the reason of it is somehow based on your idea to disable mmap for control/status data of PCM substream for some devices/drivers. As already stated, I'm against it with several reasons. On cache coherent architecture, there's no matter to consider about it. Thus they're not matters to me at present.
Regards
Takashi Sakamoto
On Thu, 15 Jun 2017 19:56:18 +0200, Takashi Sakamoto wrote:
Hi,
On Jun 15 2017 17:48, Takashi Iwai wrote:
Well, I still don't understand (anyone else can?) what you mean as "a change of programming model" at all.
Here, I refer to my former message.
On Jun 13 2017 07:49, Takashi Sakamoto wrote:
Let me evaluate this patch in a point of overhead at kernel/userspace interaction.
When considering about shape of ALSA PCM applications, I can show it by below simple pseudo code:
while: poll(2) calculate how many PCM frames are available. memcpy(2)
To satisfy requirements of some drivers, we need to find a way to take userspace applications to commit the number of handled PCM frames to kernel space after the memcpy(2). For this purpose, in ALSA PCM interface, SNDRV_PCM_IOCTL_SYNC_PTR is available.
while: poll(2) calculate how many PCM frames are available. memcpy(2) ioctl(2) with SNDRV_PCM_IOCTL_SYNC_PTR
Some devices/drivers request applications to perform this, due to their design of hardware for data transmission.
Yes, and it's the default behavior of all non-x86 platforms, too.
It's becoming again like a typical situation you falling often into while discussing with others; your new term is too ambiguous and unique to parse without a clarification. PLEASE, clarify your idea in a more understandable way, at best with a (pseudo-) code snippet.
The above.
So... what's the "new"? That is what I don't understand... It's the already existing model deployed without mmap. Nothing new.
In other words, such a driver already works fine on non-x86 platforms. It's "broken" only on x86 due to the forced mmap usage.
And, I think you miss a few points, thus the argument was twisted.
- The primary goal is to achieve the notification of appl_ptr change to kernel.
It's for the purpose to support devices which have the new design for data transmission. This is the reason that I agree with upgrading version of PCM interface. I suggest adding new info flags for the specific purpose.
Disabling mmap for status/control data of PCM substream is just to support architectures to which ALSA PCM core judge non cache coherency. It's not good to utilize it as a solution of this issue because of abuse of interfaces.
No, it's no abuse. The sync_ptr is the correct and designed API to notify appl_ptr update from the user space to kernel. For most of device drivers on x86, this isn't requested strictly, thus it's not done when the PCM control is mmapped. We've been just lucky, so far.
That is, letting user-space to use the sync_ptr is the right strategy from the API POV, and it can be achieved by the disablement of mmap. Unfortunately, we can't currently disable only PCM control mmap, but we have to disable both control and status PCM mmaps. It's a trade-off, but still good enough for the driver requesting it (the Intel one, which can achieve the deep sleep by that).
- The kernel needs to work with the existing user-space. It's a MUST.
I described in a previous message.
Let me repeat: the support of old user-space is a must. This is the fact. Not faked.
On Jun 14 2017 23:34, Takashi Sakamoto wrote:
When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large change
... and how the kernel knows that a newer alsa-lib is running, BTW?
There is no such information right now, and it's why I suggested an extension of ABI.
- appl_ptr or whatever a status *is* maintained in kernel -- or better to say, it's kept in both kernel and user-space. (Otherwise why it becomes a problem now?)
This is not achieved in current implementation of ALSA PCM core yet.
Sigh, again the misunderstanding due to the ambiguous term usage.
The appl_ptr value is tracked in user-space and kernel sides -- so they are "maintained" in both sides. "Maintain a value" doesn't define who triggers the value change.
Even if hardware generates any event to get current application-side position on PCM buffer, it can't be achieved. There's no way for hardwares. Therefore, it's unavoided to request userspace to have care of the above pseudo code, when supporting devices with the new design.
Yes, the missing piece is the synchronization between the appl_ptr values maintained in both user-space and kernel sides. And it's only on x86, and the simplest way to fix it is to disable the PCM control mmap.
So, if you have a better idea for achieving the goal without changing the current ABI, please tell us. It'd be really appreciated!
For the future development with the ABI extension, we may do implement in a different level, yes. Basically we can change everything. But this is not the thing we need to fix right now.
I'm open for any changes with the ABI extension. It's certainly exciting. But, don't mix up this with the solution for the current implementation.
Your idea to extend current ABI includes some jumps from current position of discussion. At least, the reason of it is somehow based on your idea to disable mmap for control/status data of PCM substream for some devices/drivers. As already stated, I'm against it with several reasons. On cache coherent architecture, there's no matter to consider about it. Thus they're not matters to me at present.
It's fine that you say against it. But then please state your idea how to implement the appl_ptr notification for the existing application ABI? Without it, the argument is simply useless.
Also, think again which disadvantages would disabling the control mmap (not the status mmap) have, when sync_ptr is mandatory. Or better in another form: suppose you need to use sync_ptr for appl_ptr update, what merit would you have by keeping the PCM control mmap...?
Take a look at the content of snd_pcm_mmap_control. It's only appl_ptr and avail_min where the latter can be ignored mostly. That is, disabling the PCM control mmap is effectively equivalent with requesting the appl_ptr update via sync_ptr.
So, if we can disable only the PCM control mmap, the efficiency won't be lost for kernel -> user-space PCM status update by keeping the PCM status mmap. It's the plan I mentioned as the ABI extension.
To recap: - The disablement of control/status mmap is allowed by request from a driver to enforce the sync_ptr usage for appl_ptr update.
- The extension of ABI for further improvements; then we can keep mmap of PCM status while disabling the PCM control mmap.
thanks,
Takashi
On Jun 16 2017 04:06, Takashi Iwai wrote:
Some devices/drivers request applications to perform this, due to their design of hardware for data transmission.
Yes, and it's the default behavior of all non-x86 platforms, too.
Here, you have mixture of two items; architecture difference for cache coherent functionality, and device feature for data transmission. These two items are apparently different things.
In other words, devices can be supported independently of platform architectures. Why should I apply the solution prepared for cache coherency issue to add better support for the device with new feature for its data transmission? On x86 platform (although including some exceptions such as old ATOM processors), status/control data of PCM substream can successfully be mapped to process' VMA of applications. Why should it be disable it even if works well?
Here, I'll show a table with chang history of relevant codes in kernel land, and information resources.
year | release | pcm | relevant changes | version | iface | ================================================================================== 2000 | v0.5.0 | 1.0.0 | status/control/fragments data can be mapped into VMA[1] 2003 | v0.9.0 | 2.0.5 | status/control data can be mapped independently[2] 2004 | v1.0.5 | 2.0.7 | SYNC_PTR command was introduced for PCM ioctl operation[3] 2004 | v1.0.7 | 2.0.7 | mmap for status/control data is allowed for x86/ppc/alpha[4]
In note for v1.0.5 release, Jaroslav described[5]:
``` - PCM midlevel - added SYNC_PTR ioctl (for problematic cache coherency archs) ```
The SYNC_PTR and architecture-dependent enabling of the mmap was originally introduced to solve cache coherent issue.
It's becoming again like a typical situation you falling often into while discussing with others; your new term is too ambiguous and unique to parse without a clarification. PLEASE, clarify your idea in a more understandable way, at best with a (pseudo-) code snippet.
The above.
So... what's the "new"? That is what I don't understand... It's the already existing model deployed without mmap. Nothing new.
This subsystem has no drivers with the similar feature, thus it's new design for data transmission. The design supports below things:
1. Hardware features: 1.1. Data transmission is done by direct media access (DMA). 1.2. Hardware cares of two points for its data transmission; one is hwptr and another is appl_ptr on PCM buffer dedicated for the data transmission. 1.3. (perhaps)The granurarity of data transmission can be differed, not fixed to the size of 'period'.
2. Drivers are designed with below items: 2.1. Return SNDRV_PCM_ACCESS_MMAP_XXX flag 2.2. Implement for 'struct snd_pcm_ops.ack' to tell appl_ptr to hardware.
3. Applications should perform according to below items: 3.1. Operate with SNDRV_PCM_IOCTL_[READ|WRITE][N|I] to drive kernel land stuffs with changed appl_ptr. 3.2. Or operate with SYNC_PTR to driver kernel land stuffs when handling any PCM frames.
As of v4.12-rc5, there's no stuffs to satisfy all of these items. It's clear that we're working to support devices with the new design.
In other words, such a driver already works fine on non-x86 platforms. It's "broken" only on x86 due to the forced mmap usage.
It's better to distinguish two issues; architecture's support for cache coherency and support for new type of device.
And, I think you miss a few points, thus the argument was twisted.
- The primary goal is to achieve the notification of appl_ptr change to kernel.
It's for the purpose to support devices which have the new design for data transmission. This is the reason that I agree with upgrading version of PCM interface. I suggest adding new info flags for the specific purpose.
Disabling mmap for status/control data of PCM substream is just to support architectures to which ALSA PCM core judge non cache coherency. It's not good to utilize it as a solution of this issue because of abuse of interfaces.
No, it's no abuse. The sync_ptr is the correct and designed API to notify appl_ptr update from the user space to kernel. For most of device drivers on x86, this isn't requested strictly, thus it's not done when the PCM control is mmapped. We've been just lucky, so far.
I described that it was originally designed to solve architecture's support for cache coherency. Don't depend on extra bonus from it.
That is, letting user-space to use the sync_ptr is the right strategy from the API POV, and it can be achieved by the disablement of mmap. Unfortunately, we can't currently disable only PCM control mmap, but we have to disable both control and status PCM mmaps.
Usage of SYNC_PTR ioctl command and disabling mmap are not tight-coupled. Both of them can be used in the same time if user land stuffs is writte so.
It's a trade-off, but still good enough for the driver requesting it (the Intel one, which can achieve the deep sleep by that).
What's the 'deep sleep'? Please explain about it when you introduce new words into this discussion.
- The kernel needs to work with the existing user-space. It's a MUST.
I described in a previous message.
Let me repeat: the support of old user-space is a must. This is the fact. Not faked.
On Jun 14 2017 23:34, Takashi Sakamoto wrote:
When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large change
... and how the kernel knows that a newer alsa-lib is running, BTW?
There is no such information right now, and it's why I suggested an extension of ABI.
Enough information is not provided yet. Furthermore, I have few interests in it at present.
- appl_ptr or whatever a status *is* maintained in kernel -- or better to say, it's kept in both kernel and user-space. (Otherwise why it becomes a problem now?)
This is not achieved in current implementation of ALSA PCM core yet.
Sigh, again the misunderstanding due to the ambiguous term usage.
The appl_ptr value is tracked in user-space and kernel sides -- so they are "maintained" in both sides. "Maintain a value" doesn't define who triggers the value change.
The important thing for this issue is 'how to deliver the position of appl_ptr from applications to hardware'. In current implementation, when applications operate for mapped PCM buffer, kernel land stuffs don't have correct appl_ptr. Hardware is given ways to get appl_ptr and drivers don't assist it because appl_ptr is in process's VMA of applications.
Even if hardware generates any event to get current application-side position on PCM buffer, it can't be achieved. There's no way for hardwares. Therefore, it's unavoided to request userspace to have care of the above pseudo code, when supporting devices with the new design.
Yes, the missing piece is the synchronization between the appl_ptr values maintained in both user-space and kernel sides. And it's only on x86, and the simplest way to fix it is to disable the PCM control mmap.
It's not better to depend on bonus from architecture differences. Provide enough information to userspace via interface.
So, if you have a better idea for achieving the goal without changing the current ABI, please tell us. It'd be really appreciated!
For the future development with the ABI extension, we may do implement in a different level, yes. Basically we can change everything. But this is not the thing we need to fix right now.
I'm open for any changes with the ABI extension. It's certainly exciting. But, don't mix up this with the solution for the current implementation.
Your idea to extend current ABI includes some jumps from current position of discussion. At least, the reason of it is somehow based on your idea to disable mmap for control/status data of PCM substream for some devices/drivers. As already stated, I'm against it with several reasons. On cache coherent architecture, there's no matter to consider about it. Thus they're not matters to me at present.
It's fine that you say against it. But then please state your idea how to implement the appl_ptr notification for the existing application ABI? Without it, the argument is simply useless.
I already prepared patches for my ideas several days ago. However, I have hesitation to post it because you have mixtures of two issues, which I described. As long as you adhere to the mixtures, my patches are not evaluated in a appropriate logic, thus I'm unwilling to post it. I'd not like to waste of my private time.
Also, think again which disadvantages would disabling the control mmap (not the status mmap) have, when sync_ptr is mandatory. Or better in another form: suppose you need to use sync_ptr for appl_ptr update, what merit would you have by keeping the PCM control mmap...?
Take a look at the content of snd_pcm_mmap_control. It's only appl_ptr and avail_min where the latter can be ignored mostly. That is, disabling the PCM control mmap is effectively equivalent with requesting the appl_ptr update via sync_ptr.
So, if we can disable only the PCM control mmap, the efficiency won't be lost for kernel -> user-space PCM status update by keeping the PCM status mmap. It's the plan I mentioned as the ABI extension.
To recap:
- The disablement of control/status mmap is allowed by request from a driver to enforce the sync_ptr usage for appl_ptr update.
Not fair. The feature to disable mmap is just for issues from architecture feature. It's not good to utilize it for the other kind of issues. It surely puzzle developers in user space. When working for x86 architecture, developers just consider about content on mapped control/status data. We soulnd not change it.
- The extension of ABI for further improvements; then we can keep mmap of PCM status while disabling the PCM control mmap.
I'm not interested in issues directly relevant to current issue. I'd like to just focus on the current one.
[1] ftp://ftp.alsa-project.org/pub/driver/alsa-driver-0.5.0.tar.gz [2] ftp://ftp.alsa-project.org/pub/driver/alsa-driver-0.9.0rc8.tar.bz2 [3] http://git.alsa-project.org/?p=alsa-driver.git;a=commitdiff;h=60300f540726;h... [4] http://git.alsa-project.org/?p=alsa-driver.git;a=commitdiff;h=6cd0240aa173;h... [5] https://lwn.net/Articles/87517/
Regards
Takashi Sakamoto
On Jun 17 2017 00:00, Takashi Sakamoto wrote:
This subsystem has no drivers with the similar feature, thus it's new design for data transmission. The design supports below things:
- Hardware features:
1.1. Data transmission is done by direct media access (DMA). 1.2. Hardware cares of two points for its data transmission; one is hwptr and another is appl_ptr on PCM buffer dedicated for the data transmission. 1.3. (perhaps)The granurarity of data transmission can be differed, not fixed to the size of 'period'.
I realized that 'granularity' (misspelled...) is not proper in this context. Here, I described intervals to start data transmissions by DMA. The transmission can be managed by pre-programmed descriptors on device register, or service routines for interrupts on drivers, and so on.
Regards
Takashi Sakamoto
On Fri, 16 Jun 2017 17:00:13 +0200, Takashi Sakamoto wrote:
On Jun 16 2017 04:06, Takashi Iwai wrote:
Some devices/drivers request applications to perform this, due to their design of hardware for data transmission.
Yes, and it's the default behavior of all non-x86 platforms, too.
Here, you have mixture of two items; architecture difference for cache coherent functionality, and device feature for data transmission. These two items are apparently different things.
In other words, devices can be supported independently of platform architectures. Why should I apply the solution prepared for cache coherency issue to add better support for the device with new feature for its data transmission? On x86 platform (although including some exceptions such as old ATOM processors), status/control data of PCM substream can successfully be mapped to process' VMA of applications. Why should it be disable it even if works well?
Because it doesn't work well for the new feature :)
Here, I'll show a table with chang history of relevant codes in kernel land, and information resources.
year | release | pcm | relevant changes | version | iface | ================================================================================== 2000 | v0.5.0 | 1.0.0 | status/control/fragments data can be mapped into VMA[1] 2003 | v0.9.0 | 2.0.5 | status/control data can be mapped independently[2] 2004 | v1.0.5 | 2.0.7 | SYNC_PTR command was introduced for PCM ioctl operation[3] 2004 | v1.0.7 | 2.0.7 | mmap for status/control data is allowed for x86/ppc/alpha[4]
In note for v1.0.5 release, Jaroslav described[5]:
- PCM midlevel - added SYNC_PTR ioctl (for problematic cache coherency archs)
The SYNC_PTR and architecture-dependent enabling of the mmap was originally introduced to solve cache coherent issue.
It's becoming again like a typical situation you falling often into while discussing with others; your new term is too ambiguous and unique to parse without a clarification. PLEASE, clarify your idea in a more understandable way, at best with a (pseudo-) code snippet.
The above.
So... what's the "new"? That is what I don't understand... It's the already existing model deployed without mmap. Nothing new.
This subsystem has no drivers with the similar feature, thus it's new design for data transmission. The design supports below things:
- Hardware features:
1.1. Data transmission is done by direct media access (DMA). 1.2. Hardware cares of two points for its data transmission; one is hwptr and another is appl_ptr on PCM buffer dedicated for the data transmission. 1.3. (perhaps)The granurarity of data transmission can be differed, not fixed to the size of 'period'.
- Drivers are designed with below items:
2.1. Return SNDRV_PCM_ACCESS_MMAP_XXX flag 2.2. Implement for 'struct snd_pcm_ops.ack' to tell appl_ptr to hardware.
- Applications should perform according to below items:
3.1. Operate with SNDRV_PCM_IOCTL_[READ|WRITE][N|I] to drive kernel land stuffs with changed appl_ptr. 3.2. Or operate with SYNC_PTR to driver kernel land stuffs when handling any PCM frames.
As of v4.12-rc5, there's no stuffs to satisfy all of these items.
It works on non-x86 systems as is with 4.12.
It's clear that we're working to support devices with the new design.
Yeah, I sort of understand you're sticking with "new design". So, no need to argue about it. It doesn't matter whether it's really so shiny new or not. It's something we need to fix, after all.
In other words, such a driver already works fine on non-x86 platforms. It's "broken" only on x86 due to the forced mmap usage.
It's better to distinguish two issues; architecture's support for cache coherency and support for new type of device.
Well, that's not really true. Most of drivers worked so far with a luck. Fortunately, the hardware that doesn't have the mappable hardware buffer can be woken up well enough via period setup, thus it could synchronize appl_ptr that user-space already changed in the past. This, however, doesn't mean that the current x86 mmap is perfectly working.
Imagine that you stop the stream at the middle of period chunk. For the hardware with the mapped h/w buffer, it can play up to the aborted position. OTOH, for the hardware without mapped buffer, it can't reach at that position because the sync of appl_ptr was missing before the period elapsed. If the appl_ptr could have been notified via sync_ptr, the driver could copy the data, and it could reach to the aborted point.
So, currently it effectively enforces the BATCH style buffer although it doesn't have to be so. It's a known bug by the status / control mmap, but we've ignored this just because such hardware are minor and the problem itself is trivial. But you see that the current code is even not perfect for the existing hardware.
And, I think you miss a few points, thus the argument was twisted.
- The primary goal is to achieve the notification of appl_ptr change to kernel.
It's for the purpose to support devices which have the new design for data transmission. This is the reason that I agree with upgrading version of PCM interface. I suggest adding new info flags for the specific purpose.
Disabling mmap for status/control data of PCM substream is just to support architectures to which ALSA PCM core judge non cache coherency. It's not good to utilize it as a solution of this issue because of abuse of interfaces.
No, it's no abuse. The sync_ptr is the correct and designed API to notify appl_ptr update from the user space to kernel. For most of device drivers on x86, this isn't requested strictly, thus it's not done when the PCM control is mmapped. We've been just lucky, so far.
I described that it was originally designed to solve architecture's support for cache coherency. Don't depend on extra bonus from it.
Oh well... Please stop such a fundamentalism argument. The original purpose doesn't mean to limit its usage. We aren't arguing history or religion.
That is, letting user-space to use the sync_ptr is the right strategy from the API POV, and it can be achieved by the disablement of mmap. Unfortunately, we can't currently disable only PCM control mmap, but we have to disable both control and status PCM mmaps.
Usage of SYNC_PTR ioctl command and disabling mmap are not tight-coupled. Both of them can be used in the same time if user land stuffs is writte so.
Sure, it's a semantic difference. I don't pursue this way if we are allowed to extend ABI, either. This will be the "phase 2".
Remember that my suggestion is for the "phase 1", to fix the stuff for the existing applications without changing the ABI.
It's a trade-off, but still good enough for the driver requesting it (the Intel one, which can achieve the deep sleep by that).
What's the 'deep sleep'? Please explain about it when you introduce new words into this discussion.
I thought Pierre (or other Intel people) already mentioned that. Or maybe it's in a different patchset. In anyway...
The chip (DSP) can prefetch the data on the buffer and go to a deep sleep mode. It's the reason why appl_ptr update is needed. Then we can know how much data can be prefetched. This should be the great reduction of power, thus a slight increase of instructions would be like a peanut.
- The kernel needs to work with the existing user-space. It's a MUST.
I described in a previous message.
Let me repeat: the support of old user-space is a must. This is the fact. Not faked.
On Jun 14 2017 23:34, Takashi Sakamoto wrote:
When working with old userspace stuffs, drivers run with old mode (namely, run for current interface). The difference of interface version is absorbed in alsa-lib as much as possible, then application runs without large change
... and how the kernel knows that a newer alsa-lib is running, BTW?
There is no such information right now, and it's why I suggested an extension of ABI.
Enough information is not provided yet. Furthermore, I have few interests in it at present.
Right, not yet. That's the point. If we want to achieve the fix without changing the user-space, you can't rely on this not-yet-present information.
- appl_ptr or whatever a status *is* maintained in kernel -- or better to say, it's kept in both kernel and user-space. (Otherwise why it becomes a problem now?)
This is not achieved in current implementation of ALSA PCM core yet.
Sigh, again the misunderstanding due to the ambiguous term usage.
The appl_ptr value is tracked in user-space and kernel sides -- so they are "maintained" in both sides. "Maintain a value" doesn't define who triggers the value change.
The important thing for this issue is 'how to deliver the position of appl_ptr from applications to hardware'. In current implementation, when applications operate for mapped PCM buffer, kernel land stuffs don't have correct appl_ptr. Hardware is given ways to get appl_ptr and drivers don't assist it because appl_ptr is in process's VMA of applications.
Yes. That's what I called synchronization in the below.
Even if hardware generates any event to get current application-side position on PCM buffer, it can't be achieved. There's no way for hardwares. Therefore, it's unavoided to request userspace to have care of the above pseudo code, when supporting devices with the new design.
Yes, the missing piece is the synchronization between the appl_ptr values maintained in both user-space and kernel sides. And it's only on x86, and the simplest way to fix it is to disable the PCM control mmap.
It's not better to depend on bonus from architecture differences. Provide enough information to userspace via interface.
So, if you have a better idea for achieving the goal without changing the current ABI, please tell us. It'd be really appreciated!
For the future development with the ABI extension, we may do implement in a different level, yes. Basically we can change everything. But this is not the thing we need to fix right now.
I'm open for any changes with the ABI extension. It's certainly exciting. But, don't mix up this with the solution for the current implementation.
Your idea to extend current ABI includes some jumps from current position of discussion. At least, the reason of it is somehow based on your idea to disable mmap for control/status data of PCM substream for some devices/drivers. As already stated, I'm against it with several reasons. On cache coherent architecture, there's no matter to consider about it. Thus they're not matters to me at present.
It's fine that you say against it. But then please state your idea how to implement the appl_ptr notification for the existing application ABI? Without it, the argument is simply useless.
I already prepared patches for my ideas several days ago. However, I have hesitation to post it because you have mixtures of two issues, which I described. As long as you adhere to the mixtures, my patches are not evaluated in a appropriate logic, thus I'm unwilling to post it. I'd not like to waste of my private time.
Sakamoto-san, if you have the idea to fix the behavior, PLEASE PLEASE show it at first!!! This is greatly appreciated.
I meant here a solution that works with the currently existing application as is -- i.e. change only in the kernel side. If this can be achieved without disabling mmap, it's very interesting. Please let me know.
Once we fix this for the existing applications, then we can go further, a better implementation with the extension of ABI.
BTW, mentioning about waste of your private time doesn't sound so impressive, honestly speaking. I noticed it often appearing in your patch changelogs. Most of people (including me) spend private time for developing Linux kernel stuff, too. My own main job is not maintaining ALSA, either.
thanks,
Takashi
Also, think again which disadvantages would disabling the control mmap (not the status mmap) have, when sync_ptr is mandatory. Or better in another form: suppose you need to use sync_ptr for appl_ptr update, what merit would you have by keeping the PCM control mmap...?
Take a look at the content of snd_pcm_mmap_control. It's only appl_ptr and avail_min where the latter can be ignored mostly. That is, disabling the PCM control mmap is effectively equivalent with requesting the appl_ptr update via sync_ptr.
So, if we can disable only the PCM control mmap, the efficiency won't be lost for kernel -> user-space PCM status update by keeping the PCM status mmap. It's the plan I mentioned as the ABI extension.
To recap:
- The disablement of control/status mmap is allowed by request from a driver to enforce the sync_ptr usage for appl_ptr update.
Not fair. The feature to disable mmap is just for issues from architecture feature. It's not good to utilize it for the other kind of issues. It surely puzzle developers in user space. When working for x86 architecture, developers just consider about content on mapped control/status data. We soulnd not change it.
- The extension of ABI for further improvements; then we can keep mmap of PCM status while disabling the PCM control mmap.
I'm not interested in issues directly relevant to current issue. I'd like to just focus on the current one.
[1] ftp://ftp.alsa-project.org/pub/driver/alsa-driver-0.5.0.tar.gz [2] ftp://ftp.alsa-project.org/pub/driver/alsa-driver-0.9.0rc8.tar.bz2 [3] http://git.alsa-project.org/?p=alsa-driver.git;a=commitdiff;h=60300f540726;h... [4] http://git.alsa-project.org/?p=alsa-driver.git;a=commitdiff;h=6cd0240aa173;h... [5] https://lwn.net/Articles/87517/
Regards
Takashi Sakamoto
Hi,
On Jun 17 2017 00:45, Takashi Iwai wrote:
On Fri, 16 Jun 2017 17:00:13 +0200, Takashi Sakamoto wrote:
On Jun 16 2017 04:06, Takashi Iwai wrote:
Some devices/drivers request applications to perform this, due to
their
design of hardware for data transmission.
Yes, and it's the default behavior of all non-x86 platforms, too.
Here, you have mixture of two items; architecture difference for cache coherent functionality, and device feature for data transmission. These two items are apparently different things.
In other words, devices can be supported independently of platform architectures. Why should I apply the solution prepared for cache coherency issue to add better support for the device with new feature for its data transmission? On x86 platform (although including some exceptions such as old ATOM processors), status/control data of PCM substream can successfully be mapped to process' VMA of applications. Why should it be disable it even if works well?
Because it doesn't work well for the new feature :)
In this point, I cannot understand your insistence.
The page frame for status/control data of PCM substream is mapped into process' VMA of application with _read-only_ attributes. In a point to deliver the status/control data from kernel space to user space, it works well. On x86 platforms, this works fine exactly as the aim.
On the other hand, what we should achieve for current issue is to deliver information from applications to hardware. This is not relevant to the page frame mapping.
These two are apparently different issues. You intend to apply a solution for the former as a solution for the latter, however it's against original aim of the latter. To me this is in a gray zone to agree with it.
So... what's the "new"? That is what I don't understand... It's the already existing model deployed without mmap. Nothing new.
This subsystem has no drivers with the similar feature, thus it's new design for data transmission. The design supports below things:
- Hardware features:
1.1. Data transmission is done by direct media access (DMA). 1.2. Hardware cares of two points for its data transmission; one is hwptr and another is appl_ptr on PCM buffer dedicated for the data transmission. 1.3. (perhaps)The granurarity of data transmission can be differed, not fixed to the size of 'period'.
- Drivers are designed with below items:
2.1. Return SNDRV_PCM_ACCESS_MMAP_XXX flag 2.2. Implement for 'struct snd_pcm_ops.ack' to tell appl_ptr to hardware.
- Applications should perform according to below items:
3.1. Operate with SNDRV_PCM_IOCTL_[READ|WRITE][N|I] to drive kernel land stuffs with changed appl_ptr. 3.2. Or operate with SYNC_PTR to driver kernel land stuffs when handling any PCM frames.
As of v4.12-rc5, there's no stuffs to satisfy all of these items.
It works on non-x86 systems as is with 4.12.
It's a sub-effect, like a bonus, from a solution for issues of cache coherency dependently of architecture, as I said.
In other words, such a driver already works fine on non-x86 platforms. It's "broken" only on x86 due to the forced mmap usage.
It's better to distinguish two issues; architecture's support for cache coherency and support for new type of device.
Well, that's not really true. Most of drivers worked so far with a luck. Fortunately, the hardware that doesn't have the mappable hardware buffer can be woken up well enough via period setup, thus it could synchronize appl_ptr that user-space already changed in the past. This, however, doesn't mean that the current x86 mmap is perfectly working.
Imagine that you stop the stream at the middle of period chunk. For the hardware with the mapped h/w buffer, it can play up to the aborted position. OTOH, for the hardware without mapped buffer, it can't reach at that position because the sync of appl_ptr was missing before the period elapsed. If the appl_ptr could have been notified via sync_ptr, the driver could copy the data, and it could reach to the aborted point.
So, currently it effectively enforces the BATCH style buffer although it doesn't have to be so. It's a known bug by the status / control mmap, but we've ignored this just because such hardware are minor and the problem itself is trivial. But you see that the current code is even not perfect for the existing hardware.
I'm a developer for drivers which use PCM buffer as intermediate buffer for packet buffer. I understand what you explained correctly because I've considered about it for recent several years.
But this is our of my concern about your patch.
And, I think you miss a few points, thus the argument was twisted.
- The primary goal is to achieve the notification of appl_ptr change to kernel.
It's for the purpose to support devices which have the new design for data transmission. This is the reason that I agree with upgrading version of PCM interface. I suggest adding new info flags for the specific purpose.
Disabling mmap for status/control data of PCM substream is just to support architectures to which ALSA PCM core judge non cache
coherency.
It's not good to utilize it as a solution of this issue because of abuse of interfaces.
No, it's no abuse. The sync_ptr is the correct and designed API to notify appl_ptr update from the user space to kernel. For most of device drivers on x86, this isn't requested strictly, thus it's not done when the PCM control is mmapped. We've been just lucky, so far.
I described that it was originally designed to solve architecture's support for cache coherency. Don't depend on extra bonus from it.
Oh well... Please stop such a fundamentalism argument. The original purpose doesn't mean to limit its usage. We aren't arguing history or religion.
If an issue were encapsulated only in kernel land or user land, I would have no objection to your patch. I'm willing to review it.
However, in a current case, it relates to interaction between kernel/user. In my opinion, it's better to avoid changing fundamental meaning of features which are already exposed to one side. Therefore on x86/ppc/alpha architectures, userspace applications (not only alsa-lib API applications but also applications with any I/O library) can use status/control data on own VMA. Else, not. This is independent on peripheral devices.
My intention to continue this discussion is the above. The scope of issues is different; one is architecture-dependent, another is device-dependent, thus solution should be different. Your patch has a potential to puzzle developers for user land, like 'why the page frame is not mapped for my application even if on the same architecture? why it's device dependent?'
It's a trade-off, but still good enough for the driver requesting it (the Intel one, which can achieve the deep sleep by that).
What's the 'deep sleep'? Please explain about it when you introduce new words into this discussion.
I thought Pierre (or other Intel people) already mentioned that. Or maybe it's in a different patchset. In anyway...
The chip (DSP) can prefetch the data on the buffer and go to a deep sleep mode. It's the reason why appl_ptr update is needed. Then we can know how much data can be prefetched. This should be the great reduction of power, thus a slight increase of instructions would be like a peanut.
In my mailbox, there's no message with the keyword. I guess that it was introduced with the other expression. But anyway, it's mostly what I've imagined. Thanks for your confirmation.
(I omitted some texts to focus on the main issue.)
Regards
Takashi Sakamoto
On Sun, 18 Jun 2017 12:13:54 +0200, Takashi Sakamoto wrote:
Hi,
On Jun 17 2017 00:45, Takashi Iwai wrote:
On Fri, 16 Jun 2017 17:00:13 +0200, Takashi Sakamoto wrote:
On Jun 16 2017 04:06, Takashi Iwai wrote:
Some devices/drivers request applications to perform this, due to
their
design of hardware for data transmission.
Yes, and it's the default behavior of all non-x86 platforms, too.
Here, you have mixture of two items; architecture difference for cache coherent functionality, and device feature for data transmission. These two items are apparently different things.
In other words, devices can be supported independently of platform architectures. Why should I apply the solution prepared for cache coherency issue to add better support for the device with new feature for its data transmission? On x86 platform (although including some exceptions such as old ATOM processors), status/control data of PCM substream can successfully be mapped to process' VMA of applications. Why should it be disable it even if works well?
Because it doesn't work well for the new feature :)
In this point, I cannot understand your insistence.
The page frame for status/control data of PCM substream is mapped into process' VMA of application with _read-only_ attributes.
No. The basic ides is that control mmap is write (user -> kernel) while the status mmap is read (kernel -> user). It's why there are two distinct mmaps.
In a point to deliver the status/control data from kernel space to user space, it works well. On x86 platforms, this works fine exactly as the aim.
On the other hand, what we should achieve for current issue is to deliver information from applications to hardware. This is not relevant to the page frame mapping.
There was an implicit assumption by the control mmap that the hardware buffer management doesn't need the kernel notification. It is a problem even for some already existing drivers. However, usually this is a small matter, so we've ignored it. That is, the problem has been always there from the beginning. Now it becomes clearer, no longer negligible with a large buffer without interrupts, thus it requires a resolution.
These two are apparently different issues. You intend to apply a solution for the former as a solution for the latter, however it's against original aim of the latter. To me this is in a gray zone to agree with it.
So... what's the "new"? That is what I don't understand... It's the already existing model deployed without mmap. Nothing new.
This subsystem has no drivers with the similar feature, thus it's new design for data transmission. The design supports below things:
- Hardware features:
1.1. Data transmission is done by direct media access (DMA). 1.2. Hardware cares of two points for its data transmission; one is hwptr and another is appl_ptr on PCM buffer dedicated for the data transmission. 1.3. (perhaps)The granurarity of data transmission can be differed, not fixed to the size of 'period'.
- Drivers are designed with below items:
2.1. Return SNDRV_PCM_ACCESS_MMAP_XXX flag 2.2. Implement for 'struct snd_pcm_ops.ack' to tell appl_ptr to hardware.
- Applications should perform according to below items:
3.1. Operate with SNDRV_PCM_IOCTL_[READ|WRITE][N|I] to drive kernel land stuffs with changed appl_ptr. 3.2. Or operate with SYNC_PTR to driver kernel land stuffs when handling any PCM frames.
As of v4.12-rc5, there's no stuffs to satisfy all of these items.
It works on non-x86 systems as is with 4.12.
It's a sub-effect, like a bonus, from a solution for issues of cache coherency dependently of architecture, as I said.
It's not sub-effect. Even x86 requires sync_ptr in the 32bit compatible mode. That said, the sync_ptr API is not only for non-coherent architecture, but it's a general purpose API.
In other words, such a driver already works fine on non-x86 platforms. It's "broken" only on x86 due to the forced mmap usage.
It's better to distinguish two issues; architecture's support for cache coherency and support for new type of device.
Well, that's not really true. Most of drivers worked so far with a luck. Fortunately, the hardware that doesn't have the mappable hardware buffer can be woken up well enough via period setup, thus it could synchronize appl_ptr that user-space already changed in the past. This, however, doesn't mean that the current x86 mmap is perfectly working.
Imagine that you stop the stream at the middle of period chunk. For the hardware with the mapped h/w buffer, it can play up to the aborted position. OTOH, for the hardware without mapped buffer, it can't reach at that position because the sync of appl_ptr was missing before the period elapsed. If the appl_ptr could have been notified via sync_ptr, the driver could copy the data, and it could reach to the aborted point.
So, currently it effectively enforces the BATCH style buffer although it doesn't have to be so. It's a known bug by the status / control mmap, but we've ignored this just because such hardware are minor and the problem itself is trivial. But you see that the current code is even not perfect for the existing hardware.
I'm a developer for drivers which use PCM buffer as intermediate buffer for packet buffer. I understand what you explained correctly because I've considered about it for recent several years.
But this is our of my concern about your patch.
And, I think you miss a few points, thus the argument was twisted.
- The primary goal is to achieve the notification of appl_ptr change to kernel.
It's for the purpose to support devices which have the new design for data transmission. This is the reason that I agree with upgrading version of PCM interface. I suggest adding new info flags for the specific purpose.
Disabling mmap for status/control data of PCM substream is just to support architectures to which ALSA PCM core judge non cache
coherency.
It's not good to utilize it as a solution of this issue because of abuse of interfaces.
No, it's no abuse. The sync_ptr is the correct and designed API to notify appl_ptr update from the user space to kernel. For most of device drivers on x86, this isn't requested strictly, thus it's not done when the PCM control is mmapped. We've been just lucky, so far.
I described that it was originally designed to solve architecture's support for cache coherency. Don't depend on extra bonus from it.
Oh well... Please stop such a fundamentalism argument. The original purpose doesn't mean to limit its usage. We aren't arguing history or religion.
If an issue were encapsulated only in kernel land or user land, I would have no objection to your patch. I'm willing to review it.
However, in a current case, it relates to interaction between kernel/user. In my opinion, it's better to avoid changing fundamental meaning of features which are already exposed to one side. Therefore on x86/ppc/alpha architectures, userspace applications (not only alsa-lib API applications but also applications with any I/O library) can use status/control data on own VMA. Else, not. This is independent on peripheral devices.
My intention to continue this discussion is the above. The scope of issues is different; one is architecture-dependent, another is device-dependent, thus solution should be different. Your patch has a potential to puzzle developers for user land, like 'why the page frame is not mapped for my application even if on the same architecture? why it's device dependent?'
It's an optimized path, and the user-space *must* support the codes with sync_ptr as fallback mandatorily. The code path using the mmap is rather optional, so to say. It should be documented clearly, indeed, for avoiding misunderstanding.
I hope the situation is clearer for you now. If you have the alternative fix for the issue with keeping the current ABI, please please show up. We need the fix, not endless discussions.
thanks,
Takashi
ALSA fireface driver has ALSA specific operations for MIDI/PCM data. Structured data for the operations can be constified. Additionally, The structured data can be function local.
Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp --- sound/firewire/fireface/ff-midi.c | 22 ++++++++--------- sound/firewire/fireface/ff-pcm.c | 52 +++++++++++++++++++-------------------- 2 files changed, 35 insertions(+), 39 deletions(-)
diff --git a/sound/firewire/fireface/ff-midi.c b/sound/firewire/fireface/ff-midi.c index 29ee0a7365c3..949ee56b4e0e 100644 --- a/sound/firewire/fireface/ff-midi.c +++ b/sound/firewire/fireface/ff-midi.c @@ -74,18 +74,6 @@ static void midi_playback_trigger(struct snd_rawmidi_substream *substream, spin_unlock_irqrestore(&ff->lock, flags); }
-static struct snd_rawmidi_ops midi_capture_ops = { - .open = midi_capture_open, - .close = midi_capture_close, - .trigger = midi_capture_trigger, -}; - -static struct snd_rawmidi_ops midi_playback_ops = { - .open = midi_playback_open, - .close = midi_playback_close, - .trigger = midi_playback_trigger, -}; - static void set_midi_substream_names(struct snd_rawmidi_str *stream, const char *const name) { @@ -99,6 +87,16 @@ static void set_midi_substream_names(struct snd_rawmidi_str *stream,
int snd_ff_create_midi_devices(struct snd_ff *ff) { + static const struct snd_rawmidi_ops midi_capture_ops = { + .open = midi_capture_open, + .close = midi_capture_close, + .trigger = midi_capture_trigger, + }; + static const struct snd_rawmidi_ops midi_playback_ops = { + .open = midi_playback_open, + .close = midi_playback_close, + .trigger = midi_playback_trigger, + }; struct snd_rawmidi *rmidi; struct snd_rawmidi_str *stream; int err; diff --git a/sound/firewire/fireface/ff-pcm.c b/sound/firewire/fireface/ff-pcm.c index adb5c87f492f..ad974b5a2561 100644 --- a/sound/firewire/fireface/ff-pcm.c +++ b/sound/firewire/fireface/ff-pcm.c @@ -379,35 +379,33 @@ static int pcm_playback_ack(struct snd_pcm_substream *substream) return amdtp_stream_pcm_ack(&ff->rx_stream); }
-static struct snd_pcm_ops pcm_capture_ops = { - .open = pcm_open, - .close = pcm_close, - .ioctl = snd_pcm_lib_ioctl, - .hw_params = pcm_capture_hw_params, - .hw_free = pcm_capture_hw_free, - .prepare = pcm_capture_prepare, - .trigger = pcm_capture_trigger, - .pointer = pcm_capture_pointer, - .ack = pcm_capture_ack, - .page = snd_pcm_lib_get_vmalloc_page, -}; - -static struct snd_pcm_ops pcm_playback_ops = { - .open = pcm_open, - .close = pcm_close, - .ioctl = snd_pcm_lib_ioctl, - .hw_params = pcm_playback_hw_params, - .hw_free = pcm_playback_hw_free, - .prepare = pcm_playback_prepare, - .trigger = pcm_playback_trigger, - .pointer = pcm_playback_pointer, - .ack = pcm_playback_ack, - .page = snd_pcm_lib_get_vmalloc_page, - .mmap = snd_pcm_lib_mmap_vmalloc, -}; - int snd_ff_create_pcm_devices(struct snd_ff *ff) { + static const struct snd_pcm_ops pcm_capture_ops = { + .open = pcm_open, + .close = pcm_close, + .ioctl = snd_pcm_lib_ioctl, + .hw_params = pcm_capture_hw_params, + .hw_free = pcm_capture_hw_free, + .prepare = pcm_capture_prepare, + .trigger = pcm_capture_trigger, + .pointer = pcm_capture_pointer, + .ack = pcm_capture_ack, + .page = snd_pcm_lib_get_vmalloc_page, + }; + static const struct snd_pcm_ops pcm_playback_ops = { + .open = pcm_open, + .close = pcm_close, + .ioctl = snd_pcm_lib_ioctl, + .hw_params = pcm_playback_hw_params, + .hw_free = pcm_playback_hw_free, + .prepare = pcm_playback_prepare, + .trigger = pcm_playback_trigger, + .pointer = pcm_playback_pointer, + .ack = pcm_playback_ack, + .page = snd_pcm_lib_get_vmalloc_page, + .mmap = snd_pcm_lib_mmap_vmalloc, + }; struct snd_pcm *pcm; int err;
On Wed, 07 Jun 2017 02:38:06 +0200, Takashi Sakamoto wrote:
ALSA fireface driver has ALSA specific operations for MIDI/PCM data. Structured data for the operations can be constified. Additionally, The structured data can be function local.
Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp
Thanks, applied.
Takashi
participants (3)
-
Takashi Iwai
-
Takashi Sakamoto
-
Vinod Koul