[PATCH v2] sound: rawmidi: Add framing mode
This commit adds a new framing mode that frames all MIDI data into 16-byte frames with a timestamp from the monotonic_raw clock.
The main benefit is that we can get accurate timestamps even if userspace wakeup and processing is not immediate.
Signed-off-by: David Henningsson <coding@diwic.se>
---
v2: Fixed checkpatch errors.
 include/sound/rawmidi.h     |  1 +
 include/uapi/sound/asound.h | 18 ++++++++++++++-
 sound/core/rawmidi.c        | 45 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/include/sound/rawmidi.h b/include/sound/rawmidi.h
index 334842daa904..4ba5d2deec18 100644
--- a/include/sound/rawmidi.h
+++ b/include/sound/rawmidi.h
@@ -81,6 +81,7 @@ struct snd_rawmidi_substream {
 	bool opened;			/* open flag */
 	bool append;			/* append flag (merge more streams) */
 	bool active_sensing;		/* send active sensing when close */
+	u8 framing;			/* whether to frame data (for input) */
 	int use_count;			/* use counter (for output) */
 	size_t bytes;
 	struct snd_rawmidi *rmidi;
diff --git a/include/uapi/sound/asound.h b/include/uapi/sound/asound.h
index 535a7229e1d9..f33076755025 100644
--- a/include/uapi/sound/asound.h
+++ b/include/uapi/sound/asound.h
@@ -736,12 +736,28 @@ struct snd_rawmidi_info {
 	unsigned char reserved[64];	/* reserved for future use */
 };
 
+enum {
+	SNDRV_RAWMIDI_FRAMING_NONE = 0,
+	SNDRV_RAWMIDI_FRAMING_TSTAMP_MONOTONIC_RAW,
+	SNDRV_RAWMIDI_FRAMING_LAST = SNDRV_RAWMIDI_FRAMING_TSTAMP_MONOTONIC_RAW,
+};
+
+#define SND_RAWMIDI_FRAMING_DATA_LENGTH 7
+
+struct snd_rawmidi_framing_tstamp {
+	unsigned int tv_sec;	/* seconds */
+	unsigned int tv_nsec;	/* nanoseconds */
+	unsigned char length;
+	unsigned char data[SND_RAWMIDI_FRAMING_DATA_LENGTH];
+};
+
 struct snd_rawmidi_params {
 	int stream;
 	size_t buffer_size;		/* queue size in bytes */
 	size_t avail_min;		/* minimum avail bytes for wakeup */
 	unsigned int no_active_sensing: 1; /* do not send active sensing byte in close() */
-	unsigned char reserved[16];	/* reserved for future use */
+	unsigned char framing;		/* For input data only, frame incoming data */
+	unsigned char reserved[15];	/* reserved for future use */
 };
 
 #ifndef __KERNEL__
diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c
index aca00af93afe..cd927ba178a6 100644
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -721,6 +721,7 @@ int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream,
 			     struct snd_rawmidi_params *params)
 {
 	snd_rawmidi_drain_input(substream);
+	substream->framing = params->framing;
 	return resize_runtime_buffer(substream->runtime, params, true);
 }
 EXPORT_SYMBOL(snd_rawmidi_input_params);
@@ -963,6 +964,44 @@ static int snd_rawmidi_control_ioctl(struct snd_card *card,
 	return -ENOIOCTLCMD;
 }
 
+static int receive_with_tstamp_framing(struct snd_rawmidi_substream *substream,
+			const unsigned char *buffer, int src_count, struct timespec64 *tstamp)
+{
+	struct snd_rawmidi_runtime *runtime = substream->runtime;
+	struct snd_rawmidi_framing_tstamp frame;
+	struct snd_rawmidi_framing_tstamp *dest_ptr;
+	int dest_frames = 0;
+	int frame_size = sizeof(struct snd_rawmidi_framing_tstamp);
+
+	frame.tv_sec = tstamp->tv_sec;
+	frame.tv_nsec = tstamp->tv_nsec;
+	if (snd_BUG_ON(runtime->hw_ptr & 15 || runtime->buffer_size & 15 || frame_size != 16))
+		return -EINVAL;
+
+	while (src_count > 0) {
+		if ((int)(runtime->buffer_size - runtime->avail) < frame_size) {
+			runtime->xruns += src_count;
+			return dest_frames * frame_size;
+		}
+		if (src_count >= SND_RAWMIDI_FRAMING_DATA_LENGTH)
+			frame.length = SND_RAWMIDI_FRAMING_DATA_LENGTH;
+		else {
+			frame.length = src_count;
+			memset(frame.data, 0, SND_RAWMIDI_FRAMING_DATA_LENGTH);
+		}
+		memcpy(frame.data, buffer, frame.length);
+		buffer += frame.length;
+		src_count -= frame.length;
+		dest_ptr = (struct snd_rawmidi_framing_tstamp *) (runtime->buffer + runtime->hw_ptr);
+		*dest_ptr = frame;
+		runtime->avail += frame_size;
+		runtime->hw_ptr += frame_size;
+		runtime->hw_ptr %= runtime->buffer_size;
+		dest_frames++;
+	}
+	return dest_frames * frame_size;
+}
+
 /**
  * snd_rawmidi_receive - receive the input data from the device
  * @substream: the rawmidi substream
@@ -977,6 +1016,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 			const unsigned char *buffer, int count)
 {
 	unsigned long flags;
+	struct timespec64 ts64;
 	int result = 0, count1;
 	struct snd_rawmidi_runtime *runtime = substream->runtime;
 
@@ -988,7 +1028,10 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 		return -EINVAL;
 	}
 	spin_lock_irqsave(&runtime->lock, flags);
-	if (count == 1) {	/* special case, faster code */
+	if (substream->framing == SNDRV_RAWMIDI_FRAMING_TSTAMP_MONOTONIC_RAW) {
+		ktime_get_raw_ts64(&ts64);
+		result = receive_with_tstamp_framing(substream, buffer, count, &ts64);
+	} else if (count == 1) {	/* special case, faster code */
 		substream->bytes++;
 		if (runtime->avail < runtime->buffer_size) {
 			runtime->buffer[runtime->hw_ptr++] = buffer[0];
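For illustration only (not part of the patch), a minimal userspace reader of the framed stream could look roughly like the sketch below. It assumes framing has already been enabled through snd_rawmidi_params.framing (e.g. via the existing SNDRV_RAWMIDI_IOCTL_PARAMS ioctl or a future alsa-lib helper) and simply mirrors the struct layout added to asound.h above:

/* Sketch of a userspace consumer of the framed input stream.
 * fd is an open file descriptor for a rawmidi device, e.g. /dev/snd/midiC0D0,
 * with the tstamp framing mode enabled on the input substream. */
#include <stdio.h>
#include <unistd.h>

#define SND_RAWMIDI_FRAMING_DATA_LENGTH 7

struct snd_rawmidi_framing_tstamp {
	unsigned int tv_sec;	/* seconds */
	unsigned int tv_nsec;	/* nanoseconds */
	unsigned char length;	/* number of valid bytes in data[] */
	unsigned char data[SND_RAWMIDI_FRAMING_DATA_LENGTH];
};

static void read_frames(int fd)
{
	struct snd_rawmidi_framing_tstamp frame;

	/* The kernel fills the buffer in whole 16-byte frames, so reading
	 * sizeof(frame) at a time yields one frame per successful read. */
	while (read(fd, &frame, sizeof(frame)) == (ssize_t)sizeof(frame)) {
		printf("%u.%09u: %u MIDI byte(s), first byte 0x%02x\n",
		       frame.tv_sec, frame.tv_nsec, frame.length,
		       frame.length ? frame.data[0] : 0);
	}
}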
On Wed, 24 Mar 2021 06:42:53 +0100, David Henningsson wrote:
This commit adds a new framing mode that frames all MIDI data into 16-byte frames with a timestamp from the monotonic_raw clock.
The main benefit is that we can get accurate timestamps even if userspace wakeup and processing is not immediate.
Signed-off-by: David Henningsson <coding@diwic.se>
Thanks for the patch! I seem to have overlooked your previous post, sorry.
This looks like a good middle ground solution, while we still need to address the sequencer timestamp (basically we should be able to send an event with the timestamp prepared from the rawmidi side).
The implementation itself looks good to me. But this needs to bump the SNDRV_RAWMIDI_VERSION for indicating the new API.
Takashi
On 2021-03-24 11:03, Takashi Iwai wrote:
On Wed, 24 Mar 2021 06:42:53 +0100, David Henningsson wrote:
This commit adds a new framing mode that frames all MIDI data into 16-byte frames with a timestamp from the monotonic_raw clock.
The main benefit is that we can get accurate timestamps even if userspace wakeup and processing is not immediate.
Signed-off-by: David Henningsson <coding@diwic.se>
Thanks for the patch!
Thanks for the review :-)
I seem to have overlooked your previous post, sorry.
This looks like a good middle ground solution, while we still need to address the sequencer timestamp (basically we should be able to send an event with the timestamp prepared from the rawmidi side).
I believe the new framing mode would be useful both for readers of rawmidi devices and for the seq kernel module.
I have also been thinking of doing something in usb-midi (because I assume that's the most common way to do MIDI input these days) to improve performance for packets with more than three bytes in them. Right now a sysex would be split into chunks of three bytes, each one with its own timestamp. If so, that would be a later patch.
The implementation itself looks good to me. But this needs to bump the SNDRV_RAWMIDI_VERSION for indicating the new API.
Sure, I'll send a v3 with a bumped SNDRV_RAWMIDI_VERSION.
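Presumably that is just a one-liner in include/uapi/sound/asound.h along these lines (the current value and the exact new minor number below are my assumption and need to be checked against the tree):

-#define SNDRV_RAWMIDI_VERSION		SNDRV_PROTOCOL_VERSION(2, 0, 1)
+#define SNDRV_RAWMIDI_VERSION		SNDRV_PROTOCOL_VERSION(2, 0, 2)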
I'm also considering adding a "time when the stream started" value to the snd_rawmidi_status timestamp, to get a fixed starting point for the timestamps - unless that field was reserved for some other purpose? The status timestamp would only be filled in if the framing mode is enabled. If so, that change would go into the same version bump. Does that sound good to you?
// David
Hi,
On Wed, Mar 24, 2021 at 12:18:59PM +0100, David Henningsson wrote:
On 2021-03-24 11:03, Takashi Iwai wrote:
This looks like a good middle ground solution, while we still need to address the sequencer timestamp (basically we should be able to send an event with the timestamp prepared from the rawmidi side).
I believe the new framing mode would be useful both for readers of rawmidi devices, and the seq kernel module.
I have also been thinking of doing something in usb-midi (because I assume that's the most common way to do midi input these days), to improve performance for packets with more than three bytes in them. Right now a sysex would be cut off in chunks of three bytes, each one with its own timestamp. If so, that would be a later patch.
I've been investigating some old documentation since David posted his initial idea[1]. However, I still have a concern about whether we can really obtain a good timestamp for incoming MIDI messages in hardware/software IRQ contexts.
As you know, in the specification a MIDI message has no timestamp. Even though MIDI Time Code (MTC) is included in the specification, it's the role of the hardware MIDI sequencer to decide the actual presentation time for received MIDI messages. In this sense, your patch is a reasonable way to process the received MIDI messages.
However, the timing jitter of IRQ handler invocation is an issue in this case, as it is for the PCM interface, even though the data rate of the MIDI physical layer is quite low nowadays (31.25 kbit/sec ~= 3906.25 bytes/sec). As far as I have experienced, on an actual running Linux system the invocation of an IRQ handler has no guaranteed bound on its timing jitter, mainly due to CPU-level IRQ masking (e.g. spin_lock). Therefore the interval between invocations is not precise enough to decide an event timestamp, at least relative to the time slots of the MIDI physical layer.
Nevertheless, I think your idea is interesting enough, provided that information is delivered from the driver (or driver developer) to applications (the ALSA sequencer or userspace applications). Even if we have some limitations and restrictions on precise timestamps, it's worth working on. What seems to be required are improvements at the interface level and documentation about how to use the frame timestamp you implemented.
P.S. Judging from old resources about the design of hardware MIDI sequencers in the late 1990s, the above concern seems to have been real to engineers back then; they were always asked to process MIDI messages in real time somehow, in spite of buffering and timing jitter.
[1] Midi timestamps https://mailman.alsa-project.org/pipermail/alsa-devel/2021-March/182175.html
Regards
Takashi Sakamoto
Hi,
On Wed, Mar 24, 2021 at 06:42:53AM +0100, David Henningsson wrote:
diff --git a/include/uapi/sound/asound.h b/include/uapi/sound/asound.h
index 535a7229e1d9..f33076755025 100644
--- a/include/uapi/sound/asound.h
+++ b/include/uapi/sound/asound.h
@@ -736,12 +736,28 @@ struct snd_rawmidi_info {
 	unsigned char reserved[64];	/* reserved for future use */
 };
 
+enum {
+	SNDRV_RAWMIDI_FRAMING_NONE = 0,
+	SNDRV_RAWMIDI_FRAMING_TSTAMP_MONOTONIC_RAW,
+	SNDRV_RAWMIDI_FRAMING_LAST = SNDRV_RAWMIDI_FRAMING_TSTAMP_MONOTONIC_RAW,
+};
In the C language specification, an enumeration has int storage. In my opinion, the int type should perhaps be used for the framing member.
(I think you can easily understand my insistence since you're a Rust programmer.)
I note that in the UAPI of the Linux kernel, we have some macros to represent system clocks, e.g. CLOCK_REALTIME and CLOCK_MONOTONIC: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/incl...
We could use that series of macros instead of defining specific enumerations. However, I have one concern: the 'None' value cannot be zero in that case, since CLOCK_REALTIME is zero. This is a bit inconvenient, since we would need an initializer function in both kernel space and user space...
Regarding the idea of recording the system timestamp when drivers call the helper function to put MIDI message bytes into the intermediate buffer in hardware/software IRQ context, I have some concerns, and I'll post another message to the thread later.
Regards
Takashi Sakamoto
On 2021-03-24 13:44, Takashi Sakamoto wrote:
Hi,
On Wed, Mar 24, 2021 at 06:42:53AM +0100, David Henningsson wrote:
diff --git a/include/uapi/sound/asound.h b/include/uapi/sound/asound.h
index 535a7229e1d9..f33076755025 100644
--- a/include/uapi/sound/asound.h
+++ b/include/uapi/sound/asound.h
@@ -736,12 +736,28 @@ struct snd_rawmidi_info {
 	unsigned char reserved[64];	/* reserved for future use */
 };
 
+enum {
+	SNDRV_RAWMIDI_FRAMING_NONE = 0,
+	SNDRV_RAWMIDI_FRAMING_TSTAMP_MONOTONIC_RAW,
+	SNDRV_RAWMIDI_FRAMING_LAST = SNDRV_RAWMIDI_FRAMING_TSTAMP_MONOTONIC_RAW,
+};
Hi and thanks for the review!
In C language specification, enumeration is for value of int storage. In my opinion, int type should be used for the framing member, perhaps.
In this case, I was following how the rest of the enum declarations in the same header are done. In addition, the only place it is used is as an unsigned char, so I'm not sure defining it as an int here would make sense.
(I think you can easily understand my insistent since you're Rust programmer.)
I am, and as a side point for those who don't know: I've written (and am maintaining) alsa-lib bindings for Rust at
https://github.com/diwic/alsa-rs
I note that in UAPI of Linux kernel, we have some macros to represent system clocks; e.g. CLOCK_REALTIME, CLOCK_MONOTONIC: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/incl...
We can use the series of macro, instead of defining the specific enumerations. However I have one concern that the 'None' value cannot be zero in the case since CLOCK_REALTIME is zero. This is a bit inconvenient since we need initializer function in both of kernel space and user space...
Interesting point. So I guess we could add a bit in the existing bitfield that says on/off, and then have an unsigned char that decides the clock type. But as you point out in your other reply, if we can get a timestamp from closer to the source somehow, that would be even better, and then that would be a clock different from the existing clock defines in time.h.
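Just to illustrate that idea (purely a sketch for discussion, the field names are made up and this is not something I have posted as a patch):

/* Hypothetical alternative: a separate enable bit plus a clock selector
 * that reuses the CLOCK_* ids from time.h. Sketch only. */
struct snd_rawmidi_params {
	int stream;
	size_t buffer_size;		/* queue size in bytes */
	size_t avail_min;		/* minimum avail bytes for wakeup */
	unsigned int no_active_sensing: 1; /* do not send active sensing byte in close() */
	unsigned int framing_enable: 1;    /* frame incoming data? */
	unsigned char framing_clock;       /* e.g. CLOCK_MONOTONIC_RAW, CLOCK_MONOTONIC */
	unsigned char reserved[15];        /* reserved for future use */
};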
[snip from your other reply]
However, the timing jitter of IRQ handler invocation is issued in this case, as well as PCM interface, even if the data rate of MIDI physical layer is quite low nowadays (31.25 Kbit / sec ~= 3906.25 byte / sec). As long as I experienced, in actual running Linux system, the invocation of IRQ handler has no guarantee for timing jitter, mainly due to CPU level IRQ mask (like spin_lock). Therefore the interval of each invocation is not so precise to decide event timestamp, at least for time slot comes from MIDI physical layer.
Nevertheless, I think your idea is enough interesting, with conditions to deliver information from driver (or driver developer) to applications (ALSA Sequencer or userspace applications). Even if we have some limitation and restriction to precise timestamp, it's worth to work for it. It seems to be required that improvements at interface level and documentation about how to use the frame timestamp you implemented.
Right, so first, I believe the most common way to transport MIDI these days is through USB, where the 31.25 kbit/sec limit does not apply: instead we have a granularity of 1/8 ms, but many messages can fit into each packet. So that limit is - for many if not most cases - gone.
Second, you're probably right that there is still some jitter w.r.t. when the IRQ fires. That is regrettable, but the earlier we get the timestamp the better, so a timestamp taken in IRQ context would at least be better than one taken in a workqueue that fires after the IRQ, or in userspace, which may have scheduling delays.
Btw, I checked "struct urb" and there was no timestamp in there, so I don't know how to get a better timestamp than in snd_rawmidi_receive. But maybe other interfaces (PCI, FireWire etc) offer something better.
// David
Hi David,
On Wed, Mar 24, 2021 at 04:57:31PM +0100, David Henningsson wrote:
However, the timing jitter of IRQ handler invocation is issued in this case, as well as PCM interface, even if the data rate of MIDI physical layer is quite low nowadays (31.25 Kbit / sec ~= 3906.25 byte / sec). As long as I experienced, in actual running Linux system, the invocation of IRQ handler has no guarantee for timing jitter, mainly due to CPU level IRQ mask (like spin_lock). Therefore the interval of each invocation is not so precise to decide event timestamp, at least for time slot comes from MIDI physical layer.
Nevertheless, I think your idea is enough interesting, with conditions to deliver information from driver (or driver developer) to applications (ALSA Sequencer or userspace applications). Even if we have some limitation and restriction to precise timestamp, it's worth to work for it. It seems to be required that improvements at interface level and documentation about how to use the frame timestamp you implemented.
Right, so first, I believe the most common way to transport midi these days is through USB, where the 31.25 Kbit/sec limit does not apply: instead we have a granularity of 1/8 ms but many messages can fit in each one. So that limit is - for many if not most cases - gone.
Second; you're probably right in that there is still some jitter w r t when the IRQ fires. That is regrettable, but the earlier we get that timestamp the better, so a timestamp in IRQ context would at least be better than in a workqueue that fires after the IRQ, or in userspace that possibly has scheduling delays.
Btw, I checked the "struct urb" and there was no timestamp in there, so I don't know how to get a better timestamp than in snd_rawmidi_receive. But maybe other interfaces (PCI, Firewire etc) offers something better.
Hm. Regardless of the type of hardware, e.g. OHCI/EHCI/xHCI, 1394 OHCI, or a PCIe extension card, when software processes sampled data there is always the issue of jitter between triggering the IRQ (hardware side) and invoking the IRQ handler (processor side). As an actual example, let me share my experience with 1394 OHCI.
A 1394 OHCI controller is governed by a 24.576 MHz clock (or double, depending on the vendor). In IEEE 1394, the isochronous service runs 8,000 times per second. The 1394 OHCI specification allows software to schedule a hardware IRQ per isochronous cycle.
The ALSA firewire stack is programmed to schedule the hardware IRQ every 16 isochronous cycles, so the corresponding IRQ handler is expected to be invoked every 2 milliseconds. As far as I tested in a usual desktop environment[2], the jitter is between 150 microseconds and 4.7 milliseconds; in the worst case it's 6.0 milliseconds. The above is the same when using 'threadirqs'.
When using 'isolcpus' and arranging 'smp_affinity' to reduce the load on the processor core that invokes the IRQ handler, the interval is 2 milliseconds +/- several nanoseconds, so the 1394 OHCI controller itself can trigger the hardware IRQ very precisely. The jitter comes from the processor side: I think the many running contexts on the same processor core mask IRQs so often that the jitter is not deterministic.
Here, what I'd like to tell you is that we cannot use the system time as-is as the timestamp of received MIDI messages as long as we rely on IRQ context. We need some arrangements to construct a better timestamp with some compensation. On this point, the 3rd version of the patch is not enough[3], in my opinion.
My intention is not to discourage you, but it's better to take more care. As one such measure, I think we could use an extension of 'struct snd_rawmidi_status' to deliver some useful information to ALSA rawmidi applications, or include other parameters in the frame structure. But I don't have ideas about what information should be delivered, sorry...
[1] https://github.com/systemd/systemd/pull/19124
[2] I used stock image of Ubuntu 19.10 desktop for the trial.
[3] https://mailman.alsa-project.org/pipermail/alsa-devel/2021-March/182842.html
Thanks
Takashi Sakamoto
On Fri, 26 Mar 2021 05:46:15 +0100, Takashi Sakamoto wrote:
Here, what I'd like to tell you is that we can not handle the system time as is as timestamp of received MIDI messages, as long as relying on IRQ context. We need some arrangements to construct better timestamp with some compensations. In this point, the 3rd version of patch is not enough[3], in my opinion.
My intension is not to discourage you, but it's better to have more care. As one of the care, I think we can use extension of 'struct snd_rawmidi_status' to deliver some useful information to ALSA rawmidi applications, or including other parameters to the frame structure. But I don't have ideas about what information should be delivered, sorry...
Well, the question is how much accuracy is wanted, and it's relatively low for MIDI -- at least v1, which was defined many decades ago for a slow serial line.
That said, the patch set was designed to provide best-effort timestamping in software, and that's supposed to be enough for normal use cases. If there is any device with hardware timestamping, in theory we could provide a similar data stream in the same format, but maybe with a different timestamp type.
But actually I'd like to see some measurement of how much we can improve the timestamp accuracy by shifting the post office. That may show interesting numbers.
Also, one more thing to consider is support for MIDI 2.0 in the future. I haven't seen any development so far (and no device available around), so I can't comment on this much more, but it'd be worth taking a quick look before defining a solid API/ABI.
thanks,
Takashi
Hi Takashi and Takashi,
Interesting measurements. While several ms of jitter is not ideal, I have a few counter-arguments for why I still believe we should merge this patch:
1) I don't think we should let the perfect be the enemy of the good here; just because we cannot eliminate *all* jitter does not mean we should not try to eliminate as much of it as we can.
2) An unprivileged process (which cannot get RT_PRIO scheduling) might have a wakeup jitter of a lot more than a few ms, so for that type of process this would be a big improvement. And sometimes even RT_PRIO processes have big delays due to preempt_disable etc.
3) The jitter will depend on the hardware, and other hardware might have better (or worse) IRQ jitter.
4) If this patch gets accepted, it might show other kernel developers that IRQ jitter matters to us, and that in turn might improve the chances of getting IRQ jitter improvement patches in, in case someone else wants to help solve that problem.
My intension is not to discourage you, but it's better to have more care. As one of the care, I think we can use extension of 'struct snd_rawmidi_status' to deliver some useful information to ALSA rawmidi applications, or including other parameters to the frame structure. But I don't have ideas about what information should be delivered, sorry...
Well, the question is how much accuracy is wanted, and it's relatively low for MIDI -- at least v1, which was defined many decades ago for a slow serial line.
That said, the patch set was designed for providing the best-effort timestamping in software, and that's supposed to be enough for normal use cases. If there is any device that is with the hardware timestamping, in theory, we could provide the similar data stream in the same format but maybe with a different timestamp type.
But actually I'd like to see some measurement how much we can improve the timestamp accuracy by shifting the post office. This may show interesting numbers.
Sorry, I don't know the idiom "shifting the post office" and neither does the urban dictionary, so I have no idea what this means. :-)
Also, one thing to be more considered is the support for MIDI v2 in future. I haven't seen any development so far (and no device available around), so I cannot comment on this much more, but it'd be worth to take a quick look before defining the solid API/ABI.
I had a quick look at MIDI 2.0. It offers something called "Jitter Reduction Timestamps". After some searching I found that their resolution is 16 bits, in units of 1/31250 seconds [1]. So the suggested timestamp format of secs + nsecs would suit us well for that case, I believe. When implemented, MIDI 2.0 jitter reduction timestamps would be another clock ID on top of the existing frame format (or a new frame format, if we prefer).
A MIDI 2.0 UMP (Universal MIDI Packet) seems to be 4, 8, 12 or 16 bytes, excluding the timestamp. If we want to fit that format into the existing patch, we could increase the frame to 32 bytes so we can fit more data per frame. Do you think we should do that? Otherwise I think patch v3 is ready for merging.
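As a sanity check of those units (assuming the 1/31250 s figure is correct): one JR timestamp tick is 32 microseconds, and the 16-bit counter wraps after roughly 2.1 seconds, so conversion to the secs + nsecs format is trivial:

/* Sketch only: convert a MIDI 2.0 JR timestamp tick count to nanoseconds,
 * assuming 16-bit ticks in units of 1/31250 s (i.e. 32000 ns per tick). */
static inline unsigned long long jr_ticks_to_ns(unsigned int ticks)
{
	return (unsigned long long)ticks * 32000;
}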
// David
On Fri, 26 Mar 2021 17:29:04 +0100, David Henningsson wrote:
But actually I'd like to see some measurement how much we can improve the timestamp accuracy by shifting the post office. This may show interesting numbers.
Sorry, I don't know the idiom "shifting the post office" and neither does the urban dictionary, so I have no idea what this means. :-)
It was just a joke; you basically moved the place where the incoming data gets timestamped from one place (at the delivery center of a sequencer event) to another, earlier place (at the IRQ handler).
The question is: how much time difference have you measured by this move?
Also, one thing to be more considered is the support for MIDI v2 in future. I haven't seen any development so far (and no device available around), so I cannot comment on this much more, but it'd be worth to take a quick look before defining the solid API/ABI.
I had a quick look at MIDI 2.0. It offers something called "Jitter reduction timestamps". After some searching I found that its resolution is 16 bit, and in units of 1/31250 seconds [1]. So the suggested timestamp format of secs + nsecs would suit us well for that case, I believe. When implemented, MIDI 2.0 jitter reduction timestamps would be another clock ID on top of the existing frame format (or a new frame format, if we prefer).
A midi 2.0 UMP (Universal Midi Packet) seems to be 4, 8, 12 or 16 bytes, excluding the timestamp. If we want to fit that format with the existing patch, we could increase the frame to 32 bytes so we can fit more data per packet. Do you think we should do that? Otherwise I think Patch v3 is ready for merging.
Let's evaluate a bit what would be the best fit. I see no big reason to rush the merge right now.
thanks,
Takashi
// David
On Fri, Mar 26, 2021 at 05:44:16PM +0100, Takashi Iwai wrote:
On Fri, 26 Mar 2021 17:29:04 +0100, David Henningsson wrote:
But actually I'd like to see some measurement how much we can improve the timestamp accuracy by shifting the post office. This may show interesting numbers.
Sorry, I don't know the idiom "shifting the post office" and neither does the urban dictionary, so I have no idea what this means. :-)
It was just joking; you basically moved the place to stamp the incoming data from one place (at the delivery center of a sequencer event) to another earlier place (at the irq handler).
Aha. I have a similar image: estimating the time when a public bus left the terminal from the time when we see the bus at the bus stop. A traffic jam makes the estimation difficult even if we have a standard timetable.
Just my two cents,
Takashi Sakamoto
Hi Takashi and Takashi,
You both question the usability of the patch, so let's take a step back.
Suppose you're writing the next JACK, or a DAW, or something like that. When writing a DAW, you need to support the users who need ultra-low latency for live playing of an instrument. These users (unfortunately) need to reconfigure their Linux installation, have special kernels, buy expensive sound cards etc, in order to get the best possible latency. You should also give the best possible experience to people who don't have the time to do that. Just recording a simple MIDI file should not require any extra kernel options, RT_PRIO privileges or anything like that. (And then there are people in between, who try to get the best possible latency given their limited time, money and skills.)
Now you're asking yourself whether to use the rawmidi or the seq API. It seems silly to have to support both. The seq interface is suboptimal for the first use case, due to the latency introduced by the workqueue. But rawmidi is entirely impossible for the second use case, due to the lack of timestamping. (From a quick look at Ardour's sources, it does support both rawmidi and seq. The rawmidi code mostly timestamps the message and sends it to another thread. [1] I.e., essentially what I believe the kernel should do, because the kernel's timestamp is better.)
What you don't need is exact measurements of burst interval or even timestamp accuracy. All you have use for is the best possible timestamp, because that's what's going to be written into the MIDI file. There might be other use cases for burst intervals etc, but I don't see them helping here.
On 2021-03-26 17:44, Takashi Iwai wrote:
On Fri, 26 Mar 2021 17:29:04 +0100, David Henningsson wrote:
But actually I'd like to see some measurement how much we can improve the timestamp accuracy by shifting the post office. This may show interesting numbers.
Sorry, I don't know the idiom "shifting the post office" and neither does the urban dictionary, so I have no idea what this means. :-)
It was just joking; you basically moved the place to stamp the incoming data from one place (at the delivery center of a sequencer event) to another earlier place (at the irq handler).
The question is: how much time difference have you measured by this move?
Ok, thanks for the explanation. I have not done any measurements, because it would be quite time consuming to do so across different hardware, kernel configurations, and so on; I don't have that time right now, sorry. But the claim that workqueues can be delayed by up to a second (!) from just adding a few RT_PRIO tasks [2] is enough to scare me away from using the seq interface for accurate timestamping.
Also, one thing to be more considered is the support for MIDI v2 in future. I haven't seen any development so far (and no device available around), so I cannot comment on this much more, but it'd be worth to take a quick look before defining the solid API/ABI.
I had a quick look at MIDI 2.0. It offers something called "Jitter reduction timestamps". After some searching I found that its resolution is 16 bit, and in units of 1/31250 seconds [1]. So the suggested timestamp format of secs + nsecs would suit us well for that case, I believe. When implemented, MIDI 2.0 jitter reduction timestamps would be another clock ID on top of the existing frame format (or a new frame format, if we prefer).
A midi 2.0 UMP (Universal Midi Packet) seems to be 4, 8, 12 or 16 bytes, excluding the timestamp. If we want to fit that format with the existing patch, we could increase the frame to 32 bytes so we can fit more data per packet. Do you think we should do that? Otherwise I think Patch v3 is ready for merging.
Let's evaluate a bit what would be the best fit. I see no big reason to rush the merge right now.
Does this mean "evaluate for a week or two because of kernel cadence, merge windows etc" or does this mean "evaluate for months or years until someone does a full MIDI 2.0 kernel implementation"?
// David
[1] https://github.com/Ardour/ardour/blob/master/libs/backends/alsa/alsa_rawmidi...
[2] http://bootloader.wikidot.com/linux:android:latency
On Sun, 28 Mar 2021 08:39:46 +0200, David Henningsson wrote:
Does this mean "evaluate for a week or two because of kernel cadence, merge windows etc" or does this mean "evaluate for months or years until someone does a full MIDI 2.0 kernel implementation"?
Well, without the actual measurement, it's purely a theoretical problem, and it implies that we haven't seen any real improvement from it either. So the first priority is to measure and prove the need for the changes.
Then the next thing is to determine the exact format of the new API in a solid form. It's still not fully agreed which frame size fits best, for example. Also, we may have two individual frame types, e.g. a timestamp frame and a data frame, depending on the frame size and the implementation. And it might be handy if the ioctl returned the frame size to user-space, too.
And, of course, thinking about MIDI 2.0 wouldn't be bad. I don't think tying this to MIDI 2.0 is needed right now; instead, we should only ensure that the new timestamp would be accurate enough for new extensions like MIDI 2.0.
Takashi
On 2021-03-28 09:40, Takashi Iwai wrote:
Well, without the actual measurement, it's purely a theoretical problem, and it implies that we haven't seen any real improvement by that, too. So, the first priority is to measure and prove the need of the changes.
Well, I believe that rawmidi provides less jitter than seq is not a theoretical problem but a known fact (see e g [1]), so I haven't tried to "prove" it myself. And I cannot read your mind well enough to know what you would consider a sufficient proof - are you expecting to see differences on a default or RT kernel, on a Threadripper or a Beaglebone, idle system or while running RT torture tests? Etc.
That said; there are certainly people who run the seq interface succesfully as well. It depends on both hardware, drivers, system load, kernel configuration etc (and perhaps also the timing skill of the musician!) if that work in the workqueue will be delayed often enough to not go unnoticed.
Let me ask a counter question. Suppose you were to write the next JACK or DAW, as I wrote about above, where you need both the best possible latency and the best possible timestamps. Would you as a maintainer recommend seq or rawmidi (the latter with timestamps taken from userspace)?
Then the next thing is to determine the exact format for the new API in a solid form. It's still not fully agreed which frame size fits at best, for example. Also, we may have two individual frame types, e.g. a timestamp frame and a data frame, too, depending on the frame size and the implementation. And, it might be handy if the ioctl returns the frame size to user-space, too.
And, of course, thinking on MIDI 2.0 wouldn't be bad. Though I don't think tying with MIDI 2.0 is needed right now; instead, we should assure only that the new timestamp would be accurate enough for new extensions like MIDI 2.0.
Okay; I think we should then go for a frame size of 32 bytes with 16 byte header/timestamp and 16 byte data. One type of frame only, no frame size ioctl will be needed because any changes to the frame format would need a new framing type. I envision the application reading this struct directly without "encapsulation" with accessors in alsa-lib. This is for MIDI 2.0 compatibility: from what I can read, MIDI 2.0 messages can be up to 16 bytes. Its "Jitter Reduction Timestamps" are 2 bytes.
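Just to make that concrete, here is a rough sketch of what I have in mind for such a 32-byte frame (field names are made up for illustration, not a final ABI proposal):

/* Illustration only, not a final ABI: 16-byte header/timestamp plus
 * 16 bytes of data, so a full MIDI 2.0 UMP fits in one frame. */
struct snd_rawmidi_framing32 {
	__u32 tv_sec;		/* seconds */
	__u32 tv_nsec;		/* nanoseconds */
	__u8  length;		/* number of valid bytes in data[] */
	__u8  reserved[7];	/* pads the header to 16 bytes */
	__u8  data[16];		/* MIDI bytes, zero-padded */
};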
// David
[1] https://github.com/williamweston/jamrouter - "Timing using the ALSA Sequencer interface is not as accurate as timing  using any of the raw drivers /.../ Avoid use of the ALSA Sequencer driver when used in conjunction real MIDI hardware at buffer periods below 4-5 milliseconds"
On Tue, 30 Mar 2021 21:35:11 +0200, David Henningsson wrote:
On 2021-03-28 09:40, Takashi Iwai wrote:
On Sun, 28 Mar 2021 08:39:46 +0200, David Henningsson wrote:
Hi Takashi and Takashi,
You both question the usability of the patch, so let's take a step back.
Suppose you're writing the next JACK, or a DAW, or something like that. When writing a DAW, you need to support the users who need ultra-low latency for live playing of an instrument. These users (unfortunately) need to reconfigure their Linux installation, have special kernels, buy expensive sound cards etc, in order to get the best possible latency. You also should give the best possible experience for people who don't have the time to do that. Just recording a simple MIDI file should not require any extra kernel options, RT_PRIO privileges or anything like that. (And then there are people in between, who try to get the best possible latency given their limited time, money and skills.)
Now you're asking yourself whether to use rawmidi or seq API. It seems silly to have to support both. The seq interface is suboptimal for the first use case, due to the latency introduced by the workqueue. But rawmidi is entirely impossible for the second use case, due to the lack of timestamping. (From a quick look at Ardour's sources, it does support both rawmidi and seq. The rawmidi code mostly timestamps the message and sends it to another thread. [1] I e, essentially what I believe the kernel should do, because that timestamp is better.)
What you don't need is exact measurements of burst interval or even timestamp accuracy. All you have use for is the best possible timestamp, because that's what's going to be written into the MIDI file. There might be other use cases for burst intervals etc, but I don't see them helping here.
On 2021-03-26 17:44, Takashi Iwai wrote:
On Fri, 26 Mar 2021 17:29:04 +0100, David Henningsson wrote:
But actually I'd like to see some measurement how much we can improve the timestamp accuracy by shifting the post office. This may show interesting numbers.
Sorry, I don't know the idiom "shifting the post office" and neither does the urban dictionary, so I have no idea what this means. :-)
It was just a joke; you basically moved the place to stamp the incoming data from one place (at the delivery center of a sequencer event) to another, earlier place (at the irq handler).
The question is: how much time difference have you measured by this move?
Ok, thanks for the explanation. I have not done any measurements because it would be quite time consuming to do so, across different hardware, kernel configurations, and so on. I don't have that time right now, sorry. But the claim that workqueues can be delayed up to a second (!) from just adding a few RT_PRIO tasks [2] is enough to scare me from using the seq interface for accurate timestamping.
Also, one thing to be more considered is the support for MIDI v2 in future. I haven't seen any development so far (and no device available around), so I cannot comment on this much more, but it'd be worth to take a quick look before defining the solid API/ABI.
I had a quick look at MIDI 2.0. It offers something called "Jitter reduction timestamps". After some searching I found that its resolution is 16 bit, and in units of 1/31250 seconds [1]. So the suggested timestamp format of secs + nsecs would suit us well for that case, I believe. When implemented, MIDI 2.0 jitter reduction timestamps would be another clock ID on top of the existing frame format (or a new frame format, if we prefer).
A midi 2.0 UMP (Universal Midi Packet) seems to be 4, 8, 12 or 16 bytes, excluding the timestamp. If we want to fit that format with the existing patch, we could increase the frame to 32 bytes so we can fit more data per packet. Do you think we should do that? Otherwise I think Patch v3 is ready for merging.
Let's evaluate a bit what would be the best fit. I see no big reason to rush the merge right now.
Does this mean "evaluate for a week or two because of kernel cadence, merge windows etc" or does this mean "evaluate for months or years until someone does a full MIDI 2.0 kernel implementation"?
Well, without the actual measurement, it's purely a theoretical problem, and it implies that we haven't seen any real improvement by that, too. So, the first priority is to measure and prove the need of the changes.
Well, I believe that rawmidi providing less jitter than seq is not a theoretical problem but a known fact (see e.g. [1]), so I haven't tried to "prove" it myself. And I cannot read your mind well enough to know what you would consider a sufficient proof - are you expecting to see differences on a default or RT kernel, on a Threadripper or a Beaglebone, an idle system or while running RT torture tests? Etc.
There is certainly a difference, and it might be interesting to see the dependency on the hardware or on the configuration. But, again, my primary question is: have you measured how much improvement *your patch* really provides? If yes, please show the numbers in the patch description.
That said, there are certainly people who run the seq interface successfully as well. It depends on hardware, drivers, system load, kernel configuration etc (and perhaps also the timing skill of the musician!) whether the work in the workqueue will be delayed often enough to be noticeable.
Let me ask a counter question. Suppose you were to write the next JACK or DAW, as I wrote about above, where you need both the best possible latency and the best possible timestamps. Would you as a maintainer recommend seq or rawmidi (the latter with timestamps taken from userspace)?
Sorry, you missed the whole point. The patch was written for the latency improvement, and it's a kind of performance stuff. For such a performance change, the major interest is the number the patch provides (improves). That's why I asked it. Without numbers, it's nothing but advertising myths.
I don't have any strong opinion about sequencer vs rawmidi usage. If either of them brings better performance, that's fine and why not. We may improve the implementation of the sequencer stuff if it's requested, too; but all of it needs to be done based on measurement first (if a 4-5 ms latency is found, it's really awful and worth investigating).
Then the next thing is to determine the exact format for the new API in a solid form. It's still not fully agreed which frame size fits at best, for example. Also, we may have two individual frame types, e.g. a timestamp frame and a data frame, too, depending on the frame size and the implementation. And, it might be handy if the ioctl returns the frame size to user-space, too.
And, of course, thinking on MIDI 2.0 wouldn't be bad. Though I don't think tying with MIDI 2.0 is needed right now; instead, we should assure only that the new timestamp would be accurate enough for new extensions like MIDI 2.0.
Okay; I think we should then go for a frame size of 32 bytes with 16 byte header/timestamp and 16 byte data. One type of frame only, no frame size ioctl will be needed because any changes to the frame format would need a new framing type. I envision the application reading this struct directly without "encapsulation" with accessors in alsa-lib. This is for MIDI 2.0 compatibility: from what I can read, MIDI 2.0 messages can be up to 16 bytes. Its "Jitter Reduction Timestamps" are 2 bytes.
This sounds promising as future-proof, yeah.
thanks,
Takashi
On 2021-03-31 09:40, Takashi Iwai wrote:
On Tue, 30 Mar 2021 21:35:11 +0200, David Henningsson wrote:
Well, I believe that rawmidi providing less jitter than seq is not a theoretical problem but a known fact (see e.g. [1]), so I haven't tried to "prove" it myself. And I cannot read your mind well enough to know what you would consider a sufficient proof - are you expecting to see differences on a default or RT kernel, on a Threadripper or a Beaglebone, an idle system or while running RT torture tests? Etc.
There is certainly a difference, and it might be interesting to see the dependency on the hardware or on the configuration. But, again, my primary question is: have you measured how much improvement *your patch* really provides? If yes, please show the numbers in the patch description.
As you requested, I have now performed such testing.
Results:
Seq - idle: 5.0 ms
Seq - hackbench: 1.3 s (yes, above one second)
Raw + framing - idle: 2.8 ms
Raw + framing - hackbench: 2.8 ms
Setup / test description:
I had an external midi sequencer connected through USB. The system under test was a Celeron N3150 with internal graphics. The sequencer was set to generate note on/note off commands exactly 10 times per second.
For the seq tests I used "arecordmidi" and analyzed the delta values of resulting midi file. For the raw + framing tests I used a home-made application to write a midi file. The monotonic clock option was used to rule out differences between monotonic and monotonic_raw. The result shown above is the maximum amount of delta value, converted to milliseconds, minus the expected 100 ms between notes. Each test was run for a minute or two.
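(For clarity, the jitter figure was computed roughly like this - a simplified sketch of my home-made analysis, not the actual tool:)

#include <stdint.h>
#include <stdio.h>

/* Simplified sketch: given note-on timestamps in nanoseconds, report
 * max(delta) minus the expected 100 ms period, in milliseconds. */
static double max_jitter_ms(const uint64_t *ts_ns, int n)
{
	uint64_t max_delta = 0;
	int i;

	for (i = 1; i < n; i++)
		if (ts_ns[i] - ts_ns[i - 1] > max_delta)
			max_delta = ts_ns[i] - ts_ns[i - 1];
	return max_delta / 1e6 - 100.0;
}

int main(void)
{
	/* Example: one note arrives 2.8 ms late. */
	uint64_t ts[] = { 0, 100000000, 202800000, 300000000 };

	printf("max jitter: %.1f ms\n", max_jitter_ms(ts, 4));
	return 0;
}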
For the "idle" test, the machine was idle (running a normal desktop), and for the "hackbench" test, "chrt -r 10 hackbench" was run a few times in parallel with the midi recording application (which was run with "chrt -r 15").
I also tried a few other stress tests but hackbench was the one that stood out as totally destroying the timestamps of seq midi. (E g, running "rt-migrate-test" in parallel with "arecordmidi" gave a max jitter value of 13 ms.)
Conclusion:
I still believe the proposed raw + framing mode is a valuable improvement in the normal/idle case, but even more so because it is more stable in stressed conditions. Do you agree?
If so, I'll send out a v4 with 32 byte framing (16 byte header, 16 byte data).
// David
On Mon, 05 Apr 2021 14:13:27 +0200, David Henningsson wrote:
On 2021-03-31 09:40, Takashi Iwai wrote:
On Tue, 30 Mar 2021 21:35:11 +0200, David Henningsson wrote:
Well, I believe that rawmidi providing less jitter than seq is not a theoretical problem but a known fact (see e.g. [1]), so I haven't tried to "prove" it myself. And I cannot read your mind well enough to know what you would consider a sufficient proof - are you expecting to see differences on a default or RT kernel, on a Threadripper or a Beaglebone, an idle system or while running RT torture tests? Etc.
There is certainly a difference, and it might be interesting to see the dependency on the hardware or on the configuration. But, again, my primary question is: have you measured how much improvement *your patch* really provides? If yes, please show the numbers in the patch description.
As you requested, I have now performed such testing.
Results:
Seq - idle: 5.0 ms
Seq - hackbench: 1.3 s (yes, above one second)
Raw + framing - idle: 2.8 ms
Raw + framing - hackbench: 2.8 ms
Setup / test description:
I had an external midi sequencer connected through USB. The system under test was a Celeron N3150 with internal graphics. The sequencer was set to generate note on/note off commands exactly 10 times per second.
For the seq tests I used "arecordmidi" and analyzed the delta values of resulting midi file. For the raw + framing tests I used a home-made application to write a midi file. The monotonic clock option was used to rule out differences between monotonic and monotonic_raw. The result shown above is the maximum amount of delta value, converted to milliseconds, minus the expected 100 ms between notes. Each test was run for a minute or two.
For the "idle" test, the machine was idle (running a normal desktop), and for the "hackbench" test, "chrt -r 10 hackbench" was run a few times in parallel with the midi recording application (which was run with "chrt -r 15").
I also tried a few other stress tests but hackbench was the one that stood out as totally destroying the timestamps of seq midi. (E g, running "rt-migrate-test" in parallel with "arecordmidi" gave a max jitter value of 13 ms.)
Conclusion:
I still believe the proposed raw + framing mode is a valuable improvement in the normal/idle case, but even more so because it is more stable in stressed conditions. Do you agree?
Thanks for the tests. Yes, that's an interesting and convincing result.
Could you do a couple of favors in addition?
1) Check the other workqueue
It's interesting to see whether the hiprio system workqueue may give a better latency. A oneliner patch is like below.
-- 8< --
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -1028,7 +1028,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	}
 	if (result > 0) {
 		if (runtime->event)
-			schedule_work(&runtime->event_work);
+			queue_work(system_highpri_wq, &runtime->event_work);
 		else if (__snd_rawmidi_ready(runtime))
 			wake_up(&runtime->sleep);
 	}
-- 8< --
Also, system_unbound_wq can be another interesting test case instead of system_highpri_wq.
2) Direct sequencer event process
If a change of workqueue doesn't give significant improvement, we might need to check the direct invocation of the sequencer dispatcher. A totally untested patch is like below.
-- 8< --
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -979,6 +979,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	unsigned long flags;
 	int result = 0, count1;
 	struct snd_rawmidi_runtime *runtime = substream->runtime;
+	bool call_event = false;

 	if (!substream->opened)
 		return -EBADFD;
@@ -1028,11 +1029,13 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	}
 	if (result > 0) {
 		if (runtime->event)
-			schedule_work(&runtime->event_work);
+			call_event = true;
 		else if (__snd_rawmidi_ready(runtime))
 			wake_up(&runtime->sleep);
 	}
 	spin_unlock_irqrestore(&runtime->lock, flags);
+	if (call_event)
+		runtime->event(runtime->substream);
 	return result;
 }
 EXPORT_SYMBOL(snd_rawmidi_receive);
-- 8< --
In theory, this should bring it to the same level of latency as the rawmidi timestamping. Of course, this doesn't mean we can go straight this way, but it's some material for consideration.
Takashi
On 2021-04-06 14:01, Takashi Iwai wrote:
On Mon, 05 Apr 2021 14:13:27 +0200, David Henningsson wrote:
On 2021-03-31 09:40, Takashi Iwai wrote:
On Tue, 30 Mar 2021 21:35:11 +0200, David Henningsson wrote:
Well, I believe that rawmidi providing less jitter than seq is not a theoretical problem but a known fact (see e.g. [1]), so I haven't tried to "prove" it myself. And I cannot read your mind well enough to know what you would consider a sufficient proof - are you expecting to see differences on a default or RT kernel, on a Threadripper or a Beaglebone, an idle system or while running RT torture tests? Etc.
There is certainly a difference, and it might be interesting to see the dependency on the hardware or on the configuration. But, again, my primary question is: have you measured how much improvement *your patch* really provides? If yes, please show the numbers in the patch description.
As you requested, I have now performed such testing.
Results:
Seq - idle: 5.0 ms
Seq - hackbench: 1.3 s (yes, above one second)
Raw + framing - idle: 2.8 ms
Raw + framing - hackbench: 2.8 ms
Setup / test description:
I had an external midi sequencer connected through USB. The system under test was a Celeron N3150 with internal graphics. The sequencer was set to generate note on/note off commands exactly 10 times per second.
For the seq tests I used "arecordmidi" and analyzed the delta values of resulting midi file. For the raw + framing tests I used a home-made application to write a midi file. The monotonic clock option was used to rule out differences between monotonic and monotonic_raw. The result shown above is the maximum amount of delta value, converted to milliseconds, minus the expected 100 ms between notes. Each test was run for a minute or two.
For the "idle" test, the machine was idle (running a normal desktop), and for the "hackbench" test, "chrt -r 10 hackbench" was run a few times in parallel with the midi recording application (which was run with "chrt -r 15").
I also tried a few other stress tests but hackbench was the one that stood out as totally destroying the timestamps of seq midi. (E g, running "rt-migrate-test" in parallel with "arecordmidi" gave a max jitter value of 13 ms.)
Conclusion:
I still believe the proposed raw + framing mode is a valuable improvement in the normal/idle case, but even more so because it is more stable in stressed conditions. Do you agree?
Thanks for the tests. Yes, that's an interesting and convincing result.
Could you do a couple of favors in addition?
Okay, now done. Enjoy :-)
- Check the other workqueue
It's interesting to see whether the hiprio system workqueue may give a better latency. A oneliner patch is like below.
-- 8< --
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -1028,7 +1028,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	}
 	if (result > 0) {
 		if (runtime->event)
-			schedule_work(&runtime->event_work);
+			queue_work(system_highpri_wq, &runtime->event_work);
 		else if (__snd_rawmidi_ready(runtime))
 			wake_up(&runtime->sleep);
 	}
-- 8< --
Result: idle: 5.0 ms
hackbench > 1 s
I e, same as original.
Also, system_unbound_wq can be another interesting test case instead of system_highpri_wq.
- Direct sequencer event process
If a change of workqueue doesn't give significant improvement, we might need to check the direct invocation of the sequencer dispatcher. A totally untested patch is like below.
-- 8< --
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -979,6 +979,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	unsigned long flags;
 	int result = 0, count1;
 	struct snd_rawmidi_runtime *runtime = substream->runtime;
+	bool call_event = false;

 	if (!substream->opened)
 		return -EBADFD;
@@ -1028,11 +1029,13 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	}
 	if (result > 0) {
 		if (runtime->event)
-			schedule_work(&runtime->event_work);
+			call_event = true;
 		else if (__snd_rawmidi_ready(runtime))
 			wake_up(&runtime->sleep);
 	}
 	spin_unlock_irqrestore(&runtime->lock, flags);
+	if (call_event)
+		runtime->event(runtime->substream);
 	return result;
 }
 EXPORT_SYMBOL(snd_rawmidi_receive);
-- 8< --
Result:
Idle: 3.0 ms
Hackbench still > 1s.
The reason that this is 3.0 and not 2.8 is probably due to some rounding to whole ms somewhere in either seq or arecordmidi - I'd say this is likely the same 2.8 ms as we see from the rawmidi+framing test.
In theory, this should bring it to the same level of latency as the rawmidi timestamping. Of course, this doesn't mean we can go straight this way, but it's some material for consideration.
I don't know why the hackbench test is not improved here. But you seem to have changed seq from tasklet to workqueue in 2011 (commit b3c705aa9e9), presumably for some relevant reason, like a need to sleep in the seq code...?
// David
On Sat, 10 Apr 2021 13:41:38 +0200, David Henningsson wrote:
On 2021-04-06 14:01, Takashi Iwai wrote:
On Mon, 05 Apr 2021 14:13:27 +0200, David Henningsson wrote:
On 2021-03-31 09:40, Takashi Iwai wrote:
On Tue, 30 Mar 2021 21:35:11 +0200, David Henningsson wrote:
Well, I believe that rawmidi providing less jitter than seq is not a theoretical problem but a known fact (see e.g. [1]), so I haven't tried to "prove" it myself. And I cannot read your mind well enough to know what you would consider a sufficient proof - are you expecting to see differences on a default or RT kernel, on a Threadripper or a Beaglebone, an idle system or while running RT torture tests? Etc.
There is certainly a difference, and it might be interesting to see the dependency on the hardware or on the configuration. But, again, my primary question is: have you measured how much improvement *your patch* really provides? If yes, please show the numbers in the patch description.
As you requested, I have now performed such testing.
Results:
Seq - idle: 5.0 ms
Seq - hackbench: 1.3 s (yes, above one second)
Raw + framing - idle: 2.8 ms
Raw + framing - hackbench: 2.8 ms
Setup / test description:
I had an external midi sequencer connected through USB. The system under test was a Celeron N3150 with internal graphics. The sequencer was set to generate note on/note off commands exactly 10 times per second.
For the seq tests I used "arecordmidi" and analyzed the delta values of resulting midi file. For the raw + framing tests I used a home-made application to write a midi file. The monotonic clock option was used to rule out differences between monotonic and monotonic_raw. The result shown above is the maximum amount of delta value, converted to milliseconds, minus the expected 100 ms between notes. Each test was run for a minute or two.
For the "idle" test, the machine was idle (running a normal desktop), and for the "hackbench" test, "chrt -r 10 hackbench" was run a few times in parallel with the midi recording application (which was run with "chrt -r 15").
I also tried a few other stress tests but hackbench was the one that stood out as totally destroying the timestamps of seq midi. (E g, running "rt-migrate-test" in parallel with "arecordmidi" gave a max jitter value of 13 ms.)
Conclusion:
I still believe the proposed raw + framing mode is a valuable improvement in the normal/idle case, but even more so because it is more stable in stressed conditions. Do you agree?
Thanks for the tests. Yes, that's an interesting and convincing result. Could you do a couple of favors in addition?
Okay, now done. Enjoy :-)
- Check the other workqueue
It's interesting to see whether the hiprio system workqueue may give a better latency. A oneliner patch is like below.
-- 8< --
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -1028,7 +1028,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	}
 	if (result > 0) {
 		if (runtime->event)
-			schedule_work(&runtime->event_work);
+			queue_work(system_highpri_wq, &runtime->event_work);
 		else if (__snd_rawmidi_ready(runtime))
 			wake_up(&runtime->sleep);
 	}
-- 8< --
Result: idle: 5.0 ms
hackbench > 1 s
I e, same as original.
Hm, that's disappointing. I hoped that the influence of the workqueue would be visible, but apparently not.
But the bigger concern is about the result with the second patch.
Also, system_unbound_wq can be another interesting test case instead of system_highpri_wq.
- Direct sequencer event process
If a change of workqueue doesn't give significant improvement, we might need to check the direct invocation of the sequencer dispatcher. A totally untested patch is like below.
-- 8< --
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -979,6 +979,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	unsigned long flags;
 	int result = 0, count1;
 	struct snd_rawmidi_runtime *runtime = substream->runtime;
+	bool call_event = false;

 	if (!substream->opened)
 		return -EBADFD;
@@ -1028,11 +1029,13 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
 	}
 	if (result > 0) {
 		if (runtime->event)
-			schedule_work(&runtime->event_work);
+			call_event = true;
 		else if (__snd_rawmidi_ready(runtime))
 			wake_up(&runtime->sleep);
 	}
 	spin_unlock_irqrestore(&runtime->lock, flags);
+	if (call_event)
+		runtime->event(runtime->substream);
 	return result;
 }
 EXPORT_SYMBOL(snd_rawmidi_receive);
-- 8< --
Result:
Idle: 3.0 ms
Hackbench still > 1s.
The reason that this is 3.0 and not 2.8 is probably due to some rounding to whole ms somewhere in either seq or arecordmidi - I'd say this is likely the same 2.8 ms as we see from the rawmidi+framing test.
In theory, this should bring it to the same level of latency as the rawmidi timestamping. Of course, this doesn't mean we can go straight this way, but it's some material for consideration.
I don't know why the hackbench test is not improved here. But you seem to have changed seq from tasklet to workqueue in 2011 (commit b3c705aa9e9), presumably for some relevant reason, like a need to sleep in the seq code...?
No, the commit you mentioned didn't change the behavior. It's a simple replacement from a tasklet to a work (at least for the input direction). So, the irq handler processes and delivers the event directly, and since it's an irq context, there can be no sleep. At most, it's a spinlock and preemption, but I don't think this could give such a long delay inside the irq handler.
It might be something else; e.g. the timestamp gets updated again later in a different place. In any case, this needs more investigation, apart from the merge of the rawmidi frame mode support.
thanks,
Takashi
Hi,
From Takashi (not maintainer),
On Fri, Mar 26, 2021 at 05:29:04PM +0100, David Henningsson wrote:
On 2021-03-26 08:55, Takashi Iwai wrote:
On Fri, 26 Mar 2021 05:46:15 +0100, Takashi Sakamoto wrote:
On Wed, Mar 24, 2021 at 04:57:31PM +0100, David Henningsson wrote:
However, the timing jitter of IRQ handler invocation is an issue in this case, as well as for the PCM interface, even if the data rate of the MIDI physical layer is quite low nowadays (31.25 Kbit/sec ~= 3906.25 bytes/sec). As far as I have experienced, in an actual running Linux system, the invocation of the IRQ handler has no guarantee regarding timing jitter, mainly due to CPU-level IRQ masking (like spin_lock). Therefore the interval of each invocation is not precise enough to decide the event timestamp, at least relative to the time slot that comes from the MIDI physical layer.
Nevertheless, I think your idea is interesting enough, on the condition that information is delivered from the driver (or driver developer) to applications (ALSA Sequencer or userspace applications). Even if we have some limitations and restrictions on precise timestamps, it's worth working on. Improvements at the interface level and documentation about how to use the frame timestamp you implemented seem to be required.
Right, so first, I believe the most common way to transport midi these days is through USB, where the 31.25 Kbit/sec limit does not apply: instead we have a granularity of 1/8 ms but many messages can fit in each one. So that limit is - for many if not most cases - gone.
Second; you're probably right in that there is still some jitter w r t when the IRQ fires. That is regrettable, but the earlier we get that timestamp the better, so a timestamp in IRQ context would at least be better than in a workqueue that fires after the IRQ, or in userspace that possibly has scheduling delays.
Btw, I checked the "struct urb" and there was no timestamp in there, so I don't know how to get a better timestamp than in snd_rawmidi_receive. But maybe other interfaces (PCI, Firewire etc) offers something better.
Hm. Regardless of the type of hardware, e.g. OHCI/EHCI/xHCI, 1394 OHCI, or a PCI-e extension card, for software processing sampled data the jitter between triggering the IRQ (hardware side) and invoking the IRQ handler (processor side) is always an issue. As an actual example, let me share my experience with 1394 OHCI.
The 1394 OHCI controller is governed by a 24.576 MHz clock (or double, depending on the vendor). In IEEE 1394, the isochronous service runs 8,000 times per second. The 1394 OHCI specification allows software to schedule a hardware IRQ per isochronous cycle.
Once the ALSA firewire stack is programmed to schedule the hardware IRQ every 16 isochronous cycles, the corresponding IRQ handler is expected to be invoked every 2 milliseconds. As far as I tested in a usual desktop environment[2], the jitter is between 150 microseconds and 4.7 milliseconds. In the worst case, it's 6.0 milliseconds. The above is the same when using 'threadirqs'.
When using 'isolcpus' and arranging 'smp_affinity' to reduce the load on the processor core that invokes the IRQ handler, the interval is 2 milliseconds +/- several nanoseconds, so the 1394 OHCI controller itself can trigger the hardware IRQ quite precisely. The jitter comes from the processor side. I think the many running contexts on the same processor core mask IRQs so often that the jitter is not deterministic.
Here, what I'd like to tell you is that we cannot use the system time as-is as the timestamp of received MIDI messages, as long as we rely on IRQ context. We need some arrangement to construct a better timestamp with some compensation. In this respect, the 3rd version of the patch is not enough[3], in my opinion.
Interesting measurements. While several ms of jitter is not ideal, I have a few counterarguments for why I still believe we should merge this patch:
1) I don't think we should let the perfect be the enemy of the good here; just because we cannot eliminate *all* jitter does not mean we should not try to eliminate as much jitter as we can.
Exposing something to userspace is easy engineering, but finding an appropriate role for it is not as easy for userspace applications.
2) an unprivileged process (that cannot get RT_PRIO scheduling) might have a wakeup jitter of a lot more than a few ms, so for that type of process it would be a big improvement. And sometimes even RT_PRIO processes have big delays due to preempt_disable etc.
I agree with the point. Compared to the current implementation of the ALSA rawmidi stack, the patch brings the timestamps of read MIDI messages somewhat closer to the true time, since task scheduling has coarser resolution and larger jitter than the invocation of the IRQ handler.
I reported some issues as a next step. At least, the system time at which the IRQ handler is invoked needs to be compensated with supplemental information. The current patch silently pushes that task onto userspace applications without any information from kernel space. In my opinion, this results in a failure to set an incentive for userspace developers to use the new functionality.
I guess it is better to deliver the information below:
* interval of burstiness
  * drivers for packet-based protocols handle several packets at once
* minimal timestamp gap between successive frames
  * however, actually fuzzed by the jitter of IRQ handler invocation
* maximum size of a frame
  * whether the target hardware transfers a batch of MIDI messages or a single MIDI byte at once
The above items are an idea from my side (a rough sketch of how they might be exposed follows below). I need more time and discussion for a better set.
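Purely as an illustration of the idea (struct and field names are invented here, nothing is decided), such information could for example be exposed as:

/* Hypothetical sketch only; names and layout are invented to
 * illustrate the items listed above. */
struct snd_rawmidi_framing_info {
	__u32 burst_interval_us;	/* typical interval between bursts, 0 = unknown */
	__u32 min_frame_gap_ns;		/* minimal timestamp gap between successive frames */
	__u32 max_burst_bytes;		/* max MIDI bytes the hardware delivers at once */
	__u32 reserved[5];
};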
Again, I have no opposition to the idea itself. However, from the viewpoint of an application developer, it's not necessarily useful against the time slot of 3906.25 bytes/sec.
3) The jitter will depend on the hardware, and other hardware might have better (or worse) IRQ jitter.
4) If this patch gets accepted, it might show other kernel developers that IRQ jitter matters to us, and that in turn might improve the chances of getting IRQ jitter improvement patches in, in case someone else wants to help solving that problem.
Thanks
Takashi Sakamoto
Hi,
On Fri, Mar 26, 2021 at 08:55:38AM +0100, Takashi Iwai wrote:
On Fri, 26 Mar 2021 05:46:15 +0100, Takashi Sakamoto wrote:
On Wed, Mar 24, 2021 at 04:57:31PM +0100, David Henningsson wrote:
However, the timing jitter of IRQ handler invocation is an issue in this case, as well as for the PCM interface, even if the data rate of the MIDI physical layer is quite low nowadays (31.25 Kbit/sec ~= 3906.25 bytes/sec). As far as I have experienced, in an actual running Linux system, the invocation of the IRQ handler has no guarantee regarding timing jitter, mainly due to CPU-level IRQ masking (like spin_lock). Therefore the interval of each invocation is not precise enough to decide the event timestamp, at least relative to the time slot that comes from the MIDI physical layer.
Nevertheless, I think your idea is interesting enough, on the condition that information is delivered from the driver (or driver developer) to applications (ALSA Sequencer or userspace applications). Even if we have some limitations and restrictions on precise timestamps, it's worth working on. Improvements at the interface level and documentation about how to use the frame timestamp you implemented seem to be required.
Right, so first, I believe the most common way to transport midi these days is through USB, where the 31.25 Kbit/sec limit does not apply: instead we have a granularity of 1/8 ms but many messages can fit in each one. So that limit is - for many if not most cases - gone.
Second; you're probably right in that there is still some jitter w r t when the IRQ fires. That is regrettable, but the earlier we get that timestamp the better, so a timestamp in IRQ context would at least be better than in a workqueue that fires after the IRQ, or in userspace that possibly has scheduling delays.
Btw, I checked the "struct urb" and there was no timestamp in there, so I don't know how to get a better timestamp than in snd_rawmidi_receive. But maybe other interfaces (PCI, Firewire etc) offers something better.
Hm. Regardless of the type of hardware, e.g. OHCI/EHCI/xHCI, 1394 OHCI, or a PCI-e extension card, for software processing sampled data the jitter between triggering the IRQ (hardware side) and invoking the IRQ handler (processor side) is always an issue. As an actual example, let me share my experience with 1394 OHCI.
The 1394 OHCI controller is governed by a 24.576 MHz clock (or double, depending on the vendor). In IEEE 1394, the isochronous service runs 8,000 times per second. The 1394 OHCI specification allows software to schedule a hardware IRQ per isochronous cycle.
Once the ALSA firewire stack is programmed to schedule the hardware IRQ every 16 isochronous cycles, the corresponding IRQ handler is expected to be invoked every 2 milliseconds. As far as I tested in a usual desktop environment[2], the jitter is between 150 microseconds and 4.7 milliseconds. In the worst case, it's 6.0 milliseconds. The above is the same when using 'threadirqs'.
When using 'isolcpus' and arranging 'smp_affinity' to reduce the load on the processor core that invokes the IRQ handler, the interval is 2 milliseconds +/- several nanoseconds, so the 1394 OHCI controller itself can trigger the hardware IRQ quite precisely. The jitter comes from the processor side. I think the many running contexts on the same processor core mask IRQs so often that the jitter is not deterministic.
Here, what I'd like to tell you is that we cannot use the system time as-is as the timestamp of received MIDI messages, as long as we rely on IRQ context. We need some arrangement to construct a better timestamp with some compensation. In this respect, the 3rd version of the patch is not enough[3], in my opinion.
My intention is not to discourage you, but it's better to take more care. As one such measure, I think we can use an extension of 'struct snd_rawmidi_status' to deliver some useful information to ALSA rawmidi applications, or include other parameters in the frame structure. But I don't have ideas about what information should be delivered, sorry...
Well, the question is how much accuracy is wanted, and it's relatively low for MIDI -- at least v1, which was defined many decades ago for a slow serial line.
That said, the patch set was designed for providing the best-effort timestamping in software, and that's supposed to be enough for normal use cases.
Indeed. I think it is the best effort on the software side, but with less care for hardware matters. Although I have no strong objection to the idea of the patch itself, the point of my insistence is _not_ how much accuracy is preferable.
When I imagine writing an actual program as a rawmidi application using the frame structure, I have a concern about how to use the timestamp. At present, it's just a record of the invocation of some IRQ context. When using it as the timestamp of a MIDI message, the application needs supplemental information to compensate for any jitter and delay, but nothing is provided.
At present, we seem to fail to set any incentive for users to use the new functionality, in my opinion. Such functionality is going to be obsoleted sooner or later, like 'struct snd_rawmidi_status.tstamp' (I guess no one could have implemented it for the original purpose, so no one is going to use it). That's my concern.
If there is any device with hardware timestamping, in theory we could provide a similar data stream in the same format, but maybe with a different timestamp type.
I seem to have failed to get what you mean, but if I'm allowed to mention legacy devices, I'll report the case of IEC 61883-1/6 over IEEE 1394 with 1394 OHCI. 1394 OHCI allows software to handle the isochronous packet payload together with its cycle time. We can compute a timestamp independent of the current system time in the IRQ handler, with a resolution of 125 microseconds (a conversion sketch follows the driver lists below). The drivers below can adapt to it:
* snd-bebob
* snd-fireworks
* snd-oxfw
* snd-dice
* snd-firewire-digi00x (mutated version of protocol)
* snd-firewire-motu (mutated version of protocol)
But for the cases below, MIDI messages are carried in asynchronous packets and need the current system time in the invoked IRQ handler:
* snd-tascam
* snd-fireface
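To illustrate the 125-microsecond point, here is a rough sketch of mine, based on my reading of the 1394 OHCI cycle-time register layout; it is not code from any of the drivers above:

static u64 cycle_time_to_ns(u32 cycle_time)
{
	/* Assumed CYCLE_TIME layout: secondCount[31:25], cycleCount[24:12]
	 * (8000 cycles/s, i.e. 125 us each), cycleOffset[11:0] (3072 ticks
	 * of the 24.576 MHz clock per 125 us cycle). */
	u32 sec    = (cycle_time >> 25) & 0x7f;
	u32 cycle  = (cycle_time >> 12) & 0x1fff;
	u32 offset = cycle_time & 0xfff;

	return (u64)sec * 1000000000ULL +
	       (u64)cycle * 125000 +
	       (u64)offset * 125000 / 3072;
}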
But actually I'd like to see some measurement how much we can improve the timestamp accuracy by shifting the post office. This may show interesting numbers.
Also, one thing to be more considered is the support for MIDI v2 in future. I haven't seen any development so far (and no device available around), so I cannot comment on this much more, but it'd be worth to take a quick look before defining the solid API/ABI.
Regards
Takashi Sakamoto