Re: [PATCH v2] sound: rawmidi: Add framing mode

12 Apr 2021

On Sat, 10 Apr 2021 13:41:38 +0200,
David Henningsson wrote:
...
On 2021-04-06 14:01, Takashi Iwai wrote:
...
On Mon, 05 Apr 2021 14:13:27 +0200,
David Henningsson wrote:
...
On 2021-03-31 09:40, Takashi Iwai wrote:
...
On Tue, 30 Mar 2021 21:35:11 +0200,
David Henningsson wrote:
...
Well, I believe that rawmidi provides less jitter than seq is not a
theoretical problem but a known fact (see e g [1]), so I haven't tried
to "prove" it myself. And I cannot read your mind well enough to know
what you would consider a sufficient proof - are you expecting to see
differences on a default or RT kernel, on a Threadripper or a
Beaglebone, idle system or while running RT torture tests? Etc.
There is certainly difference, and it might be interesting to see the
dependency on the hardware or on the configuration.  But, again, my
primary question is: have you measured how *your patch* really
provides the improvement?  If yes, please show the numbers in the
patch description.
As you requested, I have now performed such testing.
Results:
Seq - idle: 5.0 ms
Seq - hackbench: 1.3 s (yes, above one second)
Raw + framing - idle: 2.8 ms
Raw + framing - hackbench: 2.8 ms
Setup / test description:
I had an external midi sequencer connected through USB. The system
under test was a Celeron N3150 with internal graphics. The sequencer
was set to generate note on/note off commands exactly 10 times per
second.
For the seq tests I used "arecordmidi" and analyzed the delta values
of resulting midi file. For the raw + framing tests I used a home-made
application to write a midi file. The monotonic clock option was used
to rule out differences between monotonic and monotonic_raw. The
result shown above is the maximum amount of delta value, converted to
milliseconds, minus the expected 100 ms between notes. Each test was
run for a minute or two.
For the "idle" test, the machine was idle (running a normal desktop),
and for the "hackbench" test, "chrt -r 10 hackbench" was run a few
times in parallel with the midi recording application (which was run
with "chrt -r 15").
I also tried a few other stress tests but hackbench was the one that
stood out as totally destroying the timestamps of seq midi. (E g,
running "rt-migrate-test" in parallel with "arecordmidi" gave a max
jitter value of 13 ms.)
Conclusion:
I still believe the proposed raw + framing mode is a valuable
improvement in the normal/idle case, but even more so because it is
more stable in stressed conditions. Do you agree?
Thanks for the tests.  Yes, that's an interesting and convincing
result.
        Could you do a couple of favors in addition?
Okay, now done. Enjoy :-)
...

Check the other workqueue

It's interesting to see whether the hiprio system workqueue may give a
better latency.  A oneliner patch is like below.
-- 8< --

--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -1028,7 +1028,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
   }
   if (result > 0) {
   	if (runtime->event)

	schedule_work(&runtime->event_work);




	queue_work(system_highpri_wq, &runtime->event_work);

else if (__snd_rawmidi_ready(runtime))
	wake_up(&runtime->sleep);
}

-- 8< --
Result: idle: 5.0 ms
hackbench > 1 s
I e, same as original.
Hm, that's disappointing.  I hoped that the influence of the workqueue
would be visible, but apparently not.
But more concern is about the result with the second patch.
...
...
Also, system_unbound_wq can be another interesting test case instead
of system_highpri_wq.

Direct sequencer event process

If a chance of workqueue doesn't give significant improvement, we
might need to check the direct invocation of the sequencer
dispatcher.  A totally untested patch is like below.
-- 8< --
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -979,6 +979,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
   unsigned long flags;
   int result = 0, count1;
   struct snd_rawmidi_runtime *runtime = substream->runtime;

bool call_event = false;
	if (!substream->opened)
return -EBADFD;

@@ -1028,11 +1029,13 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream,
   }
   if (result > 0) {
   	if (runtime->event)

	schedule_work(&runtime->event_work);




	call_event = true;

else if (__snd_rawmidi_ready(runtime))
	wake_up(&runtime->sleep);
}
spin_unlock_irqrestore(&runtime->lock, flags);
if (call_event)
runtime->event(runtime->substream);

return result;
}
EXPORT_SYMBOL(snd_rawmidi_receive);

-- 8< --
Result:
Idle: 3.0 ms
Hackbench still > 1s.
The reason that this is 3.0 and not 2.8 is probably during to some
rounding to whole ms somewhere in either seq or arecordmidi - I'd say
this is likely the same 2.8 ms as we see from the rawmidi+framing
test.
...
In theory, this should bring to the same level of latency as the
rawmidi timestamping.  Of course, this doesn't mean we can go straight
to this way, but it's some material for consideration.
I don't know why the hackbench test is not improved here. But you seem
to have changed seq from tasklet to workqueue in 2011 (commit
b3c705aa9e9), presumably for some relevant reason, like a need to
sleep in the seq code...?
No, the commit you mentioned didn't change the behavior.  It's a
simple replacement from a tasklet to a work (at least for the input
direction).  So, the irq handler processes and delivers the event
directly, and since it's an irq context, there can be no sleep.
At most, it's a spinlock and preemption, but I don't think this could
give such a long delay inside the irq handler.
Might it be something else; e.g. the timestamp gets updated later
again in a different place.  In anyway, this needs more
investigation, apart from the merge of the rawmidi frame mode
support.
thanks,
Takashi