On Sat, 10 Apr 2021 13:41:38 +0200, David Henningsson wrote:
On 2021-04-06 14:01, Takashi Iwai wrote:
On Mon, 05 Apr 2021 14:13:27 +0200, David Henningsson wrote:
On 2021-03-31 09:40, Takashi Iwai wrote:
On Tue, 30 Mar 2021 21:35:11 +0200, David Henningsson wrote:
Well, I believe that rawmidi provides less jitter than seq is not a theoretical problem but a known fact (see e g [1]), so I haven't tried to "prove" it myself. And I cannot read your mind well enough to know what you would consider a sufficient proof - are you expecting to see differences on a default or RT kernel, on a Threadripper or a Beaglebone, idle system or while running RT torture tests? Etc.
There is certainly difference, and it might be interesting to see the dependency on the hardware or on the configuration. But, again, my primary question is: have you measured how *your patch* really provides the improvement? If yes, please show the numbers in the patch description.
As you requested, I have now performed such testing.
Results:
Seq - idle: 5.0 ms
Seq - hackbench: 1.3 s (yes, above one second)
Raw + framing - idle: 2.8 ms
Raw + framing - hackbench: 2.8 ms
Setup / test description:
I had an external midi sequencer connected through USB. The system under test was a Celeron N3150 with internal graphics. The sequencer was set to generate note on/note off commands exactly 10 times per second.
For the seq tests I used "arecordmidi" and analyzed the delta values of resulting midi file. For the raw + framing tests I used a home-made application to write a midi file. The monotonic clock option was used to rule out differences between monotonic and monotonic_raw. The result shown above is the maximum amount of delta value, converted to milliseconds, minus the expected 100 ms between notes. Each test was run for a minute or two.
For the "idle" test, the machine was idle (running a normal desktop), and for the "hackbench" test, "chrt -r 10 hackbench" was run a few times in parallel with the midi recording application (which was run with "chrt -r 15").
I also tried a few other stress tests but hackbench was the one that stood out as totally destroying the timestamps of seq midi. (E g, running "rt-migrate-test" in parallel with "arecordmidi" gave a max jitter value of 13 ms.)
Conclusion:
I still believe the proposed raw + framing mode is a valuable improvement in the normal/idle case, but even more so because it is more stable in stressed conditions. Do you agree?
Thanks for the tests. Yes, that's an interesting and convincing result. Could you do a couple of favors in addition?
Okay, now done. Enjoy :-)
- Check the other workqueue
It's interesting to see whether the hiprio system workqueue may give a better latency. A oneliner patch is like below.
-- 8< -- --- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -1028,7 +1028,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream, } if (result > 0) { if (runtime->event)
schedule_work(&runtime->event_work);
else if (__snd_rawmidi_ready(runtime)) wake_up(&runtime->sleep); }queue_work(system_highpri_wq, &runtime->event_work);
-- 8< --
Result: idle: 5.0 ms
hackbench > 1 s
I e, same as original.
Hm, that's disappointing. I hoped that the influence of the workqueue would be visible, but apparently not.
But more concern is about the result with the second patch.
Also, system_unbound_wq can be another interesting test case instead of system_highpri_wq.
- Direct sequencer event process
If a chance of workqueue doesn't give significant improvement, we might need to check the direct invocation of the sequencer dispatcher. A totally untested patch is like below.
-- 8< -- --- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -979,6 +979,7 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream, unsigned long flags; int result = 0, count1; struct snd_rawmidi_runtime *runtime = substream->runtime;
- bool call_event = false; if (!substream->opened) return -EBADFD;
@@ -1028,11 +1029,13 @@ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream, } if (result > 0) { if (runtime->event)
schedule_work(&runtime->event_work);
else if (__snd_rawmidi_ready(runtime)) wake_up(&runtime->sleep); } spin_unlock_irqrestore(&runtime->lock, flags);call_event = true;
- if (call_event)
return result; } EXPORT_SYMBOL(snd_rawmidi_receive);runtime->event(runtime->substream);
-- 8< --
Result:
Idle: 3.0 ms
Hackbench still > 1s.
The reason that this is 3.0 and not 2.8 is probably during to some rounding to whole ms somewhere in either seq or arecordmidi - I'd say this is likely the same 2.8 ms as we see from the rawmidi+framing test.
In theory, this should bring to the same level of latency as the rawmidi timestamping. Of course, this doesn't mean we can go straight to this way, but it's some material for consideration.
I don't know why the hackbench test is not improved here. But you seem to have changed seq from tasklet to workqueue in 2011 (commit b3c705aa9e9), presumably for some relevant reason, like a need to sleep in the seq code...?
No, the commit you mentioned didn't change the behavior. It's a simple replacement from a tasklet to a work (at least for the input direction). So, the irq handler processes and delivers the event directly, and since it's an irq context, there can be no sleep. At most, it's a spinlock and preemption, but I don't think this could give such a long delay inside the irq handler.
Might it be something else; e.g. the timestamp gets updated later again in a different place. In anyway, this needs more investigation, apart from the merge of the rawmidi frame mode support.
thanks,
Takashi