pcm_meter.c issue at s16_update
Hi,
I am debugging the following problem with the PCM METER API:
An application with meter/scope configured in .asoundrc hits a high CPU load sometimes when a new playback stream is opened. E.g. in MPD when opening a new radio stream. The 100% CPU core load takes tens of seconds.
Debugging the issue revealed this:
TL;DR: at decrease of [status.appl_ptr - status.delay] between consequent runs the size of buffer to convert by the built-in s16 scope is set from usual 100s to huge value of pcm->boundary (1.5G), causing a very long processing at 100% core load.
Details:
Every scope setup uses a default s16 scope defined in alsa-lib, which converts samples to s16 so that the user-provided scopes can access data via snd_pcm_scope_s16_get_channel_buffer buffers with converted samples.
The s16 scope converts samples in a loop, in frames chunks, decrementing size variable by frames until zero. The value of size is hundreds samples max at each s16_update call, calculated in https://github.com/alsa-project/alsa-lib/blob/master/src/pcm/pcm_meter.c#L11... . A debug added after that line shows this:
@@ -1098,14 +1103,18 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old; + fprintf(stderr, "s16_update 1: meter->now %ld, s16->old %ld, size %ld\n", meter->now, s16->old, size); if (size < 0) size += spcm->boundary; offset = s16->old % meter->buf_size;
s16_update 1: meter->now 2154673, s16->old 2153771, size 902 s16_update 1: meter->now 2155579, s16->old 2154673, size 906 s16_update 1: meter->now 2156487, s16->old 2155579, size 908 s16_update 1: meter->now 2157391, s16->old 2156487, size 904 s16_update 1: meter->now 2158298, s16->old 2157391, size 907 s16_update 1: meter->now 2159203, s16->old 2158298, size 905 s16_update 1: meter->now 2160091, s16->old 2159203, size 888 s16_update 1: meter->now 2160976, s16->old 2160091, size 885 s16_update 1: meter->now 2161924, s16->old 2160976, size 948 s16_update 1: meter->now 2162829, s16->old 2161924, size 905 s16_update 1: meter->now 2163733, s16->old 2162829, size 904 s16_update 1: meter->now 2164594, s16->old 2163733, size 861 s16_update 1: meter->now 2165542, s16->old 2164594, size 948 s16_update 1: meter->now 2166448, s16->old 2165542, size 906 s16_update 1: meter->now 2167352, s16->old 2166448, size 904 s16_update 1: meter->now 2168259, s16->old 2167352, size 907 s16_update 1: meter->now 2169165, s16->old 2168259, size 906 s16_update 1: meter->now 2170050, s16->old 2169165, size 885 s16_update 1: meter->now 2170936, s16->old 2170050, size 886 s16_update 1: meter->now 2171877, s16->old 2170936, size 941 s16_update 1: meter->now 2172783, s16->old 2171877, size 906 s16_update 1: meter->now 2173690, s16->old 2172783, size 907 s16_update 1: meter->now 2174596, s16->old 2173690, size 906 s16_update 1: meter->now 2175502, s16->old 2174596, size 906 s16_update 1: meter->now 2176407, s16->old 2175502, size 905 s16_update 1: meter->now 2177311, s16->old 2176407, size 904 s16_update 1: meter->now 2178215, s16->old 2177311, size 904
However, sometimes meter->now < s16->old, causing size < 0, for which this line applies https://github.com/alsa-project/alsa-lib/blob/master/src/pcm/pcm_meter.c#L11...
size += spcm->boundary
and size becomes a huge number, because hw:0 pcm -> boundary is 1445068800. The loop at https://github.com/alsa-project/alsa-lib/blob/master/src/pcm/pcm_meter.c#L11... takes tens of seconds, untill the huge buffer is converted to s16.
Usually the negative difference is quite small:
s16_update 1: meter->now 988, s16->old 21139, size -20151 s16_update 1: meter->now 870, s16->old 18865, size -17995 s16_update 1: meter->now 1065, s16->old 20786, size -19721 s16_update 1: meter->now 701, s16->old 24839, size -24138 s16_update 1: meter->now 782, s16->old 869, size -87 s16_update 1: meter->now 763, s16->old 841, size -78 s16_update 1: meter->now 963, s16->old 22525, size -21562 s16_update 1: meter->now 665, s16->old 832, size -167 s16_update 1: meter->now 745, s16->old 865, size -120 s16_update 1: meter->now 859, s16->old 1002, size -143 s16_update 1: meter->now 935, s16->old 117452, size -116517
Yet in all these cases the size variable is incremented by 1445068800, which clearly does not make sense.
Now the question is how it happens that meter->now < s16->old, when meter->now is calculated by now = status.appl_ptr - status.delay https://github.com/alsa-project/alsa-lib/blob/master/src/pcm/pcm_meter.c#L21... and s16->old is set to meter->now at the end of each s16_update call https://github.com/alsa-project/alsa-lib/blob/master/src/pcm/pcm_meter.c#L11... .
The debug around the event looks like this, suddenly meter->now drops down:
s16_update 1: meter->now 2567498, s16->old 2566593, size 905 s16_update 1: meter->now 2568401, s16->old 2567498, size 903 s16_update 1: meter->now 20786, s16->old 20786, size 0 s16_update 1: meter->now 1065, s16->old 20786, size -19721 s16_update 1: meter->now 24839, s16->old 24838, size 1 s16_update 1: meter->now 701, s16->old 24839, size -24138 s16_update 1: meter->now 1253162, s16->old 701, size 1252461 s16_update 1: meter->now 1255148, s16->old 1253162, size 1986
..........
s16_update 1: meter->now 11136, s16->old 10261, size 875 s16_update 1: meter->now 22525, s16->old 22524, size 1 s16_update 1: meter->now 963, s16->old 22525, size -21562 s16_update 1: meter->now 1270914, s16->old 963, size 1269951 s16_update 1: meter->now 1272917, s16->old 1270914, size 2003
Could it perhaps be an XRUN, causing the difference (status.appl_ptr - status.delay) to actually decrease compared to the previous run? I really do not know how/when such decrease can happen. But if it is a standard situation, than IMO the size of to-be converted buffer by the s16 scope should not be set to the huge pcm->boundary.
Thanks a lot for help.
Best regards,
Pavel.
Dne 26. 07. 20 v 20:20 Pavel Hofman napsal(a):
Hi,
I am debugging the following problem with the PCM METER API:
An application with meter/scope configured in .asoundrc hits a high CPU load sometimes when a new playback stream is opened. E.g. in MPD when opening a new radio stream. The 100% CPU core load takes tens of seconds. ...........
TL;DR: at decrease of [status.appl_ptr - status.delay] between consequent runs the size of buffer to convert by the built-in s16 scope is set from usual 100s to huge value of pcm->boundary (1.5G), causing a very long processing at 100% core load.
The debug around the event looks like this, suddenly meter->now drops down:
s16_update 1: meter->now 2567498, s16->old 2566593, size 905 s16_update 1: meter->now 2568401, s16->old 2567498, size 903 s16_update 1: meter->now 20786, s16->old 20786, size 0 s16_update 1: meter->now 1065, s16->old 20786, size -19721 s16_update 1: meter->now 24839, s16->old 24838, size 1 s16_update 1: meter->now 701, s16->old 24839, size -24138 s16_update 1: meter->now 1253162, s16->old 701, size 1252461 s16_update 1: meter->now 1255148, s16->old 1253162, size 1986
..........
s16_update 1: meter->now 11136, s16->old 10261, size 875 s16_update 1: meter->now 22525, s16->old 22524, size 1 s16_update 1: meter->now 963, s16->old 22525, size -21562 s16_update 1: meter->now 1270914, s16->old 963, size 1269951 s16_update 1: meter->now 1272917, s16->old 1270914, size 2003
I think the problem is that s16->old is not reset when status.appl_ptr is zeroed and starts running again. There is a call
static void s16_reset(snd_pcm_scope_t *scope) { snd_pcm_scope_s16_t *s16 = scope->private_data; snd_pcm_meter_t *meter = s16->pcm->private_data; s16->old = meter->now; }
but I do not know when this method is called and whether the meter->now is already assigned to the newly zeroed status.appl_ptr.
Please at which method should I reset s16->old = 0? * s16_reset * s16_enable * s16_start * snd_pcm_scope_s16_open
Thanks a lot for any help,
Pavel.
On Tue, 28 Jul 2020 18:46:00 +0200, Pavel Hofman wrote:
Dne 26. 07. 20 v 20:20 Pavel Hofman napsal(a):
Hi,
I am debugging the following problem with the PCM METER API:
An application with meter/scope configured in .asoundrc hits a high CPU load sometimes when a new playback stream is opened. E.g. in MPD when opening a new radio stream. The 100% CPU core load takes tens of seconds. ...........
TL;DR: at decrease of [status.appl_ptr - status.delay] between consequent runs the size of buffer to convert by the built-in s16 scope is set from usual 100s to huge value of pcm->boundary (1.5G), causing a very long processing at 100% core load.
The debug around the event looks like this, suddenly meter->now drops down:
s16_update 1: meter->now 2567498, s16->old 2566593, size 905 s16_update 1: meter->now 2568401, s16->old 2567498, size 903 s16_update 1: meter->now 20786, s16->old 20786, size 0 s16_update 1: meter->now 1065, s16->old 20786, size -19721 s16_update 1: meter->now 24839, s16->old 24838, size 1 s16_update 1: meter->now 701, s16->old 24839, size -24138 s16_update 1: meter->now 1253162, s16->old 701, size 1252461 s16_update 1: meter->now 1255148, s16->old 1253162, size 1986
..........
s16_update 1: meter->now 11136, s16->old 10261, size 875 s16_update 1: meter->now 22525, s16->old 22524, size 1 s16_update 1: meter->now 963, s16->old 22525, size -21562 s16_update 1: meter->now 1270914, s16->old 963, size 1269951 s16_update 1: meter->now 1272917, s16->old 1270914, size 2003
I think the problem is that s16->old is not reset when status.appl_ptr is zeroed and starts running again. There is a call
static void s16_reset(snd_pcm_scope_t *scope) { snd_pcm_scope_s16_t *s16 = scope->private_data; snd_pcm_meter_t *meter = s16->pcm->private_data; s16->old = meter->now; }
but I do not know when this method is called and whether the meter->now is already assigned to the newly zeroed status.appl_ptr.
Would adding atomic_add(&meter->reset, 1) in snd_pcm_meter_reset() help?
Takashi
Please at which method should I reset s16->old = 0?
- s16_reset
- s16_enable
- s16_start
- snd_pcm_scope_s16_open
Thanks a lot for any help,
Pavel.
Dne 28. 07. 20 v 19:04 Takashi Iwai napsal(a):
On Tue, 28 Jul 2020 18:46:00 +0200, Pavel Hofman wrote:
The debug around the event looks like this, suddenly meter->now drops down:
s16_update 1: meter->now 2567498, s16->old 2566593, size 905 s16_update 1: meter->now 2568401, s16->old 2567498, size 903 s16_update 1: meter->now 20786, s16->old 20786, size 0 s16_update 1: meter->now 1065, s16->old 20786, size -19721 s16_update 1: meter->now 24839, s16->old 24838, size 1 s16_update 1: meter->now 701, s16->old 24839, size -24138 s16_update 1: meter->now 1253162, s16->old 701, size 1252461 s16_update 1: meter->now 1255148, s16->old 1253162, size 1986
..........
s16_update 1: meter->now 11136, s16->old 10261, size 875 s16_update 1: meter->now 22525, s16->old 22524, size 1 s16_update 1: meter->now 963, s16->old 22525, size -21562 s16_update 1: meter->now 1270914, s16->old 963, size 1269951 s16_update 1: meter->now 1272917, s16->old 1270914, size 2003
I think the problem is that s16->old is not reset when status.appl_ptr is zeroed and starts running again. There is a call
static void s16_reset(snd_pcm_scope_t *scope) { snd_pcm_scope_s16_t *s16 = scope->private_data; snd_pcm_meter_t *meter = s16->pcm->private_data; s16->old = meter->now; }
but I do not know when this method is called and whether the meter->now is already assigned to the newly zeroed status.appl_ptr.
Would adding atomic_add(&meter->reset, 1) in snd_pcm_meter_reset() help?
Unfortunately not.
s16_reset is called correctly, setting s16->old = meter->now; But at that time meter->now is still 22751, setting s16->old to the same value.
s16_update 1: meter->now 22751, s16->old 22751, size 0
However, in the next update call meter->now comes from the freshly started application pointer:
s16_update 1: meter->now 839, s16->old 22751, size -21912
Of course this helps:
- if (size < 0) - size += spcm->boundary; + if (size < 0) { + size = meter->now; + s16->old = 0; + }
But I understand this is not a solution because:
* it will not work at reaching spcm->boundary (after thousands of hours?) * it will cause the same problem when the stream is rewound (which is the problem now too) - size will equal to large meter->now (length from the beginning of the stream minus the rewound = large number).
BTW, please why is snd_pcm_meter_update_scope called only in capture stream?
Thanks,
Pavel.
Dne 28. 07. 20 v 20:04 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 19:04 Takashi Iwai napsal(a):
On Tue, 28 Jul 2020 18:46:00 +0200, Pavel Hofman wrote:
The debug around the event looks like this, suddenly meter->now drops down:
s16_update 1: meter->now 2567498, s16->old 2566593, size 905 s16_update 1: meter->now 2568401, s16->old 2567498, size 903 s16_update 1: meter->now 20786, s16->old 20786, size 0 s16_update 1: meter->now 1065, s16->old 20786, size -19721 s16_update 1: meter->now 24839, s16->old 24838, size 1 s16_update 1: meter->now 701, s16->old 24839, size -24138 s16_update 1: meter->now 1253162, s16->old 701, size 1252461 s16_update 1: meter->now 1255148, s16->old 1253162, size 1986
..........
s16_update 1: meter->now 11136, s16->old 10261, size 875 s16_update 1: meter->now 22525, s16->old 22524, size 1 s16_update 1: meter->now 963, s16->old 22525, size -21562 s16_update 1: meter->now 1270914, s16->old 963, size 1269951 s16_update 1: meter->now 1272917, s16->old 1270914, size 2003
I think the problem is that s16->old is not reset when status.appl_ptr is zeroed and starts running again. There is a call
static void s16_reset(snd_pcm_scope_t *scope) { snd_pcm_scope_s16_t *s16 = scope->private_data; snd_pcm_meter_t *meter = s16->pcm->private_data; s16->old = meter->now; }
but I do not know when this method is called and whether the meter->now is already assigned to the newly zeroed status.appl_ptr.
Would adding atomic_add(&meter->reset, 1) in snd_pcm_meter_reset() help?
Unfortunately not.
s16_reset is called correctly, setting s16->old = meter->now; But at that time meter->now is still 22751, setting s16->old to the same value.
s16_update 1: meter->now 22751, s16->old 22751, size 0
However, in the next update call meter->now comes from the freshly started application pointer:
s16_update 1: meter->now 839, s16->old 22751, size -21912
Of course this helps:
if (size < 0)
size += spcm->boundary;
if (size < 0) {
size = meter->now;
s16->old = 0;
}
But I understand this is not a solution because:
- it will not work at reaching spcm->boundary (after thousands of hours?)
- it will cause the same problem when the stream is rewound (which is
the problem now too) - size will equal to large meter->now (length from the beginning of the stream minus the rewound = large number).
IMHO there are two cases of the [application pointer + delay] drop compared to the previous run:
* stream start, rewinding => s16->old = meter->now; size =0, i.e. skipping the samples to show * wrapping at spcm->boundary => size += spcm->boundary, i.e. showing the wrapped samples
Optionally the second case could be handled just like the first case by resetting s16->old, assuming the boundary wrap occurs very infrequently.
Pavel.
Dne 28. 07. 20 v 20:54 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 20:04 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 19:04 Takashi Iwai napsal(a):
Would adding atomic_add(&meter->reset, 1) in snd_pcm_meter_reset() help?
Unfortunately not.
s16_reset is called correctly, setting s16->old = meter->now; But at that time meter->now is still 22751, setting s16->old to the same value.
s16_update 1: meter->now 22751, s16->old 22751, size 0
However, in the next update call meter->now comes from the freshly started application pointer:
s16_update 1: meter->now 839, s16->old 22751, size -21912
Of course this helps:
- if (size < 0) - size += spcm->boundary; + if (size < 0) { + size = meter->now; + s16->old = 0; + }
But I understand this is not a solution because:
- it will not work at reaching spcm->boundary (after thousands of hours?)
- it will cause the same problem when the stream is rewound (which is
the problem now too) - size will equal to large meter->now (length from the beginning of the stream minus the rewound = large number).
IMHO there are two cases of the [application pointer + delay] drop compared to the previous run:
- stream start, rewinding => s16->old = meter->now; size =0, i.e.
skipping the samples to show
- wrapping at spcm->boundary => size += spcm->boundary, i.e. showing the
wrapped samples
Optionally the second case could be handled just like the first case by resetting s16->old, assuming the boundary wrap occurs very infrequently.
The following patch is tested to work OK, no CPU peaks and no meter output glitches when the size < 0 condition occurs:
diff --git a/src/pcm/pcm_meter.c b/src/pcm/pcm_meter.c index 20b41876..48df5945 100644 --- a/src/pcm/pcm_meter.c +++ b/src/pcm/pcm_meter.c @@ -1098,8 +1098,15 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old; - if (size < 0) - size += spcm->boundary; + if (size < 0) { + /** + * Application pointer adjusted for delay (meter->now) has dropped compared + * to the previous update cycle. Either spcm->boundary wraparound, pcm rewinding, + * or pcm restart without s16->old properly reset. + * In any case the safest solution is skipping this conversion cycle. + */ + size = 0; + } offset = s16->old % meter->buf_size; while (size > 0) { snd_pcm_uframes_t frames = size;
Please will you accept this (workaround) bugfix? If so, I would send a proper patch.
Thanks a lot,
Pavel.
On Sun, 02 Aug 2020 19:50:44 +0200, Pavel Hofman wrote:
Dne 28. 07. 20 v 20:54 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 20:04 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 19:04 Takashi Iwai napsal(a):
Would adding atomic_add(&meter->reset, 1) in snd_pcm_meter_reset() help?
Unfortunately not.
s16_reset is called correctly, setting s16->old = meter->now; But at that time meter->now is still 22751, setting s16->old to the same value.
s16_update 1: meter->now 22751, s16->old 22751, size 0
However, in the next update call meter->now comes from the freshly started application pointer:
s16_update 1: meter->now 839, s16->old 22751, size -21912
Of course this helps:
- if (size < 0) - size += spcm->boundary; + if (size < 0) { + size = meter->now; + s16->old = 0; + }
But I understand this is not a solution because:
- it will not work at reaching spcm->boundary (after thousands of hours?)
- it will cause the same problem when the stream is rewound (which is
the problem now too) - size will equal to large meter->now (length from the beginning of the stream minus the rewound = large number).
IMHO there are two cases of the [application pointer + delay] drop compared to the previous run:
- stream start, rewinding => s16->old = meter->now; size =0, i.e.
skipping the samples to show
- wrapping at spcm->boundary => size += spcm->boundary, i.e. showing the
wrapped samples
Optionally the second case could be handled just like the first case by resetting s16->old, assuming the boundary wrap occurs very infrequently.
The following patch is tested to work OK, no CPU peaks and no meter output glitches when the size < 0 condition occurs:
diff --git a/src/pcm/pcm_meter.c b/src/pcm/pcm_meter.c index 20b41876..48df5945 100644 --- a/src/pcm/pcm_meter.c +++ b/src/pcm/pcm_meter.c @@ -1098,8 +1098,15 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old;
if (size < 0)
size += spcm->boundary;
if (size < 0) {
/**
* Application pointer adjusted for delay (meter->now)
has dropped compared
* to the previous update cycle. Either spcm->boundary
wraparound, pcm rewinding,
* or pcm restart without s16->old properly reset.
* In any case the safest solution is skipping this
conversion cycle.
*/
size = 0;
} offset = s16->old % meter->buf_size; while (size > 0) { snd_pcm_uframes_t frames = size;
Please will you accept this (workaround) bugfix? If so, I would send a proper patch.
It looks OK, at least this must be safe. So yes, I'll happily apply if you submit a proper patch.
thanks,
Takashi
Dne 03. 08. 20 v 8:17 Takashi Iwai napsal(a):
On Sun, 02 Aug 2020 19:50:44 +0200, Pavel Hofman wrote:
Dne 28. 07. 20 v 20:54 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 20:04 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 19:04 Takashi Iwai napsal(a):
Would adding atomic_add(&meter->reset, 1) in snd_pcm_meter_reset() help?
Unfortunately not.
s16_reset is called correctly, setting s16->old = meter->now; But at that time meter->now is still 22751, setting s16->old to the same value.
s16_update 1: meter->now 22751, s16->old 22751, size 0
However, in the next update call meter->now comes from the freshly started application pointer:
s16_update 1: meter->now 839, s16->old 22751, size -21912
Of course this helps:
- if (size < 0) - size += spcm->boundary; + if (size < 0) { + size = meter->now; + s16->old = 0; + }
But I understand this is not a solution because:
- it will not work at reaching spcm->boundary (after thousands of hours?)
- it will cause the same problem when the stream is rewound (which is
the problem now too) - size will equal to large meter->now (length from the beginning of the stream minus the rewound = large number).
IMHO there are two cases of the [application pointer + delay] drop compared to the previous run:
- stream start, rewinding => s16->old = meter->now; size =0, i.e.
skipping the samples to show
- wrapping at spcm->boundary => size += spcm->boundary, i.e. showing the
wrapped samples
Optionally the second case could be handled just like the first case by resetting s16->old, assuming the boundary wrap occurs very infrequently.
The following patch is tested to work OK, no CPU peaks and no meter output glitches when the size < 0 condition occurs:
diff --git a/src/pcm/pcm_meter.c b/src/pcm/pcm_meter.c index 20b41876..48df5945 100644 --- a/src/pcm/pcm_meter.c +++ b/src/pcm/pcm_meter.c @@ -1098,8 +1098,15 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old;
if (size < 0)
size += spcm->boundary;
if (size < 0) {
/**
* Application pointer adjusted for delay (meter->now)
has dropped compared
* to the previous update cycle. Either spcm->boundary
wraparound, pcm rewinding,
* or pcm restart without s16->old properly reset.
* In any case the safest solution is skipping this
conversion cycle.
*/
size = 0;
} offset = s16->old % meter->buf_size; while (size > 0) { snd_pcm_uframes_t frames = size;
Please will you accept this (workaround) bugfix? If so, I would send a proper patch.
It looks OK, at least this must be safe. So yes, I'll happily apply if you submit a proper patch.
It would be probably better to check against the boundary / 2 value to check correctly the boundary wrap instead to drop all negative size values:
if (size < 0) { if (size < -(spcm->boundary / 2)) size += spcm->boundary; else size = 0; }
The "hidden" pcm restart referred in the comment should not occur, otherwise it's another bug somewhere.
Jaroslav
Dne 03. 08. 20 v 9:22 Jaroslav Kysela napsal(a):
Dne 03. 08. 20 v 8:17 Takashi Iwai napsal(a):
On Sun, 02 Aug 2020 19:50:44 +0200, Pavel Hofman wrote:
Dne 28. 07. 20 v 20:54 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 20:04 Pavel Hofman napsal(a):
Dne 28. 07. 20 v 19:04 Takashi Iwai napsal(a):
Would adding atomic_add(&meter->reset, 1) in snd_pcm_meter_reset() help?
Unfortunately not.
s16_reset is called correctly, setting s16->old = meter->now; But at that time meter->now is still 22751, setting s16->old to the same value.
s16_update 1: meter->now 22751, s16->old 22751, size 0
However, in the next update call meter->now comes from the freshly started application pointer:
s16_update 1: meter->now 839, s16->old 22751, size -21912
Of course this helps:
- if (size < 0) - size += spcm->boundary; + if (size < 0) { + size = meter->now; + s16->old = 0; + }
But I understand this is not a solution because:
- it will not work at reaching spcm->boundary (after thousands of hours?)
- it will cause the same problem when the stream is rewound (which is
the problem now too) - size will equal to large meter->now (length from the beginning of the stream minus the rewound = large number).
IMHO there are two cases of the [application pointer + delay] drop compared to the previous run:
- stream start, rewinding => s16->old = meter->now; size =0, i.e.
skipping the samples to show
- wrapping at spcm->boundary => size += spcm->boundary, i.e. showing the
wrapped samples
Optionally the second case could be handled just like the first case by resetting s16->old, assuming the boundary wrap occurs very infrequently.
The following patch is tested to work OK, no CPU peaks and no meter output glitches when the size < 0 condition occurs:
diff --git a/src/pcm/pcm_meter.c b/src/pcm/pcm_meter.c index 20b41876..48df5945 100644 --- a/src/pcm/pcm_meter.c +++ b/src/pcm/pcm_meter.c @@ -1098,8 +1098,15 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old;
if (size < 0)
size += spcm->boundary;
if (size < 0) {
/**
* Application pointer adjusted for delay (meter->now)
has dropped compared
* to the previous update cycle. Either spcm->boundary
wraparound, pcm rewinding,
* or pcm restart without s16->old properly reset.
* In any case the safest solution is skipping this
conversion cycle.
*/
size = 0;
} offset = s16->old % meter->buf_size; while (size > 0) { snd_pcm_uframes_t frames = size;
Please will you accept this (workaround) bugfix? If so, I would send a proper patch.
It looks OK, at least this must be safe. So yes, I'll happily apply if you submit a proper patch.
It would be probably better to check against the boundary / 2 value to check correctly the boundary wrap instead to drop all negative size values:
if (size < 0) { if (size < -(spcm->boundary / 2)) size += spcm->boundary; else size = 0; }
Is there a reliable way to detect the boundary wraparound, at best using some dedicated API? I could find any, IMO the wraparound does not create any notification. The check is OK for a rewind, half of boundary is usually a very large number too. I am not sure what would happen at reset when application pointer was already past the boundary half - see below.
The "hidden" pcm restart referred in the comment should not occur, otherwise it's another bug somewhere.
I do not know the exact moments when plugin API methods are called. The fact is Takashi's suggestion to call s16 reset explicitely in snd_pcm_meter_reset created this order:
snd_pcm_meter_reset -> s16->reset s16_update: meter->now 22751, s16->old 22751, size 0 s16_update: meter->now 839, s16->old 22751, size -21912
I.e. AFTER resetting meter/s16 the variable meter->now was still at the original large 22751 (with s16->old equal to its value due to s16->reset). The value of meter->now was reset to 839 (= app pointer - delay) only in the next call of s16_update (when s16->old was still the previous old value => size < 0 => huge size => high CPU load). From this I kind of conclude that the reset is buggy. Maybe the reset code should re-calculate meter->now = appl.pointer - delay before aligning s16->old = meter->now.
Nevertheless all this (except for the boundary wraparound) would result in the same size = 0, thus skipping samples from the last cycle, just like what the proposed patch does.
Pavel.
Dne 03. 08. 20 v 12:48 Pavel Hofman napsal(a):
Dne 03. 08. 20 v 9:22 Jaroslav Kysela napsal(a):
Dne 03. 08. 20 v 8:17 Takashi Iwai napsal(a):
On Sun, 02 Aug 2020 19:50:44 +0200,
Optionally the second case could be handled just like the first case by resetting s16->old, assuming the boundary wrap occurs very infrequently.
The following patch is tested to work OK, no CPU peaks and no meter output glitches when the size < 0 condition occurs:
diff --git a/src/pcm/pcm_meter.c b/src/pcm/pcm_meter.c index 20b41876..48df5945 100644 --- a/src/pcm/pcm_meter.c +++ b/src/pcm/pcm_meter.c @@ -1098,8 +1098,15 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old; - if (size < 0) - size += spcm->boundary; + if (size < 0) { + /** + * Application pointer adjusted for delay (meter->now) has dropped compared + * to the previous update cycle. Either spcm->boundary wraparound, pcm rewinding, + * or pcm restart without s16->old properly reset. + * In any case the safest solution is skipping this conversion cycle. + */ + size = 0; + } offset = s16->old % meter->buf_size; while (size > 0) { snd_pcm_uframes_t frames = size;
Please will you accept this (workaround) bugfix? If so, I would send a proper patch.
It looks OK, at least this must be safe. So yes, I'll happily apply if you submit a proper patch.
It would be probably better to check against the boundary / 2 value to check correctly the boundary wrap instead to drop all negative size values:
if (size < 0) { if (size < -(spcm->boundary / 2)) size += spcm->boundary; else size = 0; }
Is there a reliable way to detect the boundary wraparound, at best using some dedicated API? I could find any, IMO the wraparound does not create any notification. The check is OK for a rewind, half of boundary is usually a very large number too. I am not sure what would happen at reset when application pointer was already past the boundary half - see below.
The "hidden" pcm restart referred in the comment should not occur, otherwise it's another bug somewhere.
I do not know the exact moments when plugin API methods are called. The fact is Takashi's suggestion to call s16 reset explicitely in snd_pcm_meter_reset created this order:
snd_pcm_meter_reset -> s16->reset s16_update: meter->now 22751, s16->old 22751, size 0 s16_update: meter->now 839, s16->old 22751, size -21912
I.e. AFTER resetting meter/s16 the variable meter->now was still at the original large 22751 (with s16->old equal to its value due to s16->reset). The value of meter->now was reset to 839 (= app pointer - delay) only in the next call of s16_update (when s16->old was still the previous old value => size < 0 => huge size => high CPU load). From this I kind of conclude that the reset is buggy. Maybe the reset code should re-calculate meter->now = appl.pointer - delay before aligning s16->old = meter->now.
Nevertheless all this (except for the boundary wraparound) would result in the same size = 0, thus skipping samples from the last cycle, just like what the proposed patch does.
Please can we reach a decision and close the problem so that affected use cases do not have to be patched with the next the alsa-lib version?
Thanks a lot in advance,
Pavel.
Dne 09. 08. 20 v 9:05 Pavel Hofman napsal(a):
Dne 03. 08. 20 v 12:48 Pavel Hofman napsal(a):
Dne 03. 08. 20 v 9:22 Jaroslav Kysela napsal(a):
Dne 03. 08. 20 v 8:17 Takashi Iwai napsal(a):
On Sun, 02 Aug 2020 19:50:44 +0200,
Optionally the second case could be handled just like the first case by resetting s16->old, assuming the boundary wrap occurs very infrequently.
The following patch is tested to work OK, no CPU peaks and no meter output glitches when the size < 0 condition occurs:
diff --git a/src/pcm/pcm_meter.c b/src/pcm/pcm_meter.c index 20b41876..48df5945 100644 --- a/src/pcm/pcm_meter.c +++ b/src/pcm/pcm_meter.c @@ -1098,8 +1098,15 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old; - if (size < 0) - size += spcm->boundary; + if (size < 0) { + /** + * Application pointer adjusted for delay (meter->now) has dropped compared + * to the previous update cycle. Either spcm->boundary wraparound, pcm rewinding, + * or pcm restart without s16->old properly reset. + * In any case the safest solution is skipping this conversion cycle. + */ + size = 0; + } offset = s16->old % meter->buf_size; while (size > 0) { snd_pcm_uframes_t frames = size;
Please will you accept this (workaround) bugfix? If so, I would send a proper patch.
It looks OK, at least this must be safe. So yes, I'll happily apply if you submit a proper patch.
It would be probably better to check against the boundary / 2 value to check correctly the boundary wrap instead to drop all negative size values:
if (size < 0) { if (size < -(spcm->boundary / 2)) size += spcm->boundary; else size = 0; }
Is there a reliable way to detect the boundary wraparound, at best using some dedicated API? I could find any, IMO the wraparound does not create any notification. The check is OK for a rewind, half of boundary is usually a very large number too. I am not sure what would happen at reset when application pointer was already past the boundary half - see below.
Yes, it's a good argument. In this case, the s16->old value is not properly synced during the reset operation, otherwise the boundary / 2 threshold (change limit) is sufficient to detect the boundary wrap.
The "hidden" pcm restart referred in the comment should not occur, otherwise it's another bug somewhere.
I do not know the exact moments when plugin API methods are called. The fact is Takashi's suggestion to call s16 reset explicitely in snd_pcm_meter_reset created this order:
snd_pcm_meter_reset -> s16->reset s16_update: meter->now 22751, s16->old 22751, size 0 s16_update: meter->now 839, s16->old 22751, size -21912
I.e. AFTER resetting meter/s16 the variable meter->now was still at the original large 22751 (with s16->old equal to its value due to s16->reset). The value of meter->now was reset to 839 (= app pointer - delay) only in the next call of s16_update (when s16->old was still the previous old value => size < 0 => huge size => high CPU load). From this I kind of conclude that the reset is buggy. Maybe the reset code should re-calculate meter->now = appl.pointer - delay before aligning s16->old = meter->now.
Nevertheless all this (except for the boundary wraparound) would result in the same size = 0, thus skipping samples from the last cycle, just like what the proposed patch does.
Please can we reach a decision and close the problem so that affected use cases do not have to be patched with the next the alsa-lib version?
I think that this problem should be fixed for reset and rewind separately. The meter->reset should be set in snd_pcm_meter_reset() inside the running_mutex lock to serialize correctly the update operations in the snd_pcm_meter_thread(). And perhaps, we can follow this logic for the rewind.
I mean, we should ensure to call the s16->reset at the proper time to avoid broken old/now combinations inside the scope "clients".
Your proposed solution is just a workaround.
Jaroslav
Thanks a lot in advance,
Pavel.
Dne 09. 08. 20 v 22:29 Jaroslav Kysela napsal(a):
Dne 09. 08. 20 v 9:05 Pavel Hofman napsal(a):
Dne 03. 08. 20 v 12:48 Pavel Hofman napsal(a):
Dne 03. 08. 20 v 9:22 Jaroslav Kysela napsal(a):
Dne 03. 08. 20 v 8:17 Takashi Iwai napsal(a):
On Sun, 02 Aug 2020 19:50:44 +0200,
> > Optionally the second case could be handled just like the first > case by > resetting s16->old, assuming the boundary wrap occurs very > infrequently.
The following patch is tested to work OK, no CPU peaks and no meter output glitches when the size < 0 condition occurs:
diff --git a/src/pcm/pcm_meter.c b/src/pcm/pcm_meter.c index 20b41876..48df5945 100644 --- a/src/pcm/pcm_meter.c +++ b/src/pcm/pcm_meter.c @@ -1098,8 +1098,15 @@ static void s16_update(snd_pcm_scope_t *scope) snd_pcm_sframes_t size; snd_pcm_uframes_t offset; size = meter->now - s16->old; - if (size < 0) - size += spcm->boundary; + if (size < 0) { + /** + * Application pointer adjusted for delay (meter->now) has dropped compared + * to the previous update cycle. Either spcm->boundary wraparound, pcm rewinding, + * or pcm restart without s16->old properly reset. + * In any case the safest solution is skipping this conversion cycle. + */ + size = 0; + } offset = s16->old % meter->buf_size; while (size > 0) { snd_pcm_uframes_t frames = size;
Please will you accept this (workaround) bugfix? If so, I would send a proper patch.
It looks OK, at least this must be safe. So yes, I'll happily apply if you submit a proper patch.
It would be probably better to check against the boundary / 2 value to check correctly the boundary wrap instead to drop all negative size values:
if (size < 0) { if (size < -(spcm->boundary / 2)) size += spcm->boundary; else size = 0; }
Is there a reliable way to detect the boundary wraparound, at best using some dedicated API? I could find any, IMO the wraparound does not create any notification. The check is OK for a rewind, half of boundary is usually a very large number too. I am not sure what would happen at reset when application pointer was already past the boundary half - see below.
Yes, it's a good argument. In this case, the s16->old value is not properly synced during the reset operation, otherwise the boundary / 2 threshold (change limit) is sufficient to detect the boundary wrap.
The "hidden" pcm restart referred in the comment should not occur, otherwise it's another bug somewhere.
I do not know the exact moments when plugin API methods are called. The fact is Takashi's suggestion to call s16 reset explicitely in snd_pcm_meter_reset created this order:
snd_pcm_meter_reset -> s16->reset s16_update: meter->now 22751, s16->old 22751, size 0 s16_update: meter->now 839, s16->old 22751, size -21912
I.e. AFTER resetting meter/s16 the variable meter->now was still at the original large 22751 (with s16->old equal to its value due to s16->reset). The value of meter->now was reset to 839 (= app pointer - delay) only in the next call of s16_update (when s16->old was still the previous old value => size < 0 => huge size => high CPU load). From this I kind of conclude that the reset is buggy. Maybe the reset code should re-calculate meter->now = appl.pointer - delay before aligning s16->old = meter->now.
Nevertheless all this (except for the boundary wraparound) would result in the same size = 0, thus skipping samples from the last cycle, just like what the proposed patch does.
Please can we reach a decision and close the problem so that affected use cases do not have to be patched with the next the alsa-lib version?
I think that this problem should be fixed for reset and rewind separately. The meter->reset should be set in snd_pcm_meter_reset() inside the running_mutex lock to serialize correctly the update operations in the snd_pcm_meter_thread(). And perhaps, we can follow this logic for the rewind.
I mean, we should ensure to call the s16->reset at the proper time to avoid broken old/now combinations inside the scope "clients".
Your proposed solution is just a workaround.
I am well aware of that. The main cause of the problem is that the existing code assumes that a drop in the meter->now value is caused by the pcm->boundary wraparound. Only for that particular case the existing size += spcm->boundary code is correct, for all the other cases it is grossly wrong, locking the thread for many tens of seconds and jamming CPU. If there was a callback or some other way to signal the boundary wraparound that "dangerous" code would be called only for that special case (which is extremely rare in usual setups).
I do not know all cases when the meter->now can drop. Reset, rewind, any other (xrun)? If a single case is omitted, the same problem prevails.
No problem with resetting where appropriate, but still I would suggest to not keep size += spcm->boundary in the s16_update as is now.
Regards,
Pavel.
participants (3)
-
Jaroslav Kysela
-
Pavel Hofman
-
Takashi Iwai