Re: linux-6.4 alsa sound broken

On 03. 05. 23 18:10, Takashi Iwai wrote:
On Mon, 01 May 2023 09:17:20 +0200, Oswald Buddenhagen wrote:
On Mon, May 01, 2023 at 11:59:12AM +0800, Jeff Chua wrote:
Latest git pull from Linus's tree ... playing a simple sound file results in a lot of echo.
how _exactly_ does it sound? have you recorded a file through loopback for us to investigate? best would be a short sample of a clean wave (sine or sawtooth) with some leading and trailing silence.
Running on Lenovo X1 with .. 00:1f.3 Audio device: Intel Corporation Alder Lake PCH-P High Definition Audio Controller (rev 01)
I've bisected, and reverting the following patch fixed the problem.
this seems weird. so my first thought is: are you _sure_ that your bisect isn't "contaminated" somehow? is the effect consistent across several reboots with the same build? does re-applying my patch immediately re-introduce the problem?
- this code is about silencing. getting dropouts or no playback at all would be plausible, while echo (that is, repetition) seems surprising. theoretically, the driver may be setting a bad fill_silence() callback which copies some garbage instead of zeroing, but the HDA driver doesn't set one at all (i.e., uses the default one).
- this code must be explicitly enabled, which for all i know is done by almost nothing. what players did you try? did you get consistent results? did you try taking out audio servers from the equation?
- the affected hardware belongs to the extremely widely used HDA family, which at the layer the patch is even remotely connected with is completely standardized. so _a lot_ of people should be affected, and we should be getting reports like yours by the dozen. are we?
of course i can't exclude the possibility that my patch is affected by an uninitialized variable or memory corruption (or in the worst case causes it), which would of course have very hard to predict effects. but that should be investigated properly instead of just reverting, lest we might be papering over a much more serious problem.
Oswald, this looks like a real regression caused by the patch. Specifically, this happens with dmix, and the issue doesn't seem specific to the driver. It happens also with USB-audio, not only with HD-audio. Just running aplay /usr/share/sounds/alsa/Side_Left.wav (or whatever is there) with the dmix config shows the problem.
The dmix uses the silence_size=boundary as a fill-all operation, and it's a free-wheel mode, so supposedly something was overlooked in your code refactoring.
Could you check it and address quickly? I'd like to fix it before 6.4-rc1 release, so if no fix comes up in a couple of days, I'll have to revert the change for 6.4-rc1.
I would revert this patch. It seems that this "do silence right after the playback is finished" mechanism is not handled in the updated code (and I overlooked that, too):
-			ofs = runtime->status->hw_ptr;
-			frames = new_hw_ptr - ofs;
-			if ((snd_pcm_sframes_t)frames < 0)
-				frames += runtime->boundary;
-			runtime->silence_filled -= frames;
-			if ((snd_pcm_sframes_t)runtime->silence_filled < 0) {
-				runtime->silence_filled = 0;
-				runtime->silence_start = new_hw_ptr;
-			} else {
-				runtime->silence_start = ofs;
-			}
It requires tracking both the old and the new hw_ptr, so the removal of the new_hw_ptr argument is not valid. I don't see any easy way to fix this.
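As an illustration of why both pointers are needed, the removed lines can be modeled outside the kernel roughly as follows. This is only a sketch: `struct silence_state` and `silence_advance()` are hypothetical names, and the struct keeps just the fields the removed lines touch.

```c
#include <assert.h>

typedef unsigned long uframes_t;
typedef long sframes_t;

/* Hypothetical, simplified stand-in for the snd_pcm_runtime fields involved. */
struct silence_state {
	uframes_t boundary;       /* value at which the pointers wrap */
	uframes_t hw_ptr;         /* last known ("old") hardware pointer */
	uframes_t silence_start;  /* start of the silenced region */
	uframes_t silence_filled; /* silenced frames not yet consumed */
};

/* Account for the hardware pointer moving from s->hw_ptr to new_hw_ptr. */
static void silence_advance(struct silence_state *s, uframes_t new_hw_ptr)
{
	uframes_t ofs = s->hw_ptr;
	uframes_t frames = new_hw_ptr - ofs;

	/* The pointer wrapped at the boundary: correct the distance. */
	if ((sframes_t)frames < 0)
		frames += s->boundary;

	/* The hardware consumed `frames` frames of the silenced region. */
	s->silence_filled -= frames;
	if ((sframes_t)s->silence_filled < 0) {
		/* Everything silenced so far was consumed: restart at new_hw_ptr. */
		s->silence_filled = 0;
		s->silence_start = new_hw_ptr;
	} else {
		s->silence_start = ofs;
	}
	s->hw_ptr = new_hw_ptr;
}
```

Without the old pointer there is no way to compute `frames`, which is the point made in the message.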
I would probably fix the snd_pcm_playback_hw_avail() call made with the old hw_ptr, which seems to be the only issue with the original code, because it makes the threshold inaccurate (it is expected to fill more silent samples). Another issue is the wrong silence_start for the incremental silence calls.
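To make the threshold inaccuracy concrete, here is a small user-space model. It is a sketch under assumptions: `struct ptrs`, `playback_avail()`, `playback_hw_avail()` and `silence_frames_needed()` are hypothetical names that only approximate the kernel helpers, and the numbers are invented.

```c
#include <assert.h>

typedef unsigned long uframes_t;
typedef long sframes_t;

/* Hypothetical, simplified stand-in for the runtime fields involved. */
struct ptrs {
	uframes_t buffer_size;
	uframes_t boundary;
	uframes_t hw_ptr;
	uframes_t appl_ptr;
};

/* Free space the application may still write into. */
static uframes_t playback_avail(const struct ptrs *p)
{
	sframes_t avail = (sframes_t)(p->hw_ptr + p->buffer_size - p->appl_ptr);

	if (avail < 0)
		avail += (sframes_t)p->boundary;
	else if ((uframes_t)avail >= p->boundary)
		avail -= (sframes_t)p->boundary;
	return (uframes_t)avail;
}

/* Frames queued by the application but not yet played. */
static sframes_t playback_hw_avail(const struct ptrs *p)
{
	return (sframes_t)p->buffer_size - (sframes_t)playback_avail(p);
}

/* Frames of silence the thresholded mode would request, 0 if none. */
static uframes_t silence_frames_needed(const struct ptrs *p, uframes_t hw_ptr,
				       uframes_t silence_filled,
				       uframes_t threshold, uframes_t size)
{
	struct ptrs tmp = *p;
	sframes_t noise_dist;
	uframes_t frames;

	tmp.hw_ptr = hw_ptr;	/* evaluate with the pointer the caller picks */
	noise_dist = playback_hw_avail(&tmp) + (sframes_t)silence_filled;
	if (noise_dist >= (sframes_t)threshold)
		return 0;
	frames = threshold - (uframes_t)noise_dist;
	return frames > size ? size : frames;
}
```

With a 1024-frame buffer, appl_ptr at 1000 and a threshold of 64, evaluating at a stale hw_ptr of 900 sees 100 queued frames and requests no silence, while evaluating at the new hw_ptr of 980 sees only 20 and requests 44 frames.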
The patch to fix the original code may look like:
diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c
index af1eb136feb0..70795a83e50a 100644
--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -45,7 +45,7 @@ static int fill_silence_frames(struct snd_pcm_substream *substream,
 void snd_pcm_playback_silence(struct snd_pcm_substream *substream, snd_pcm_uframes_t new_hw_ptr)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
-	snd_pcm_uframes_t frames, ofs, transfer;
+	snd_pcm_uframes_t start, frames, ofs, transfer;
 	int err;
 
 	if (runtime->silence_size < runtime->boundary) {
@@ -63,12 +63,17 @@ void snd_pcm_playback_silence(struct snd_pcm_substream *substream, snd_pcm_ufram
 		}
 		if (runtime->silence_filled >= runtime->buffer_size)
 			return;
+		/* use appl_ptr as a temporary variable */
+		appl_ptr = runtime->status->hw_ptr;
+		runtime->status->hw_ptr = new_hw_ptr;
 		noise_dist = snd_pcm_playback_hw_avail(runtime) + runtime->silence_filled;
+		runtime->status->hw_ptr = appl_ptr;
 		if (noise_dist >= (snd_pcm_sframes_t) runtime->silence_threshold)
 			return;
 		frames = runtime->silence_threshold - noise_dist;
 		if (frames > runtime->silence_size)
 			frames = runtime->silence_size;
+		start = (runtime->silence_start + runtime->silence_filled) % runtime->boundary;
 	} else {
 		if (new_hw_ptr == ULONG_MAX) {	/* initialization */
 			snd_pcm_sframes_t avail = snd_pcm_playback_hw_avail(runtime);
@@ -92,12 +97,13 @@ void snd_pcm_playback_silence(struct snd_pcm_substream *substream, snd_pcm_ufram
 			}
 		}
 		frames = runtime->buffer_size - runtime->silence_filled;
+		start = runtime->silence_start;
 	}
 	if (snd_BUG_ON(frames > runtime->buffer_size))
 		return;
 	if (frames == 0)
 		return;
-	ofs = runtime->silence_start % runtime->buffer_size;
+	ofs = start % runtime->buffer_size;
 	while (frames > 0) {
 		transfer = ofs + frames > runtime->buffer_size ? runtime->buffer_size - ofs : frames;
 		err = fill_silence_frames(substream, ofs, transfer);
I'll post a complete patch when we agree on this solution. The runtime->status->hw_ptr may not even need to be preserved, because it is not used in the rest of the code in snd_pcm_update_hw_ptr0(), but the code looks more sane this way.
Jaroslav

On Wed, May 03, 2023 at 09:32:02PM +0200, Jaroslav Kysela wrote:
On 03. 05. 23 18:10, Takashi Iwai wrote:
The dmix uses the silence_size=boundary as a fill-all operation, and it's a free-wheel mode, so supposedly something was overlooked in your code refactoring.
Could you check it and address quickly? I'd like to fix it before 6.4-rc1 release, so if no fix comes up in a couple of days, I'll have to revert the change for 6.4-rc1.
I would revert this patch.
It seems that this "do silence right after the playback is finished" mechanism is not handled in the updated code (and I overlooked that, too):
no, there is nothing wrong with the code _per se_.
what's happening is that the dmix plugin doesn't update the application pointer, and somehow gets away with it.
that means that it would have never worked with thresholded silencing mode, either, but, well, it uses top-up mode.
anyway, this means that we need to revert the code path for top-up mode, which means reverting most of the patch's "meat". i think i can do better than your proposal, but not today anymore.
fwiw, the echo results from the plugin apparently summing up the samples in the buffer without clearing it first, that is, it relies on the auto-silencing doing the clearing, which the patch broke under the given circumstances. rather obvious in retrospect.
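The echo mechanism described here can be reproduced with a toy model. This is a sketch only: `struct ring`, `mix_in()` and `play_frame()` are hypothetical names and do not correspond to the actual dmix plugin code.

```c
#include <assert.h>
#include <string.h>

#define BUF_FRAMES 4

/* Hypothetical shared ring buffer that several clients mix into. */
struct ring {
	int samples[BUF_FRAMES];
};

/* dmix-style write: add the client's sample to whatever is in the slot. */
static void mix_in(struct ring *r, unsigned pos, int sample)
{
	r->samples[pos % BUF_FRAMES] += sample;
}

/* The "hardware" consumes one frame; optionally silence the slot afterwards. */
static int play_frame(struct ring *r, unsigned pos, int silence_after)
{
	int out = r->samples[pos % BUF_FRAMES];

	if (silence_after)
		r->samples[pos % BUF_FRAMES] = 0;
	return out;
}

/* Mix and play two laps of the buffer; return the first sample heard on lap two. */
static int second_lap_first_sample(int silence_after)
{
	static const int lap1[BUF_FRAMES] = { 10, 20, 30, 40 };
	static const int lap2[BUF_FRAMES] = { 1, 2, 3, 4 };
	struct ring r;
	unsigned i;

	memset(&r, 0, sizeof(r));
	for (i = 0; i < BUF_FRAMES; i++)
		mix_in(&r, i, lap1[i]);
	for (i = 0; i < BUF_FRAMES; i++)
		(void)play_frame(&r, i, silence_after);
	for (i = 0; i < BUF_FRAMES; i++)
		mix_in(&r, BUF_FRAMES + i, lap2[i]);
	return play_frame(&r, BUF_FRAMES, silence_after);
}
```

With silencing after playback, the second lap starts with the new sample 1. Without it, the stale 10 from the first lap is still in the slot and the listener hears 11, i.e. the old audio repeated under the new.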
regards

On 03. 05. 23 22:00, Oswald Buddenhagen wrote:
On Wed, May 03, 2023 at 09:32:02PM +0200, Jaroslav Kysela wrote:
On 03. 05. 23 18:10, Takashi Iwai wrote:
The dmix uses the silence_size=boundary as a fill-all operation, and it's a free-wheel mode, so supposedly something was overlooked in your code refactoring.
Could you check it and address quickly? I'd like to fix it before 6.4-rc1 release, so if no fix comes up in a couple of days, I'll have to revert the change for 6.4-rc1.
I would revert this patch.
It seems that this "do silence right after the playback is finished" mechanism is not handled in the updated code (and I overlooked that, too):
no, there is nothing wrong with the code _per se_.
what's happening is that the dmix plugin doesn't update the application pointer, and somehow gets away with it.
Dmix uses the free mode, because multiple applications can write to the buffer. We cannot do application pointer updates in the shared resource.
anyway, this means that we need to revert the code path for top-up mode, which means reverting most of the patch's "meat". i think i can do better than your proposal, but not today anymore.
Ok, let's see. I tried to be minimalistic to fix bugs and then we can talk about the improvements.
fwiw, the echo results from the plugin apparently summing up the samples in the buffer without clearing it first, that is, it relies on the auto-silencing doing the clearing, which the patch broke under the given circumstances. rather obvious in retrospect.
Dmix does not know which samples were updated by other applications. Each application tracks only its own samples.
Jaroslav

On Wed, 03 May 2023 22:00:37 +0200, Oswald Buddenhagen wrote:
On Wed, May 03, 2023 at 09:32:02PM +0200, Jaroslav Kysela wrote:
On 03. 05. 23 18:10, Takashi Iwai wrote:
The dmix uses the silence_size=boundary as a fill-all operation, and it's a free-wheel mode, so supposedly something was overlooked in your code refactoring.
Could you check it and address quickly? I'd like to fix it before 6.4-rc1 release, so if no fix comes up in a couple of days, I'll have to revert the change for 6.4-rc1.
I would revert this patch.
It seems that this "do silence right after the playback is finished" mechanism is not handled in the updated code (and I overlooked that, too):
no, there is nothing wrong with the code _per se_.
what's happening is that the dmix plugin doesn't update the application pointer, and somehow gets away with it.
that means that it would have never worked with thresholded silencing mode, either, but, well, it uses top-up mode.
Well, the code simply misinterpreted the behavior with silence_size == boundary. This mode is actually a kind of tailored operation for dmix.
In the description of alsa-lib snd_pcm_sw_params_set_silence_size(), you can find it:
/**
 * \brief Set silence size inside a software configuration container
 * \param pcm PCM handle
 * \param params Software configuration container
 * \param val Silence size in frames (0 for disabled)
 * \return 0 otherwise a negative error code
 *
 * A portion of playback buffer is overwritten with silence when playback
 * underrun is nearer than silence threshold (see
 * #snd_pcm_sw_params_set_silence_threshold)
 *
 * The special case is when silence size value is equal or greater than
 * boundary. The unused portion of the ring buffer (initial written samples
 * are untouched) is filled with silence at start. Later, only just processed
 * sample area is filled with silence. Note: silence_threshold must be set to zero.
 */
So, the "top-up" silencing happens only at start, but not after that. In the code path of hw_ptr update, it doesn't check the appl_ptr any longer, but fills the processed area by the hw_ptr update with silence. That's the intended behavior for use cases of free-wheel mode without appl_ptr updates like dmix.
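A rough user-space model of this two-phase behavior might look like the following. It is a sketch under assumptions: `ring`, `silence_range()`, `silence_at_start()` and `silence_processed()` are hypothetical names, the buffer holds one int per frame, and the pointer is assumed to advance by at most one buffer without wrapping at the boundary.

```c
#include <assert.h>
#include <string.h>

#define BUFFER_SIZE 8

/* Hypothetical stand-in for a playback ring buffer, one int per frame. */
static int ring[BUFFER_SIZE];

/* Zero `frames` frames starting at buffer offset `ofs`, splitting at the wrap. */
static void silence_range(unsigned long ofs, unsigned long frames)
{
	while (frames > 0) {
		unsigned long transfer = ofs + frames > BUFFER_SIZE ?
					 BUFFER_SIZE - ofs : frames;

		memset(&ring[ofs], 0, transfer * sizeof(ring[0]));
		frames -= transfer;
		ofs = 0;
	}
}

/* At start: silence everything except the frames already written. */
static void silence_at_start(unsigned long hw_ptr, unsigned long written)
{
	silence_range((hw_ptr + written) % BUFFER_SIZE, BUFFER_SIZE - written);
}

/* On a pointer update: silence only what the hardware just consumed. */
static void silence_processed(unsigned long old_hw_ptr, unsigned long new_hw_ptr)
{
	silence_range(old_hw_ptr % BUFFER_SIZE, new_hw_ptr - old_hw_ptr);
}
```

Note that `silence_processed()` never consults an application pointer, which is what makes the scheme usable for a free-wheeling client such as dmix.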
Takashi
participants (3): Jaroslav Kysela, Oswald Buddenhagen, Takashi Iwai