Re: Issue in pcm_dsnoop.c in alsa-lib
Hi
Hi Takashi Iwai, Jaroslav Kysela
We encountered an issue in the pcm_dsnoop use case. Could you please have a look?
*Issue description:* With two instances of a dsnoop-type device running in parallel, one of the instances hangs in memcpy after suspend/resume because it obtains a very large copy size.
#3 0x0000ffffa78d5098 in snd_pcm_dsnoop_sync_ptr (pcm=0xaaab06563da0) at pcm_dsnoop.c:158
    dsnoop = 0xaaab06563c20
    slave_hw_ptr = 64
    old_slave_hw_ptr = 533120
    avail = *187651522444320*
*Reason analysis:*
The root cause, as I analyze it, is as follows. After suspend/resume, one instance gets the SND_PCM_STATE_SUSPENDED state from the slave PCM device. It then calls snd_pcm_prepare() and snd_pcm_start(), which reset dsnoop->slave_hw_ptr and the hw_ptr of the slave PCM device, so the state of this instance is correct. The other instance may not get the SND_PCM_STATE_SUSPENDED state from the slave PCM device, because the slave device may already have been recovered by the first instance, so its dsnoop->slave_hw_ptr is not reset. Because the hw_ptr of the slave PCM device has been reset, the result is a very large "avail" size.
*Solution:* I didn't come up with a fix for this issue. There seems to be no easy way to let the other instance know about this case and reset its dsnoop->slave_hw_ptr. Could you please help?
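To illustrate the reason analysis above, here is a minimal sketch of how a stale old pointer combined with a freshly reset hardware pointer produces a huge frame count. The helper name toy_frame_diff and the boundary value are assumptions made for illustration only; this is not the alsa-lib implementation, and it does not try to reproduce the exact "avail" value from the backtrace.

```c
#include <stdint.h>

/* Hypothetical ring boundary, chosen only for this illustration. */
#define TOY_BOUNDARY (1ULL << 62)

/*
 * Frames advanced from old_ptr to new_ptr on a pointer that wraps at
 * 'boundary'. If new_ptr is smaller than old_ptr, the helper assumes
 * the pointer wrapped around, which is wrong when the pointer was
 * actually reset to zero by another instance.
 */
static uint64_t toy_frame_diff(uint64_t new_ptr, uint64_t old_ptr,
			       uint64_t boundary)
{
	if (new_ptr >= old_ptr)
		return new_ptr - old_ptr;
	return new_ptr + boundary - old_ptr;
}
```

With the values from the backtrace (slave_hw_ptr = 64, old_slave_hw_ptr = 533120), toy_frame_diff(64, 533120, TOY_BOUNDARY) returns TOY_BOUNDARY - 533056 instead of a small number, which is the kind of size that would make a subsequent copy loop run far past the buffer.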
Could you try the topic/pcm-direct-resume branch on
https://github.com/tiwai/alsa-lib
Thanks, I pushed my test result to https://github.com/alsa-project/alsa-lib/issues/213
Could you please review?
Please keep the discussion on ML.
I saw you have updated the origin/topic/pcm-direct-resume branch. I tested your latest change; it is more stable than before, but I still hit the issue once during an overnight test, so the probability is very low.
So I suggest the change below. Shall we apply it?
diff --git a/src/pcm/pcm_dsnoop.c b/src/pcm/pcm_dsnoop.c
index 729ff447b41f..cc333b3f4384 100644
--- a/src/pcm/pcm_dsnoop.c
+++ b/src/pcm/pcm_dsnoop.c
@@ -134,14 +134,21 @@ static int snd_pcm_dsnoop_sync_ptr(snd_pcm_t *pcm)
 	snd_pcm_sframes_t diff;
 	int err;
 
-	err = snd_pcm_direct_check_xrun(dsnoop, pcm);
-	if (err < 0)
-		return err;
 	if (dsnoop->slowptr)
 		snd_pcm_hwsync(dsnoop->spcm);
 	old_slave_hw_ptr = dsnoop->slave_hw_ptr;
 	snoop_timestamp(pcm);
 	slave_hw_ptr = dsnoop->slave_hw_ptr;
+	/*
+	 * FIXME: Move snd_pcm_direct_client_chk_xrun after getting the
+	 * dsnoop->spcm->hw.ptr. If the snd_pcm_direct_slave_recover()
+	 * of another instance happening before dsnoop->spcm->hw.ptr
+	 * is got, then a wrong spcm->hw.ptr is got which cause a wrong
+	 * 'diff' data later.
+	 */
+	err = snd_pcm_direct_check_xrun(dsnoop, pcm);
+	if (err < 0)
+		return err;
 	diff = pcm_frame_diff(slave_hw_ptr, old_slave_hw_ptr, dsnoop->slave_boundary);
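The ordering argument behind this change can be sketched with a toy model. Everything here (struct toy_slave, struct toy_client, the recoveries counter, toy_sync) is a hypothetical stand-in invented for illustration, not alsa-lib code: the client first snapshots the shared pointer and only then checks whether a recovery happened, so a reset that occurred before the snapshot is noticed.

```c
#include <stdint.h>

/* Hypothetical shared state standing in for the slave PCM. */
struct toy_slave {
	uint64_t hw_ptr;	/* restarts from 0 on recovery */
	unsigned int recoveries;/* bumped on every recovery */
};

/* Hypothetical per-instance state. */
struct toy_client {
	uint64_t slave_hw_ptr;	/* last pointer this instance saw */
	unsigned int recoveries;/* last recovery count this instance saw */
};

/* Models another instance recovering the slave after resume. */
static void toy_slave_recover(struct toy_slave *s)
{
	s->hw_ptr = 0;
	s->recoveries++;
}

/* Returns the frames advanced since the previous sync. */
static uint64_t toy_sync(struct toy_client *c, const struct toy_slave *s,
			 uint64_t boundary)
{
	uint64_t old_ptr = c->slave_hw_ptr;
	uint64_t new_ptr = s->hw_ptr;	/* 1. snapshot the pointer */

	c->slave_hw_ptr = new_ptr;
	if (c->recoveries != s->recoveries) {
		/* 2. recovery seen: old_ptr is stale, do not trust the delta */
		c->recoveries = s->recoveries;
		return 0;
	}
	if (new_ptr >= old_ptr)
		return new_ptr - old_ptr;
	return new_ptr + boundary - old_ptr;	/* genuine wrap-around */
}
```

In this single-threaded model, a recovery followed by a sync yields 0 frames and a fresh base pointer rather than a near-boundary delta. If the check ran before the snapshot, a recovery landing between the two steps would go unnoticed until the next call.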
best regards
wang shengjiu
On Thu, 10 Mar 2022 03:25:27 +0100, S.J. Wang wrote:
So I suggest the change below. Shall we apply it?
Point taken. The xrun/suspend check should be right before the slave hwptr update, yes.
I updated the git repo again. Will submit the patch set for the merge as the final version.
thanks,
Takashi
participants (2): S.J. Wang, Takashi Iwai