RE: [EXT] Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib

5 Sep 2024

Hi Takashi,
Thanks for your reply and suggestions. Finally we have found the root
cause.
Seems it's related to both drivers and alsa-lib.
When two dmix clients run in parallel we get two direct dmix instances.
1st dmix instance:
snd_pcm_dmix_open()
      snd_pcm_direct_initialize_slave()
              save_slave_setting()
Since the driver we are using has SND_PCM_INFO_RESUME flag,
dmix->spcm->info has this flag. Then this flag is cleared in
dmix->shmptr->s.info.
2nd dmix instance:
snd_pcm_dmix_open()
      snd_pcm_direct_open_secondary_client()
              copy_slave_setting()
2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn't
have this flag.
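The flag propagation described above can be modelled in a few lines. This is only a sketch under stated assumptions: the structs, the `INFO_*` bits and the `model_*` functions are hypothetical stand-ins for the real alsa-lib types and for save_slave_setting() / copy_slave_setting(); they are not the library's code.

```c
/* Hypothetical stand-ins; the real definitions live in alsa-lib. */
#define INFO_RESUME 0x1u   /* stands in for SND_PCM_INFO_RESUME */
#define INFO_OTHER  0x2u   /* an arbitrary unrelated capability bit */

struct slave_pcm   { unsigned int info; };   /* models dmix->spcm */
struct shared_area { unsigned int info; };   /* models dmix->shmptr->s */
struct dmix_inst   { struct slave_pcm spcm; struct shared_area *shm; };

/* First client: publish the slave settings with RESUME masked out. */
static void model_save_slave_setting(struct dmix_inst *d)
{
    d->shm->info = d->spcm.info & ~INFO_RESUME;
}

/* Secondary client: take the slave settings from shared memory. */
static void model_copy_slave_setting(struct dmix_inst *d)
{
    d->spcm.info = d->shm->info;
}

/* Open two clients on a driver advertising `driver_info` and report
 * whether each client's private spcm.info still carries RESUME. */
static void model_open_two(unsigned int driver_info,
                           int *first_has, int *second_has)
{
    struct shared_area shm = { 0 };
    struct dmix_inst dmix1 = { { driver_info }, &shm };
    struct dmix_inst dmix2 = { { 0 }, &shm };

    model_save_slave_setting(&dmix1);   /* 1st instance */
    model_copy_slave_setting(&dmix2);   /* 2nd instance */

    *first_has  = (dmix1.spcm.info & INFO_RESUME) != 0;
    *second_has = (dmix2.spcm.info & INFO_RESUME) != 0;
}
```

In this model the first client keeps the bit in its own spcm.info while the second client never sees it, which is the asymmetry reported here.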
If the 1st dmix instance resumes first, it performs the recovery of the
slave pcm in snd_pcm_direct_slave_recover(). Because the 1st
dmix->spcm->info has SND_PCM_INFO_RESUME, snd_pcm_resume(direct->spcm)
is called correctly to resume the slave pcm.
... and immediately stop the stream, then prepare and restart as a usual
restart.
However, if the 2nd dmix instance resumes first,
snd_pcm_resume(direct->spcm) will not be called because its
spcm->info doesn't have the SND_PCM_INFO_RESUME flag. The 1st dmix
instance then assumes someone else already did the recovery, so it
doesn't call snd_pcm_resume(direct->spcm) either. As a result the
slave pcm fails to resume.
Something wrong happening here, then.
In dmix, there is no hardware resume at all, but it's always a restart of the
stream.  The call of snd_pcm_resume() is only temporarily for inconsistencies
that can be a problem on some drivers (IIRC dmaengine stuff).  That said,
dmix does a kind of fake resume, stops and restarts the stream cleanly on the
first instance.  On the second instance, it's already recovered, hence it
bails out.
If poll() hangs on the second instance, there can be some other problem.
Maybe the resume -> stop -> restart sequence doesn't work with your driver
well?
Our dma driver does PAUSE on system suspend and requires RESUME on
system resume. The current problem is that snd_pcm_resume() is called by
neither the 1st instance nor the 2nd instance.
That's weird.  Are you really testing with the latest alsa-lib code?
If application doesn't call snd_pcm_resume(), it means that the PCM
state isn't set to SUSPENDED, so it pretends as if still running.
Or if you mean that snd_pcm_resume() to the slave PCM isn't called
(even though snd_pcm_resume() is called for the dmix PCM), check
whether snd_pcm_direct_slave_recover() gets called, especially at the
point:
        /* some buggy drivers require the device resumed before prepared;
         * when a device has RESUME flag and is in SUSPENDED state, resume
         * here but immediately drop to bring it to a sane active state.
         */
        if (state == SND_PCM_STATE_SUSPENDED &&
            (direct->spcm->info & SND_PCM_INFO_RESUME)) {
                snd_pcm_resume(direct->spcm);
                snd_pcm_drop(direct->spcm);
                snd_pcm_direct_timer_stop(direct);
                snd_pcm_direct_clear_timer_queue(direct);
        }
Try to put debug prints or catch via breakpoint whether this code path
is executed.
Also, does the issue happen with the latest 6.11-rc kernel, too?
If yes, what if you drop SNDRV_PCM_INFO_RESUME bit flag in the driver
side?  Does the problem persist, or it works?
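For reference, the application-side handling implied above (the application calls snd_pcm_resume() once it sees the SUSPENDED state) usually follows the common ALSA recovery pattern: retry while snd_pcm_resume() returns -EAGAIN, and fall back to snd_pcm_prepare() on any other error such as -ENOSYS. The sketch below only encodes that decision; the `app_step` enum and `app_next_step()` are hypothetical names, not alsa-lib API, and no ALSA call is made so that it compiles standalone.

```c
#include <errno.h>

/* Hypothetical names for the three possible follow-up actions. */
enum app_step {
    APP_STEP_DONE,          /* hardware resumed, stream continues */
    APP_STEP_RETRY_RESUME,  /* still suspended: wait, call snd_pcm_resume() again */
    APP_STEP_PREPARE        /* no usable resume: restart via snd_pcm_prepare() */
};

/* Decide what to do after one snd_pcm_resume() attempt returned `err`. */
static enum app_step app_next_step(int err)
{
    if (err == -EAGAIN)
        return APP_STEP_RETRY_RESUME;
    if (err < 0)
        return APP_STEP_PREPARE;    /* e.g. -ENOSYS */
    return APP_STEP_DONE;
}
```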
I'm working on kernel 6.6 and alsa-lib v1.2.11. I don't think that's very
outdated, but I will try switching to the latest version.
I did do some debugging on this part. Please see my comments inline.
int snd_pcm_direct_slave_recover(snd_pcm_direct_t *direct)
{
    ...

    /* [Chancel]
     * When two dmix clients run in parallel we get two direct dmix instances.
     * 1st dmix->spcm->info has the SND_PCM_INFO_RESUME flag but the 2nd dmix doesn't.
     * Let's name the 1st opened dmix "dmix1" and the 2nd dmix "dmix2".
     * After resume, both dmix1 and dmix2 enter snd_pcm_direct_slave_recover().
     * Here we assume dmix2 is the instance that gets here first.
     * dmix2 successfully takes the semaphore lock and dmix1 waits for this lock.
     */

    semerr = snd_pcm_direct_semaphore_down(direct,
    				   DIRECT_IPC_SEM_CLIENT);
    ...
    state = snd_pcm_state(direct->spcm);
    if (state != SND_PCM_STATE_XRUN && state != SND_PCM_STATE_SUSPENDED) {

    /* [Chancel]
     * dmix2 finds the spcm state is SUSPENDED so it will not enter here.
     * However, later when dmix1 gets the lock and arrives here, the spcm state has already been changed to RUNNING by dmix2.
     * As a result dmix1 assumes some other instance has done the recovery, so dmix1 returns directly.
     * snd_pcm_resume() is not called by dmix1.
     */

    	/* ignore... someone else already did recovery */
    	semerr = snd_pcm_direct_semaphore_up(direct,
    					     DIRECT_IPC_SEM_CLIENT);
    	if (semerr < 0) {
    		SNDERR("SEMUP FAILED with err %d", semerr);
    		return semerr;
    	}
        return 0;
    }
    ...
    if (state == SND_PCM_STATE_SUSPENDED &&
        (direct->spcm->info & SND_PCM_INFO_RESUME)) {

    /* [Chancel]
     * dmix2->spcm->info doesn't have the SND_PCM_INFO_RESUME flag, so this condition is not met.
     * snd_pcm_resume() is not called by dmix2.
     */
        snd_pcm_resume(direct->spcm);
    	snd_pcm_drop(direct->spcm);
    	snd_pcm_direct_timer_stop(direct);
    	snd_pcm_direct_clear_timer_queue(direct);
    }
    ...
    ret = snd_pcm_prepare(direct->spcm);
    ...

    /* [Chancel]
     * dmix2 calls snd_pcm_start to set spcm state to RUNNING.
     */

    ret = snd_pcm_start(direct->spcm);
    ...
}
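The ordering problem annotated above can be reproduced with a small standalone model. It is only a sketch: every `model_*` / `MODEL_*` name is hypothetical, the semaphore is modelled by simply calling the two clients one after the other, and the XRUN case and all error handling of the real snd_pcm_direct_slave_recover() are left out.

```c
#include <stdbool.h>

#define MODEL_INFO_RESUME 0x1u   /* stands in for SND_PCM_INFO_RESUME */

enum model_state { MODEL_RUNNING, MODEL_SUSPENDED };

struct model_slave {
    enum model_state state;  /* state of the shared slave PCM */
    int resume_calls;        /* how often the driver saw a resume */
};

struct model_client {
    unsigned int info;       /* this client's private copy of spcm->info */
    struct model_slave *slave;
};

/* Same control flow as the annotated function, reduced to its branches. */
static void model_slave_recover(struct model_client *c)
{
    struct model_slave *s = c->slave;

    if (s->state != MODEL_SUSPENDED)
        return;                  /* someone else already did recovery */
    if (c->info & MODEL_INFO_RESUME)
        s->resume_calls++;       /* snd_pcm_resume() + snd_pcm_drop() */
    s->state = MODEL_RUNNING;    /* snd_pcm_prepare() + snd_pcm_start() */
}

/* Suspend the slave, let both clients recover in the given order and
 * return how many times the slave was actually resumed. */
static int model_resume_calls(unsigned int driver_info, bool secondary_first)
{
    struct model_slave slave = { MODEL_SUSPENDED, 0 };
    struct model_client dmix1 = { driver_info, &slave };  /* keeps driver flags */
    struct model_client dmix2 = { driver_info & ~MODEL_INFO_RESUME, &slave };

    if (secondary_first) {
        model_slave_recover(&dmix2);
        model_slave_recover(&dmix1);
    } else {
        model_slave_recover(&dmix1);
        model_slave_recover(&dmix2);
    }
    return slave.resume_calls;
}
```

With a driver that advertises the flag, the model yields one resume when dmix1 recovers first and none when dmix2 recovers first; with a driver that does not advertise it, the count is zero in both orders, so the two clients behave consistently.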
The dma driver I'm using supports the pause/resume function. I don't think dropping
SNDRV_PCM_INFO_RESUME is a good fix for this issue. Besides this driver, I also validated
on another driver whose dma doesn't have this flag. There the issue is gone and both
instances work well with suspend/resume.
Regards, 
Chancel Liu
The SND_PCM_INFO_RESUME flag has an impact on the flow of dmix resume. In
my opinion the first resumed dmix instance should make sure the slave pcm
can be recovered properly, no matter whether it's the first opened
instance or a secondary opened instance.
The snd_pcm_resume() gets called no matter which instance, just the first
one who tries to recover the suspended state.  (And it's called internally at
updating the various state, not necessarily an explicit recovery call.)
Unfortunately, if the secondary opened instance resumes first, it doesn't have
SND_PCM_INFO_RESUME, which causes snd_pcm_resume() to never be called.
No, it's misunderstanding.  SND_PCM_INFO_RESUME isn't exposed to the
application in the case of dmix at all; i.e. dmix doesn't support the
full resume, per se. That's the design.  So it doesn't matter which
instance gets resumed at first.
Do you know why the secondary opened instance clears the
SND_PCM_INFO_RESUME flag? Can we do the following modification?
diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
@@ -1183,8 +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
...
    COPY_SLAVE(buffer_time);
    COPY_SLAVE(sample_bits);
    COPY_SLAVE(frame_bits);

-   dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;

I don't think so.  The clearance of the RESUME flag here is correct.
dmix doesn't support the hardware resume feature.  It does its own.
(And this flag is merely a info for apps, which isn't really evaluated except
for the code in dmix workaround there.)
Takashi
I think dmix should know what state the real driver is in. If the driver requires that
the app call snd_pcm_resume(), how can dmix get this information?
The dmix already knows.  But the PCM state exposed to applications
isn't always tied as 1:1.
Takashi