[alsa-devel] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (full-duplex: latency.c)

Tue Sep 17 18:07:22 CEST 2013

I just tried to look a bit more into this:

> .... This means that - for
> some reason - both `_writei and `_readi` are *skipped* after the
> second "cardIRQ?"; and I cannot really tell what would be the reason
> for it. I may try to look into this further - but as I remember, the
> PortAudio full-duplex problem was not a typical XRUN, so maybe I'll
> go back directly to PortAudio from this point.
> 

... so I came up with another script, `comparecsv.pl`, posted again here:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/

The idea was to use it to compare that section (after the second "cardIRQ?") in a failing and non-failing acquisition of the modified `dummy` (*duM*) driver; so I've tried it as:

    perl comparecsv.pl \
      -i captlat-2013-09-11-00-37-48-duM-128-256-np-0-ns-yb/trace-dummy.csv -o 1580 \
      -i captlat-2013-09-11-13-46-43-duM-128-256-np-0-ns-yb-f/trace-dummy.csv -o 1054 \
    -l 844 > compcsv.txt

This generates a text-only side-by-side comparison of the two input files at given line number offsets and for the given length in lines:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/_cappics05/compcsv.txt

... and can be used to generate a more visual, HTML side-by-side comparison:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/_cappics05/compcsvLR.html

... although also `meld` could be used (see the `Readme` for more details). 

Unfortunately, these side-by-side displays are still difficult enough to read, to allow for some clear conclusions. 

Then I thought to compare a given failed capture, to all other (six) successful captures, by quickly traversing through the .csv's, and finding which kernel commands in the failed one, do not appear in the successful ones, using `awk`. However:

    $ awk -F, \
      'FNR==NR{if(!($8 in a)){a[$8]=0;b[$8]=sprintf("%s %s",$1,$4)}};FNR!=NR{if($8 in a){a[$8]++;}};END{for(i in a){if(a[i]==0){print i,b[i];}}}' \
      captlat-2013-09-11-17-14-02-duM-128-256-np-0-ns-yb-f/trace-dummy.csv captlat-2013-0*duM*b/*.csv

    snd_pcm_do_stop();  0.005481 lt-latency
    xrun()              0.004957 rsyslogd

... unfortunately, with this failed acquisition (2013-09-11-17-14-02*-f), I only get the effect (`xrun`, `snd_pcm_do_stop`) detected - not anything I can attribute to the cause. The other failed .csv acquisition is a bit more revealing:

    $ awk -F, \
      'FNR==NR{if(!($8 in a)){a[$8]=0;b[$8]=sprintf("%s %s",$1,$4)}};FNR!=NR{if($8 in a){a[$8]++;}};END{for(i in a){if(a[i]==0){print i,b[i];}}}' \
      captlat-2013-09-11-13-46-43-duM-128-256-np-0-ns-yb-f/trace-dummy.csv captlat-2013-0*duM*b/*.csv

    account_idle_ticks()    0.002878 <idle>       o
    raise_softirq()         0.000547 <idle>       m
    snd_pcm_do_stop();      0.005192 lt-latency
    kthread_should_stop();  0.000873 ksoftirqd/0  o
    rcu_needs_cpu()         0.000530 <idle>       m
    xrun()                  0.004868 rsyslogd
    wakeup_softirqd()       0.000548 <idle>       o

... where I've manually added whether the functions appear "o"nce or "m"ultiple times in the failed (2013-09-11-13-46-43*-f) acquisition `.csv`. RCU, I learned, refers to "read-copy-update" (a type of locking/synchronization mechanism, apparently), and the only thing I gather from this, is that the reference to `rcu_needs_cpu` and `account_idle_ticks` probably means that Linux at acquistion time decided to do some housekeeping, which possibly preempted the scheduled execution of writei/readi from userspace.

Looking back at the above *.html side-by-side comparison - noting it can only reveal differences between specific acquisitions, for those particular acquisitions and regions, one thing visible is that the `sys_ioctl`s (which should eventually call the `snd_pcm_lib_read1/write1`) are not called at all in the failed one - and neither are `snd_pcm_update_state`, `pick_next_task_fair` nor `schedule`. Also,

* successful has: check_preempt_curr -> resched_task (twice, but not that far away from each other)
*     failed has: check_preempt_curr -> check_preempt_wakeup -> update_curr -> wakeup_preempt_entity.clone.88

... which looks like it confirms the theory that preemption happened in the failed one - but does not reveal due to what. Then again, as seen previously, `check_preempt_wakeup` can also occur in successful captures. In the failed capture, I can also see the process `eog` doing a `_pollwait` quite a bit, but not sure if that could be the reason for triggering the xrun. 

Another possibly weird thing can be seen in the *.html file, at start:

( successful  vs.  failed ) 
1 	snd_pcm_update_hw_ptr0()    1 	snd_pcm_update_hw_ptr0()
2 	dummy_hrtimer_pointer()	
3 	dummy_pcm_pointer()	    2 	dummy_pcm_pointer()
                                    3 	dummy_hrtimer_pointer()

In the code, `dummy_pcm_pointer` calls `dummy_hrtimer_pointer` - however here, in the successful capture the order is inverted; but that seems to be an artifact of sorting - in the `.csv` capture, both *_hrtimer and *_pcm_pointer have the same timestamp of 0.003225. So that doesn't say much either :/

So, unfortunately, a quick look seems to be not enough to properly determine a reason for an xrun() based on kernel trace acquisitions - if anyone can point a more proper way, I'd really appreciate it. 

Thanks in advance for any comments,  
Cheers!