Re: [alsa-devel] another locking issue with nonatomic ops?

16 Feb 2016


      On Tue, 16 Feb 2016 18:15:12 +0100,
Babu, Ramesh wrote:
...
...
On Tue, 16 Feb 2016 17:40:19 +0100,
Babu, Ramesh wrote:
...
...
On Tue, 16 Feb 2016 17:10:43 +0100,
Vinod Koul wrote:
...
On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
...
On Tue, 16 Feb 2016 16:37:11 +0100,
Vinod Koul wrote:
>
>
> Hi Takashi,
>
> We found another issue with non atomic ops.
>
> This occurs when we have snd_pcm_do_prepare being invoked
which
...
...
acquires
...
...
> the lock snd_pcm_link_rwsem twice, first in
snd_pcm_common_ioctl1,
...
...
and then again in
...
...
> dpcm_dapm_stream_event. Normally this is fine, but if we have
another stream
...
...
> get closed in between two read accesses we get stuck.
>
> First thread:
>
> [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1
0x20020004
...
...
> [  419.665946]  ffff8800704a3a60 00000000704a3a60
ffff88006d1fddd0
...
...
ffffffff8252ffd8
...
...
> [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff
ffffffff8252ffd8
...
...
> [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987
ffffffffffffffff
...
...
> [  419.683574] Call Trace:
> [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> [  419.692306]  [<ffffffff81ca6f6d>]
rwsem_down_read_failed+0xdd/0x130
...
...
> [  419.701050]  [<ffffffff814df454>]
call_rwsem_down_read_failed+0x14/0x30
...
...
> [  419.709758]  [<ffffffff810e793b>] ?
down_read_nested+0x5b/0x70
...
...
...
...
> [  419.709805]  [<ffffffff81adfda9>] ?
snd_pcm_stream_lock+0xa9/0x120
...
...
> [  419.723012]  [<ffffffff81adfda9>]
snd_pcm_stream_lock+0xa9/0x120
...
...
...
...
>
> < ============     2nd Read lock is acquired here, without
releasing
...
...
previous one.
...
...
>
> [  419.723026]  [<ffffffff81adfe4c>]
snd_pcm_stream_lock_irq+0x2c/0x30
...
...
> [  419.731801]  [<ffffffff81b01f5e>] ?
dpcm_dapm_stream_event+0xbe/0xd0
...
...
> [  419.740565]  [<ffffffff81b0079c>]
dpcm_set_fe_update_state+0x3c/0xb0
...
...
> [  419.749252]  [<ffffffff81b04514>]
dpcm_fe_dai_prepare+0xc4/0x150
...
...
...
...
> [  419.749301]  [<ffffffff81adf7eb>]
snd_pcm_do_prepare+0x1b/0x30
...
...
...
...
> [  419.758083]  [<ffffffff81adf3bf>]
snd_pcm_action_single+0x2f/0x70
...
...
...
...
> [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> [  419.766962]  [<ffffffff81adf577>] ?
snd_pcm_action_nonatomic+0x27/0x80
...
...
> [  419.775565]  [<ffffffff81adf5c6>]
snd_pcm_action_nonatomic+0x76/0x80
...
...
>
> < ============     1st Read lock is acquired here
>
> [  419.784419]  [<ffffffff81ae2c62>]
snd_pcm_common_ioctl1+0x802/0xd30
...
...
> [  419.784495]  [<ffffffff81ae332b>]
snd_pcm_playback_ioctl1+0x19b/0x280
...
...
> [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> [  419.801943]  [<ffffffff81ae3dca>]
snd_pcm_ioctl_compat+0x41a/0x770
...
...
> [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> [  419.819535]  [<ffffffff810eb696>] ?
trace_hardirqs_on_caller+0x16/0x210
...
...
> [  419.828306]  [<ffffffff814df4f8>] ?
trace_hardirqs_on_thunk+0x17/0x19
...
...
> [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
>
>
> Second thread:
> [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1
0x20020004
...
...
> [  419.543191]  ffff880170957cd8 0000000070957cd8
ffff88006de49ad0
...
...
ffffffff8252ffd8
...
...
> [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0
ffffffff00000001
...
...
> [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987
ffffffff8252ffc0
...
...
> [  419.569368] Call Trace:
> [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> [  419.578026]  [<ffffffff81ca71f5>]
rwsem_down_write_failed+0x235/0x450
...
...
> [  419.586884]  [<ffffffff81ca7005>] ?
rwsem_down_write_failed+0x45/0x450
...
...
> [  419.586941]  [<ffffffff814df483>]
call_rwsem_down_write_failed+0x13/0x20
...
...
> [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
>
> < ============     Write lock acquired here. If this occurs between
two read
...
...
> lock's then it's hangs
So what's wrong?  The two read locks are still running, and
snd_pcm_release() waits until these two read locks are freed.
The read double-locks are fine, and it's intended behavior.
Yes that is true
...
The problem is apparently not there, but somewhere else.
Any other thread or call involved?
The second backtrace above is the 2nd thread which does down_write.
That
...
gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
read is also stuck and we are in deadlock.
down_read (1st acquired)
   				down_write (waiting due to
down_read)
...
down_read( 2nd,,, stuck)
Hmm, so down_read() takes care of the pending down_write()?
Then it's a problem.  Maybe the limitation of the current rwsem
implementation.
down_write() will make sure no further read locks are allowed, till it get
executed.
So here the 2nd down_read() will get blocked, since down_write has
higher
...
priority.
Well, the requirement of down_write() is to block further reads *once
when* it takes the lock.  It doesn't necessarily mean to block reads
*before* taking the write lock.
Below excerpts from LDD [1] says:
"Writers get priority; as soon as a writer tries to enter the critical section,
no readers will be allowed in until all writers have completed their work."
In your case, writer doesn't enter the critical section yet.  It's
trying to enter but blocked.
...
My understanding when writer *tries* to acquire lock, further read locks
will be blocked.  I think in my back trace, 2nd down_read() didn't get honored,
since the down_write() was attempting to enter critical section from different
thread.
Actually, this is pretty much depending on the implementation.  A
proper RT-wise rwsem may handle it differently, but the Linux standard
rwsem just queues a la FIFO, thus the writer blocks the rest.
Takashi