[alsa-devel] [PATCH - pcm 1/1] pcm: fix "unable to create IPC shm instance" in some case
From: Qing Cai caiqing@neusoft.com
As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" (i.e., number of processes attached to the shared memeory). If an application uses alsa-lib and invokes fork() at some point, there should be the following execution sequence: 1. execute the following statement: pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) (shm_nattch becomes 1) 2. invoke fork() in some thread. (shm_nattch becomes 2) 3. execute the following statement: pcm_direct.c:122: if (buf.shm_nattch == 1) 4. execute the following statement: pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) (As stated in manpage SHMGET(2), "When a new shared memory segment is created, its contents are initialized to zero values", so dmix->shmptr->magic is 0) 5. execute the following statements: pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) pcm_direct.c:133: return -EINVAL The above execution sequence will cause the following error: unable to create IPC shm instance This error causes multimedia application has no sound. This error rarely occurs, probability is about 1%. Because the first user of the shared memory will get that dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine if "we're the first user" is OK. Tests have been made 400+ times after this fix, and the issue no longer exists.
Signed-off-by: Qing Cai bsiice@msn.com Signed-off-by: Qing Cai caiqing@neusoft.com
diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c index fd3877c..93d6f3c 100644 --- a/src/pcm/pcm_direct.c +++ b/src/pcm/pcm_direct.c @@ -119,7 +119,7 @@ retryget: snd_pcm_direct_shm_discard(dmix); return err; } - if (buf.shm_nattch == 1) { /* we're the first user, clear the segment */ + if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) { /* we're the first user, clear the segment */ memset(dmix->shmptr, 0, sizeof(snd_pcm_direct_share_t)); if (dmix->ipc_gid >= 0) { buf.shm_perm.gid = dmix->ipc_gid;
On Thu, 03 Mar 2016 13:40:42 +0100, IceBsi wrote:
From: Qing Cai caiqing@neusoft.com
As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" (i.e., number of processes attached to the shared memeory). If an application uses alsa-lib and invokes fork() at some point, there should be the following execution sequence:
- execute the following statement: pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) (shm_nattch becomes 1)
- invoke fork() in some thread. (shm_nattch becomes 2)
- execute the following statement: pcm_direct.c:122: if (buf.shm_nattch == 1)
- execute the following statement: pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) (As stated in manpage SHMGET(2), "When a new shared memory segment is created, its contents are initialized to zero values", so dmix->shmptr->magic is 0)
- execute the following statements: pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) pcm_direct.c:133: return -EINVAL
The above execution sequence will cause the following error: unable to create IPC shm instance This error causes multimedia application has no sound. This error rarely occurs, probability is about 1%. Because the first user of the shared memory will get that dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine if "we're the first user" is OK. Tests have been made 400+ times after this fix, and the issue no longer exists.
I think this is still racy. Multiple users can grab the shmem at the very same time. Maybe it looks as if working just because both users behavior as the first user and do clear and initialize.
The check of bus.shm_nattach=1 should be fine, per se. The problem is the magic key check of the secondary. In the current code, as you pointed out, this may happen before the first client finishes the initialization.
One option would be to use some lock like pthread mutex. Another option would be to spin for a while until it gets a non-zero value. I guess the latter is more suitable for dmix & co.
Thoughts?
thanks,
Takashi
Signed-off-by: Qing Cai bsiice@msn.com Signed-off-by: Qing Cai caiqing@neusoft.com
diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c index fd3877c..93d6f3c 100644 --- a/src/pcm/pcm_direct.c +++ b/src/pcm/pcm_direct.c @@ -119,7 +119,7 @@ retryget: snd_pcm_direct_shm_discard(dmix); return err; }
- if (buf.shm_nattch == 1) { /* we're the first user, clear the segment */
- if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) { /* we're the first user, clear the segment */ memset(dmix->shmptr, 0, sizeof(snd_pcm_direct_share_t)); if (dmix->ipc_gid >= 0) { buf.shm_perm.gid = dmix->ipc_gid;
-- 2.7.2
On 03/03/2016 09:53 PM, Takashi Iwai wrote:
On Thu, 03 Mar 2016 13:40:42 +0100, IceBsi wrote:
From: Qing Cai caiqing@neusoft.com
As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" (i.e., number of processes attached to the shared memeory). If an application uses alsa-lib and invokes fork() at some point, there should be the following execution sequence:
- execute the following statement: pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) (shm_nattch becomes 1)
- invoke fork() in some thread. (shm_nattch becomes 2)
- execute the following statement: pcm_direct.c:122: if (buf.shm_nattch == 1)
- execute the following statement: pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) (As stated in manpage SHMGET(2), "When a new shared memory segment is created, its contents are initialized to zero values", so dmix->shmptr->magic is 0)
- execute the following statements: pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) pcm_direct.c:133: return -EINVAL
The above execution sequence will cause the following error: unable to create IPC shm instance This error causes multimedia application has no sound. This error rarely occurs, probability is about 1%. Because the first user of the shared memory will get that dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine if "we're the first user" is OK. Tests have been made 400+ times after this fix, and the issue no longer exists.
I think this is still racy. Multiple users can grab the shmem at the very same time. Maybe it looks as if working just because both users behavior as the first user and do clear and initialize.
I think this won't be a race condition. Since snd_pcm_direct_shm_create_or_connect() is protected by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), multiple processes won't do clear and initialization concurrently.
The check of bus.shm_nattach=1 should be fine, per se. The problem is the magic key check of the secondary. In the current code, as you pointed out, this may happen before the first client finishes the initialization.
I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more robust. Assume there are two clients of shmem and shmem is initialized, if one of the clients exits, shm_nattach will become 1 and shmem may be initialized again. In the current code, because there is semaphore, magic key check won't happen before initialization of shmem. In the scenario that I want to tell, there is only one process, and only one thread doning the clear and initialization of shmem, and the fork() is invoked in another thread of non alsa-lib context (i.e., the fork() code is not in alsa-lib). If the fork() is invoked before shmat(), the problem won't happen; If the fork() is invoked after the check of bus.shm_nattach=1, the problem won't happen too. Only if the fork() is invoked just after shmat() and just before the check of bus.shm_nattach=1, the problem happens.
One option would be to use some lock like pthread mutex. Another option would be to spin for a while until it gets a non-zero value. I guess the latter is more suitable for dmix & co.
There is already semaphore encapsulated by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), so mutex is not needed. In the scenario that I want to tell, it will never get a non-zero value of dmix->shmptr->magic, because there is no other thread doing the initialization.
Thanks, Qing Cai
Thoughts?
thanks,
Takashi
Signed-off-by: Qing Cai bsiice@msn.com Signed-off-by: Qing Cai caiqing@neusoft.com
diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c index fd3877c..93d6f3c 100644 --- a/src/pcm/pcm_direct.c +++ b/src/pcm/pcm_direct.c @@ -119,7 +119,7 @@ retryget: snd_pcm_direct_shm_discard(dmix); return err; }
- if (buf.shm_nattch == 1) { /* we're the first user, clear the segment */
- if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) { /* we're the first user, clear the segment */ memset(dmix->shmptr, 0, sizeof(snd_pcm_direct_share_t)); if (dmix->ipc_gid >= 0) { buf.shm_perm.gid = dmix->ipc_gid;
-- 2.7.2
On Fri, 04 Mar 2016 17:22:58 +0100, bsiice wrote:
On 03/03/2016 09:53 PM, Takashi Iwai wrote:
On Thu, 03 Mar 2016 13:40:42 +0100, IceBsi wrote:
From: Qing Cai caiqing@neusoft.com
As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" (i.e., number of processes attached to the shared memeory). If an application uses alsa-lib and invokes fork() at some point, there should be the following execution sequence:
- execute the following statement: pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) (shm_nattch becomes 1)
- invoke fork() in some thread. (shm_nattch becomes 2)
- execute the following statement: pcm_direct.c:122: if (buf.shm_nattch == 1)
- execute the following statement: pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) (As stated in manpage SHMGET(2), "When a new shared memory segment is created, its contents are initialized to zero values", so dmix->shmptr->magic is 0)
- execute the following statements: pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) pcm_direct.c:133: return -EINVAL
The above execution sequence will cause the following error: unable to create IPC shm instance This error causes multimedia application has no sound. This error rarely occurs, probability is about 1%. Because the first user of the shared memory will get that dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine if "we're the first user" is OK. Tests have been made 400+ times after this fix, and the issue no longer exists.
I think this is still racy. Multiple users can grab the shmem at the very same time. Maybe it looks as if working just because both users behavior as the first user and do clear and initialize.
I think this won't be a race condition. Since snd_pcm_direct_shm_create_or_connect() is protected by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), multiple processes won't do clear and initialization concurrently.
Hrm, OK, now understood the situation.
The check of bus.shm_nattach=1 should be fine, per se. The problem is the magic key check of the secondary. In the current code, as you pointed out, this may happen before the first client finishes the initialization.
I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more robust. Assume there are two clients of shmem and shmem is initialized, if one of the clients exits, shm_nattach will become 1 and shmem may be initialized again.
Yeah, in the fork from a thread, the shm_nattch check doesn't work reliably, indeed.
In the current code, because there is semaphore, magic key check won't happen before initialization of shmem. In the scenario that I want to tell, there is only one process, and only one thread doning the clear and initialization of shmem, and the fork() is invoked in another thread of non alsa-lib context (i.e., the fork() code is not in alsa-lib). If the fork() is invoked before shmat(), the problem won't happen; If the fork() is invoked after the check of bus.shm_nattach=1, the problem won't happen too. Only if the fork() is invoked just after shmat() and just before the check of bus.shm_nattach=1, the problem happens.
Well, the reason why the magic key check was introduced was about the safety. Since the shmid is given explicitly, it might override some shmem usages other than dmix accidentally. With your patch, it forcibly clears the region -- this is quite opposite to the intention of the magic key check.
So, this isn't about the race. But how to not destroy something else unexpectedly.
I'd happily apply a good fix if there is a better alternative...
thanks,
Takashi
On 03/05/2016 12:37 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 17:22:58 +0100, bsiice wrote:
On 03/03/2016 09:53 PM, Takashi Iwai wrote:
On Thu, 03 Mar 2016 13:40:42 +0100, IceBsi wrote:
From: Qing Cai caiqing@neusoft.com
As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" (i.e., number of processes attached to the shared memeory). If an application uses alsa-lib and invokes fork() at some point, there should be the following execution sequence:
- execute the following statement: pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) (shm_nattch becomes 1)
- invoke fork() in some thread. (shm_nattch becomes 2)
- execute the following statement: pcm_direct.c:122: if (buf.shm_nattch == 1)
- execute the following statement: pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) (As stated in manpage SHMGET(2), "When a new shared memory segment is created, its contents are initialized to zero values", so dmix->shmptr->magic is 0)
- execute the following statements: pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) pcm_direct.c:133: return -EINVAL
The above execution sequence will cause the following error: unable to create IPC shm instance This error causes multimedia application has no sound. This error rarely occurs, probability is about 1%. Because the first user of the shared memory will get that dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine if "we're the first user" is OK. Tests have been made 400+ times after this fix, and the issue no longer exists.
I think this is still racy. Multiple users can grab the shmem at the very same time. Maybe it looks as if working just because both users behavior as the first user and do clear and initialize.
I think this won't be a race condition. Since snd_pcm_direct_shm_create_or_connect() is protected by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), multiple processes won't do clear and initialization concurrently.
Hrm, OK, now understood the situation.
The check of bus.shm_nattach=1 should be fine, per se. The problem is the magic key check of the secondary. In the current code, as you pointed out, this may happen before the first client finishes the initialization.
I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more robust. Assume there are two clients of shmem and shmem is initialized, if one of the clients exits, shm_nattach will become 1 and shmem may be initialized again.
Yeah, in the fork from a thread, the shm_nattch check doesn't work reliably, indeed.
In the current code, because there is semaphore, magic key check won't happen before initialization of shmem. In the scenario that I want to tell, there is only one process, and only one thread doning the clear and initialization of shmem, and the fork() is invoked in another thread of non alsa-lib context (i.e., the fork() code is not in alsa-lib). If the fork() is invoked before shmat(), the problem won't happen; If the fork() is invoked after the check of bus.shm_nattach=1, the problem won't happen too. Only if the fork() is invoked just after shmat() and just before the check of bus.shm_nattach=1, the problem happens.
Well, the reason why the magic key check was introduced was about the safety. Since the shmid is given explicitly, it might override some shmem usages other than dmix accidentally. With your patch, it forcibly clears the region -- this is quite opposite to the intention of the magic key check.
I just think it clears the region only once at the first time. Could you please show me the scenario in detail?
thanks,
Qing Cai
So, this isn't about the race. But how to not destroy something else unexpectedly.
I'd happily apply a good fix if there is a better alternative...
thanks,
Takashi
On Fri, 04 Mar 2016 18:36:46 +0100, bsiice wrote:
On 03/05/2016 12:37 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 17:22:58 +0100, bsiice wrote:
On 03/03/2016 09:53 PM, Takashi Iwai wrote:
On Thu, 03 Mar 2016 13:40:42 +0100, IceBsi wrote:
From: Qing Cai caiqing@neusoft.com
As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" (i.e., number of processes attached to the shared memeory). If an application uses alsa-lib and invokes fork() at some point, there should be the following execution sequence:
- execute the following statement: pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) (shm_nattch becomes 1)
- invoke fork() in some thread. (shm_nattch becomes 2)
- execute the following statement: pcm_direct.c:122: if (buf.shm_nattch == 1)
- execute the following statement: pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) (As stated in manpage SHMGET(2), "When a new shared memory segment is created, its contents are initialized to zero values", so dmix->shmptr->magic is 0)
- execute the following statements: pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) pcm_direct.c:133: return -EINVAL
The above execution sequence will cause the following error: unable to create IPC shm instance This error causes multimedia application has no sound. This error rarely occurs, probability is about 1%. Because the first user of the shared memory will get that dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine if "we're the first user" is OK. Tests have been made 400+ times after this fix, and the issue no longer exists.
I think this is still racy. Multiple users can grab the shmem at the very same time. Maybe it looks as if working just because both users behavior as the first user and do clear and initialize.
I think this won't be a race condition. Since snd_pcm_direct_shm_create_or_connect() is protected by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), multiple processes won't do clear and initialization concurrently.
Hrm, OK, now understood the situation.
The check of bus.shm_nattach=1 should be fine, per se. The problem is the magic key check of the secondary. In the current code, as you pointed out, this may happen before the first client finishes the initialization.
I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more robust. Assume there are two clients of shmem and shmem is initialized, if one of the clients exits, shm_nattach will become 1 and shmem may be initialized again.
Yeah, in the fork from a thread, the shm_nattch check doesn't work reliably, indeed.
In the current code, because there is semaphore, magic key check won't happen before initialization of shmem. In the scenario that I want to tell, there is only one process, and only one thread doning the clear and initialization of shmem, and the fork() is invoked in another thread of non alsa-lib context (i.e., the fork() code is not in alsa-lib). If the fork() is invoked before shmat(), the problem won't happen; If the fork() is invoked after the check of bus.shm_nattach=1, the problem won't happen too. Only if the fork() is invoked just after shmat() and just before the check of bus.shm_nattach=1, the problem happens.
Well, the reason why the magic key check was introduced was about the safety. Since the shmid is given explicitly, it might override some shmem usages other than dmix accidentally. With your patch, it forcibly clears the region -- this is quite opposite to the intention of the magic key check.
I just think it clears the region only once at the first time. Could you please show me the scenario in detail?
The shmid ipc key is given explicitly by an alsa-lib configuration, but there is no guarantee that this key has never been used by some other programs for a completely different purpose. So, for non-first user, it verifies whether the attached region really belongs to alsa-lib dmix.
Takashi
On 03/05/2016 02:08 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 18:36:46 +0100, bsiice wrote:
On 03/05/2016 12:37 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 17:22:58 +0100, bsiice wrote:
On 03/03/2016 09:53 PM, Takashi Iwai wrote:
On Thu, 03 Mar 2016 13:40:42 +0100, IceBsi wrote:
From: Qing Cai caiqing@neusoft.com
As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" (i.e., number of processes attached to the shared memeory). If an application uses alsa-lib and invokes fork() at some point, there should be the following execution sequence:
- execute the following statement: pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) (shm_nattch becomes 1)
- invoke fork() in some thread. (shm_nattch becomes 2)
- execute the following statement: pcm_direct.c:122: if (buf.shm_nattch == 1)
- execute the following statement: pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) (As stated in manpage SHMGET(2), "When a new shared memory segment is created, its contents are initialized to zero values", so dmix->shmptr->magic is 0)
- execute the following statements: pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) pcm_direct.c:133: return -EINVAL
The above execution sequence will cause the following error: unable to create IPC shm instance This error causes multimedia application has no sound. This error rarely occurs, probability is about 1%. Because the first user of the shared memory will get that dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine if "we're the first user" is OK. Tests have been made 400+ times after this fix, and the issue no longer exists.
I think this is still racy. Multiple users can grab the shmem at the very same time. Maybe it looks as if working just because both users behavior as the first user and do clear and initialize.
I think this won't be a race condition. Since snd_pcm_direct_shm_create_or_connect() is protected by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), multiple processes won't do clear and initialization concurrently.
Hrm, OK, now understood the situation.
The check of bus.shm_nattach=1 should be fine, per se. The problem is the magic key check of the secondary. In the current code, as you pointed out, this may happen before the first client finishes the initialization.
I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more robust. Assume there are two clients of shmem and shmem is initialized, if one of the clients exits, shm_nattach will become 1 and shmem may be initialized again.
Yeah, in the fork from a thread, the shm_nattch check doesn't work reliably, indeed.
In the current code, because there is semaphore, magic key check won't happen before initialization of shmem. In the scenario that I want to tell, there is only one process, and only one thread doning the clear and initialization of shmem, and the fork() is invoked in another thread of non alsa-lib context (i.e., the fork() code is not in alsa-lib). If the fork() is invoked before shmat(), the problem won't happen; If the fork() is invoked after the check of bus.shm_nattach=1, the problem won't happen too. Only if the fork() is invoked just after shmat() and just before the check of bus.shm_nattach=1, the problem happens.
Well, the reason why the magic key check was introduced was about the safety. Since the shmid is given explicitly, it might override some shmem usages other than dmix accidentally. With your patch, it forcibly clears the region -- this is quite opposite to the intention of the magic key check.
I just think it clears the region only once at the first time. Could you please show me the scenario in detail?
The shmid ipc key is given explicitly by an alsa-lib configuration, but there is no guarantee that this key has never been used by some other programs for a completely different purpose. So, for non-first user, it verifies whether the attached region really belongs to alsa-lib dmix.
Takashi
Thank you for explanation! Now I understood. I have tried changing ipc_key many times when I have been investigating the problem, and changing ipc_key doesn't solve the problem I encounterd.
I think there is more benefits to do magic key check instead of shm_nattch check, because the situation that I encounterd happens more often than the situation that other program has a same ipc key with alsa-lib. The problem left is to make ipc key as unique as possible. Every program/library should fulfil the responsibility of generating a unique key. As stated on http://www.tldp.org/LDP/lpg/node24.html, ftok() should be used to generate a unique key.
thanks,
Qing Cai
On Sat, 05 Mar 2016 07:37:34 +0100, bsiice wrote:
On 03/05/2016 02:08 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 18:36:46 +0100, bsiice wrote:
On 03/05/2016 12:37 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 17:22:58 +0100, bsiice wrote:
On 03/03/2016 09:53 PM, Takashi Iwai wrote:
On Thu, 03 Mar 2016 13:40:42 +0100, IceBsi wrote: > > From: Qing Cai caiqing@neusoft.com > > As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" > (i.e., number of processes attached to the shared memeory). If an > application uses alsa-lib and invokes fork() at some point, there should > be the following execution sequence: > 1. execute the following statement: > pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) > (shm_nattch becomes 1) > 2. invoke fork() in some thread. > (shm_nattch becomes 2) > 3. execute the following statement: > pcm_direct.c:122: if (buf.shm_nattch == 1) > 4. execute the following statement: > pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) > (As stated in manpage SHMGET(2), "When a new shared memory segment > is created, its contents are initialized to zero values", so > dmix->shmptr->magic is 0) > 5. execute the following statements: > pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) > pcm_direct.c:133: return -EINVAL > The above execution sequence will cause the following error: > unable to create IPC shm instance > This error causes multimedia application has no sound. This error rarely > occurs, probability is about 1%. > Because the first user of the shared memory will get that > dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine > if "we're the first user" is OK. > Tests have been made 400+ times after this fix, and the issue no longer > exists.
I think this is still racy. Multiple users can grab the shmem at the very same time. Maybe it looks as if working just because both users behavior as the first user and do clear and initialize.
I think this won't be a race condition. Since snd_pcm_direct_shm_create_or_connect() is protected by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), multiple processes won't do clear and initialization concurrently.
Hrm, OK, now understood the situation.
The check of bus.shm_nattach=1 should be fine, per se. The problem is the magic key check of the secondary. In the current code, as you pointed out, this may happen before the first client finishes the initialization.
I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more robust. Assume there are two clients of shmem and shmem is initialized, if one of the clients exits, shm_nattach will become 1 and shmem may be initialized again.
Yeah, in the fork from a thread, the shm_nattch check doesn't work reliably, indeed.
In the current code, because there is semaphore, magic key check won't happen before initialization of shmem. In the scenario that I want to tell, there is only one process, and only one thread doning the clear and initialization of shmem, and the fork() is invoked in another thread of non alsa-lib context (i.e., the fork() code is not in alsa-lib). If the fork() is invoked before shmat(), the problem won't happen; If the fork() is invoked after the check of bus.shm_nattach=1, the problem won't happen too. Only if the fork() is invoked just after shmat() and just before the check of bus.shm_nattach=1, the problem happens.
Well, the reason why the magic key check was introduced was about the safety. Since the shmid is given explicitly, it might override some shmem usages other than dmix accidentally. With your patch, it forcibly clears the region -- this is quite opposite to the intention of the magic key check.
I just think it clears the region only once at the first time. Could you please show me the scenario in detail?
The shmid ipc key is given explicitly by an alsa-lib configuration, but there is no guarantee that this key has never been used by some other programs for a completely different purpose. So, for non-first user, it verifies whether the attached region really belongs to alsa-lib dmix.
Takashi
Thank you for explanation! Now I understood. I have tried changing ipc_key many times when I have been investigating the problem, and changing ipc_key doesn't solve the problem I encounterd.
I think there is more benefits to do magic key check instead of shm_nattch check, because the situation that I encounterd happens more often than the situation that other program has a same ipc key with alsa-lib.
Yes, possibly. However, which may cause a severe problem? It's the question, too.
The problem left is to make ipc key as unique as possible. Every program/library should fulfil the responsibility of generating a unique key. As stated on http://www.tldp.org/LDP/lpg/node24.html, ftok() should be used to generate a unique key.
This doesn't guarantee the uniqueness completely, obviously :) In addition, we'd need more than 256 variants...
thanks,
Takashi
On 03/05/2016 03:36 PM, Takashi Iwai wrote:
On Sat, 05 Mar 2016 07:37:34 +0100, bsiice wrote:
On 03/05/2016 02:08 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 18:36:46 +0100, bsiice wrote:
On 03/05/2016 12:37 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 17:22:58 +0100, bsiice wrote:
On 03/03/2016 09:53 PM, Takashi Iwai wrote: > On Thu, 03 Mar 2016 13:40:42 +0100, > IceBsi wrote: >> >> From: Qing Cai caiqing@neusoft.com >> >> As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" >> (i.e., number of processes attached to the shared memeory). If an >> application uses alsa-lib and invokes fork() at some point, there should >> be the following execution sequence: >> 1. execute the following statement: >> pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) >> (shm_nattch becomes 1) >> 2. invoke fork() in some thread. >> (shm_nattch becomes 2) >> 3. execute the following statement: >> pcm_direct.c:122: if (buf.shm_nattch == 1) >> 4. execute the following statement: >> pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) >> (As stated in manpage SHMGET(2), "When a new shared memory segment >> is created, its contents are initialized to zero values", so >> dmix->shmptr->magic is 0) >> 5. execute the following statements: >> pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) >> pcm_direct.c:133: return -EINVAL >> The above execution sequence will cause the following error: >> unable to create IPC shm instance >> This error causes multimedia application has no sound. This error rarely >> occurs, probability is about 1%. >> Because the first user of the shared memory will get that >> dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine >> if "we're the first user" is OK. >> Tests have been made 400+ times after this fix, and the issue no longer >> exists. > > I think this is still racy. Multiple users can grab the shmem at the > very same time. Maybe it looks as if working just because both users > behavior as the first user and do clear and initialize. I think this won't be a race condition. Since snd_pcm_direct_shm_create_or_connect() is protected by snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), multiple processes won't do clear and initialization concurrently.
Hrm, OK, now understood the situation.
> The check of bus.shm_nattach=1 should be fine, per se. The problem is > the magic key check of the secondary. In the current code, as you > pointed out, this may happen before the first client finishes the > initialization. I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more robust. Assume there are two clients of shmem and shmem is initialized, if one of the clients exits, shm_nattach will become 1 and shmem may be initialized again.
Yeah, in the fork from a thread, the shm_nattch check doesn't work reliably, indeed.
In the current code, because there is semaphore, magic key check won't happen before initialization of shmem. In the scenario that I want to tell, there is only one process, and only one thread doning the clear and initialization of shmem, and the fork() is invoked in another thread of non alsa-lib context (i.e., the fork() code is not in alsa-lib). If the fork() is invoked before shmat(), the problem won't happen; If the fork() is invoked after the check of bus.shm_nattach=1, the problem won't happen too. Only if the fork() is invoked just after shmat() and just before the check of bus.shm_nattach=1, the problem happens.
Well, the reason why the magic key check was introduced was about the safety. Since the shmid is given explicitly, it might override some shmem usages other than dmix accidentally. With your patch, it forcibly clears the region -- this is quite opposite to the intention of the magic key check.
I just think it clears the region only once at the first time. Could you please show me the scenario in detail?
The shmid ipc key is given explicitly by an alsa-lib configuration, but there is no guarantee that this key has never been used by some other programs for a completely different purpose. So, for non-first user, it verifies whether the attached region really belongs to alsa-lib dmix.
Takashi
Thank you for explanation! Now I understood. I have tried changing ipc_key many times when I have been investigating the problem, and changing ipc_key doesn't solve the problem I encounterd.
I think there is more benefits to do magic key check instead of shm_nattch check, because the situation that I encounterd happens more often than the situation that other program has a same ipc key with alsa-lib.
Yes, possibly. However, which may cause a severe problem? It's the question, too.
By 'severe problem', do you mean that alsa-lib may clear other program's shmem? If the key is unique, will the 'severe problem' still happen?
The problem left is to make ipc key as unique as possible. Every program/library should fulfil the responsibility of generating a unique key. As stated on http://www.tldp.org/LDP/lpg/node24.html, ftok() should be used to generate a unique key.
This doesn't guarantee the uniqueness completely, obviously :) In addition, we'd need more than 256 variants...
Oh, sorry, I thought it would be unique if all program/library use their own full filename as arg to ftok(). Actually this doesn't guarantee uniqueness. :)
What if alsa-lib changed to use POSIX shmem (i.e. shm_open()...)?
thanks,
Qing Cai
thanks,
Takashi
On Sat, 05 Mar 2016 09:35:51 +0100, bsiice wrote:
On 03/05/2016 03:36 PM, Takashi Iwai wrote:
On Sat, 05 Mar 2016 07:37:34 +0100, bsiice wrote:
On 03/05/2016 02:08 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 18:36:46 +0100, bsiice wrote:
On 03/05/2016 12:37 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 17:22:58 +0100, bsiice wrote: > > On 03/03/2016 09:53 PM, Takashi Iwai wrote: >> On Thu, 03 Mar 2016 13:40:42 +0100, >> IceBsi wrote: >>> >>> From: Qing Cai caiqing@neusoft.com >>> >>> As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" >>> (i.e., number of processes attached to the shared memeory). If an >>> application uses alsa-lib and invokes fork() at some point, there should >>> be the following execution sequence: >>> 1. execute the following statement: >>> pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) >>> (shm_nattch becomes 1) >>> 2. invoke fork() in some thread. >>> (shm_nattch becomes 2) >>> 3. execute the following statement: >>> pcm_direct.c:122: if (buf.shm_nattch == 1) >>> 4. execute the following statement: >>> pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) >>> (As stated in manpage SHMGET(2), "When a new shared memory segment >>> is created, its contents are initialized to zero values", so >>> dmix->shmptr->magic is 0) >>> 5. execute the following statements: >>> pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) >>> pcm_direct.c:133: return -EINVAL >>> The above execution sequence will cause the following error: >>> unable to create IPC shm instance >>> This error causes multimedia application has no sound. This error rarely >>> occurs, probability is about 1%. >>> Because the first user of the shared memory will get that >>> dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine >>> if "we're the first user" is OK. >>> Tests have been made 400+ times after this fix, and the issue no longer >>> exists. >> >> I think this is still racy. Multiple users can grab the shmem at the >> very same time. Maybe it looks as if working just because both users >> behavior as the first user and do clear and initialize. > I think this won't be a race condition. Since > snd_pcm_direct_shm_create_or_connect() is protected by > snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), > multiple processes won't do clear and initialization concurrently.
Hrm, OK, now understood the situation.
>> The check of bus.shm_nattach=1 should be fine, per se. The problem is >> the magic key check of the secondary. In the current code, as you >> pointed out, this may happen before the first client finishes the >> initialization. > I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more > robust. Assume there are two clients of shmem and shmem is initialized, > if one of the clients exits, shm_nattach will become 1 and shmem may be > initialized again.
Yeah, in the fork from a thread, the shm_nattch check doesn't work reliably, indeed.
> In the current code, because there is semaphore, magic key check won't > happen before initialization of shmem. > In the scenario that I want to tell, there is only one process, and only > one thread doning the clear and initialization of shmem, and the fork() > is invoked in another thread of non alsa-lib context (i.e., the fork() > code is not in alsa-lib). If the fork() is invoked before shmat(), the > problem won't happen; If the fork() is invoked after the check of > bus.shm_nattach=1, the problem won't happen too. Only if the fork() is > invoked just after shmat() and just before the check of > bus.shm_nattach=1, the problem happens.
Well, the reason why the magic key check was introduced was about the safety. Since the shmid is given explicitly, it might override some shmem usages other than dmix accidentally. With your patch, it forcibly clears the region -- this is quite opposite to the intention of the magic key check.
I just think it clears the region only once at the first time. Could you please show me the scenario in detail?
The shmid ipc key is given explicitly by an alsa-lib configuration, but there is no guarantee that this key has never been used by some other programs for a completely different purpose. So, for non-first user, it verifies whether the attached region really belongs to alsa-lib dmix.
Takashi
Thank you for explanation! Now I understood. I have tried changing ipc_key many times when I have been investigating the problem, and changing ipc_key doesn't solve the problem I encounterd.
I think there is more benefits to do magic key check instead of shm_nattch check, because the situation that I encounterd happens more often than the situation that other program has a same ipc key with alsa-lib.
Yes, possibly. However, which may cause a severe problem? It's the question, too.
By 'severe problem', do you mean that alsa-lib may clear other program's shmem? If the key is unique, will the 'severe problem' still happen?
If it were unique... Actually it's not with SYSV shmem.
The problem left is to make ipc key as unique as possible. Every program/library should fulfil the responsibility of generating a unique key. As stated on http://www.tldp.org/LDP/lpg/node24.html, ftok() should be used to generate a unique key.
This doesn't guarantee the uniqueness completely, obviously :) In addition, we'd need more than 256 variants...
Oh, sorry, I thought it would be unique if all program/library use their own full filename as arg to ftok(). Actually this doesn't guarantee uniqueness. :)
What if alsa-lib changed to use POSIX shmem (i.e. shm_open()...)?
This would be one option. I thought it in the past, too, but postponed by some reason I forgot.
Another possible way I can think of is to determine the first user by shmget() call without IPC_CREAT. If this succeeds, we are no first user. I'm not sure, though, whether this method is more robust.
Takashi
On 03/05/2016 06:07 PM, Takashi Iwai wrote:
On Sat, 05 Mar 2016 09:35:51 +0100, bsiice wrote:
On 03/05/2016 03:36 PM, Takashi Iwai wrote:
On Sat, 05 Mar 2016 07:37:34 +0100, bsiice wrote:
On 03/05/2016 02:08 AM, Takashi Iwai wrote:
On Fri, 04 Mar 2016 18:36:46 +0100, bsiice wrote:
On 03/05/2016 12:37 AM, Takashi Iwai wrote: > On Fri, 04 Mar 2016 17:22:58 +0100, > bsiice wrote: >> >> On 03/03/2016 09:53 PM, Takashi Iwai wrote: >>> On Thu, 03 Mar 2016 13:40:42 +0100, >>> IceBsi wrote: >>>> >>>> From: Qing Cai caiqing@neusoft.com >>>> >>>> As stated in manpage SHMCTL(2), shm_nattch is "No. of current attaches" >>>> (i.e., number of processes attached to the shared memeory). If an >>>> application uses alsa-lib and invokes fork() at some point, there should >>>> be the following execution sequence: >>>> 1. execute the following statement: >>>> pcm_direct.c:110: dmix->shmptr = shmat(dmix->shmid, 0, 0) >>>> (shm_nattch becomes 1) >>>> 2. invoke fork() in some thread. >>>> (shm_nattch becomes 2) >>>> 3. execute the following statement: >>>> pcm_direct.c:122: if (buf.shm_nattch == 1) >>>> 4. execute the following statement: >>>> pcm_direct.c:131: if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) >>>> (As stated in manpage SHMGET(2), "When a new shared memory segment >>>> is created, its contents are initialized to zero values", so >>>> dmix->shmptr->magic is 0) >>>> 5. execute the following statements: >>>> pcm_direct.c:132: snd_pcm_direct_shm_discard(dmix) >>>> pcm_direct.c:133: return -EINVAL >>>> The above execution sequence will cause the following error: >>>> unable to create IPC shm instance >>>> This error causes multimedia application has no sound. This error rarely >>>> occurs, probability is about 1%. >>>> Because the first user of the shared memory will get that >>>> dmix->shmptr->magic is 0, check dmix->shmptr->magic's value to determine >>>> if "we're the first user" is OK. >>>> Tests have been made 400+ times after this fix, and the issue no longer >>>> exists. >>> >>> I think this is still racy. Multiple users can grab the shmem at the >>> very same time. Maybe it looks as if working just because both users >>> behavior as the first user and do clear and initialize. >> I think this won't be a race condition. Since >> snd_pcm_direct_shm_create_or_connect() is protected by >> snd_pcm_direct_semaphore_down() and snd_pcm_direct_semaphore_up(), >> multiple processes won't do clear and initialization concurrently. > > Hrm, OK, now understood the situation. > >>> The check of bus.shm_nattach=1 should be fine, per se. The problem is >>> the magic key check of the secondary. In the current code, as you >>> pointed out, this may happen before the first client finishes the >>> initialization. >> I think the check of dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC is more >> robust. Assume there are two clients of shmem and shmem is initialized, >> if one of the clients exits, shm_nattach will become 1 and shmem may be >> initialized again. > > Yeah, in the fork from a thread, the shm_nattch check doesn't work > reliably, indeed. > >> In the current code, because there is semaphore, magic key check won't >> happen before initialization of shmem. >> In the scenario that I want to tell, there is only one process, and only >> one thread doning the clear and initialization of shmem, and the fork() >> is invoked in another thread of non alsa-lib context (i.e., the fork() >> code is not in alsa-lib). If the fork() is invoked before shmat(), the >> problem won't happen; If the fork() is invoked after the check of >> bus.shm_nattach=1, the problem won't happen too. Only if the fork() is >> invoked just after shmat() and just before the check of >> bus.shm_nattach=1, the problem happens. > > Well, the reason why the magic key check was introduced was about the > safety. Since the shmid is given explicitly, it might override some > shmem usages other than dmix accidentally. With your patch, it > forcibly clears the region -- this is quite opposite to the intention > of the magic key check.
I just think it clears the region only once at the first time. Could you please show me the scenario in detail?
The shmid ipc key is given explicitly by an alsa-lib configuration, but there is no guarantee that this key has never been used by some other programs for a completely different purpose. So, for non-first user, it verifies whether the attached region really belongs to alsa-lib dmix.
Takashi
Thank you for explanation! Now I understood. I have tried changing ipc_key many times when I have been investigating the problem, and changing ipc_key doesn't solve the problem I encounterd.
I think there is more benefits to do magic key check instead of shm_nattch check, because the situation that I encounterd happens more often than the situation that other program has a same ipc key with alsa-lib.
Yes, possibly. However, which may cause a severe problem? It's the question, too.
By 'severe problem', do you mean that alsa-lib may clear other program's shmem? If the key is unique, will the 'severe problem' still happen?
If it were unique... Actually it's not with SYSV shmem.
The problem left is to make ipc key as unique as possible. Every program/library should fulfil the responsibility of generating a unique key. As stated on http://www.tldp.org/LDP/lpg/node24.html, ftok() should be used to generate a unique key.
This doesn't guarantee the uniqueness completely, obviously :) In addition, we'd need more than 256 variants...
Oh, sorry, I thought it would be unique if all program/library use their own full filename as arg to ftok(). Actually this doesn't guarantee uniqueness. :)
What if alsa-lib changed to use POSIX shmem (i.e. shm_open()...)?
This would be one option. I thought it in the past, too, but postponed by some reason I forgot.
Another possible way I can think of is to determine the first user by shmget() call without IPC_CREAT. If this succeeds, we are no first user. I'm not sure, though, whether this method is more robust.
Sounds a good method, I think it can't be impacted by fork() from a thread.
Assume there is other program/process uses same ipc key to create shmem, and runs concurrently with alsa-lib, I can think of some methods and situations (I only think of situations with failure).
Method 1: call shmget() without IPC_CREAT firstly, if fail then call shmget(IPC_CREAT). Possible situation 1.1: (1) other program creates a shmem with same key (2) alsa-lib calls shmget() without IPC_CREAT (3) if (2) succeeds, alsa-lib connects to other program's shmem (4) alsa-lib is not the first user, so alsa-lib checks dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC (true) (5) alsa-lib return -EINVAL Possible situation 1.2: (1) alsa-lib calls shmget() without IPC_CREAT, and fails (2) other program creates a shmem with same key (3) alsa-lib calls shmget(IPC_CREAT), and succeeds, but connects to other program's shmem (4) alsa-lib doesn't know if the shmem is created by alsa-lib or not, so method 1 is not applicable
Method 2: call shmget() without IPC_CREAT firstly, if fail then call shmget(IPC_CREAT|IPC_EXCL). Possible situation 2.1: (1) alsa-lib calls shmget() without IPC_CREAT, and fails (2) other program creates a shmem with same key (3) alsa-lib calls shmget(IPC_CREAT|IPC_EXCL), and fails
Method 3: call shmget(IPC_CREAT|IPC_EXCL) firstly, if fail then call shmget() without IPC_CREAT. Possible situation 3.1: (1) alsa-lib calls shmget(IPC_CREAT|IPC_EXCL), and succeeds (2) other program tries to create a shmem with same key, and maybe fail (3) alsa-lib is the first user, so alsa-lib does clear and initialization Possible situation 3.2: (1) other program creates a shmem with same key, and succeeds (2) alsa-lib calls shmget(IPC_CREAT|IPC_EXCL), and fails (3) alsa-lib calls shmget() without IPC_CREAT, and succeeds, but connects to other program's shmem (4) alsa-lib is not the first user, so alsa-lib checks dmix->shmptr->magic!=SND_PCM_DIRECT_MAGIC (true) (5) alsa-lib return -EINVAL
So, which method should be used? Method 2 or Method 3 or something else?
thanks,
Qing Cai
participants (3)
-
bsiice
-
IceBsi
-
Takashi Iwai