[alsa-devel] Need Help: Repeated Jackhammer noise after several hours of play and heavy cpu usage in snd_pcm_dmix_sync_area
Hi,
We are facing a issue where if gdb is attached to the our long running sound process and continued, jackhammer sound is played over and over again abusing the speakers.
Process details: 1. We are using RW_INTERLEAVED access as described in write_loop method in /test/pcm.c and opening 'default' device. 2. However to disable the underruns, the snd_pcm_sw_params_set_stop_threshold is set to 0x7fffffff which is greater than boundary value. 3.Period size is 1024 and buffer size is 2048. We try to write period_size no of frames using snd_pcm_writei. (I am attaching a sample program which shows this behaviour.)
Steps to Reproduce: 1. Left process running for over 12 hours. 2. gdb is attached to the process and continued inside gdb. 3. At this point, the thread went into a tight loop taking >100% cpu and repeatedly playing same sound. It did not recover for a long time.
Stack trace shows that 'snd_pcm_dmix_sync_area' in pcm_dmix.c is taking a very long time to return because the 'size' is a large unsinged interger and mix_areas is getting called with 'transfer' value of <=2048.
If snd_pcm_sw_params_set_stop_threshold is not set or when application is running for shorter duration, attaching gdb does not show this behaviour.
Is it that setting of stop threshold to 0x7fffffff is triggering this and attaching gdb is making the buffer pointers misbehave? If so, what can be done if we do not want PCM to stop? I have tried by setting silence threshold and silence size to let alsa play silence by itself but still observed the issue.
Below is the stack trace and PCM dump:
(gdb) bt #0 0xb7c44463 in mix_areas_32_smp (size=1360, dst=0xb6855000, src=0x84a7510, sum=0xb6845000, dst_step=4, src_step=4, sum_step=4) at pcm_dmix_i386.h:261 #1 0xb7c44d2b in mix_areas (dmix=0x84a7110, src_areas=0x84a51d0, dst_areas=0x84a6f00, src_ofs=0, dst_ofs=7168, size=1024) at pcm_dmix.c:193 #2 0xb7c45422 in snd_pcm_dmix_sync_area (pcm=0x84a71e0) at pcm_dmix.c:371 #3 0xb7c46680 in snd_pcm_dmix_mmap_commit (pcm=0x84a71e0, offset=0, size=1024) at pcm_dmix.c:813 #4 0xb7c07033 in snd_pcm_mmap_commit (pcm=0x84a71e0, offset=0, frames=1024) at pcm.c:6501 #5 0xb7c1804e in snd_pcm_plugin_mmap_commit (pcm=0x84a6d18, offset=0, size=1024) at pcm_plugin.c:434 #6 0xb7c07033 in snd_pcm_mmap_commit (pcm=0x84a6d18, offset=0, frames=1024) at pcm.c:6501 #7 0xb7c17956 in snd_pcm_plugin_write_areas (pcm=0x84a5390, areas=0xb68362b0, offset=0, size=1024) at pcm_plugin.c:290 #8 0xb7c074bb in snd1_pcm_write_areas (pcm=0x84a5390, areas=0xb68362b0, offset=0, size=1024, func=0xb7c177fc <snd_pcm_plugin_write_areas>) at pcm.c:6667 #9 0xb7c17cd5 in snd_pcm_plugin_writei (pcm=0x84a5390, buffer=0x84ab518, size=1024) at pcm_plugin.c:361 #10 0xb7bff1a7 in _snd_pcm_writei (pcm=0x84a6b18, buffer=0x84ab518, size=1024) at pcm_local.h:516 #11 0xb7bff16c in snd_pcm_writei (pcm=0x84a6b18, buffer=0x84ab518, size=1024) at pcm.c:1258
(gdb) f 2 #2 0xb7c45422 in snd_pcm_dmix_sync_area (pcm=0x84a71e0) at pcm_dmix.c:371 371 pcm_dmix.c: No such file or directory. (gdb) info locals dmix = 0x84a7110 slave_hw_ptr = 234156032 slave_appl_ptr = 7168 slave_size = 1057171456 appl_ptr = 0 size = 1018295296 transfer = 1024 src_areas = 0x84a51d0 dst_areas = 0x84a6f00 (gdb) p *dmix $1 = {type = SND_PCM_TYPE_DMIX, ipc_key = 5678293, ipc_perm = 432, ipc_gid = 17, semid = 32768, shmid = 65536, shmptr = 0xb6859000, spcm = 0x84a7340, appl_ptr = 217121792, last_appl_ptr = 233882624, hw_ptr = 233875428, avail_max = 0, slave_appl_ptr = 234156032, slave_hw_ptr = 234147996, slave_period_size = 1024, slave_buffer_size = 8192, slave_boundary = 1073741824, sync_ptr = 0xb7c4557c <snd_pcm_dmix_sync_ptr>, state = SND_PCM_STATE_RUNNING, trigger_tstamp = {tv_sec = 1392384066, tv_nsec = 8308000}, server = 0, client = 0, comm_fd = 0, hw_fd = 4, timer_fd = { fd = 3, events = 41, revents = 0}, poll_fd = 3, tread = 1, timer_need_poll = 0, timer_event_suspend = 131072, timer_event_resume = 262144, server_fd = 0, server_pid = 0, timer = 0x8496b90, interleaved = 1, slowptr = 1, max_periods = 0, channels = 2, bindings = 0x0, u = {dmix = { shmid_sum = 98305, sum_buffer = 0xb6837000, mix_areas_16 = 0xb7c443a8 <mix_areas_16_smp_mmx>, mix_areas_32 = 0xb7c44415 <mix_areas_32_smp>, mix_areas_24 = 0xb7c4451b <mix_areas_24_smp_cmov>, mix_areas_u8 = 0, remix_areas_16 = 0xb7c44638 <remix_areas_16_smp_mmx>, remix_areas_32 = 0xb7c446a5 <remix_areas_32_smp>, remix_areas_24 = 0xb7c447ab <remix_areas_24_smp_cmov>, remix_areas_u8 = 0}, dsnoop = {<No data fields>}, dshare = {chn_mask = 13151478482154389505}}, server_free = 0} (gdb) p *pcm $2 = {dl_handle = 0x0, name = 0x84a6e48 "dmix:0", type = SND_PCM_TYPE_DMIX, stream = SND_PCM_STREAM_PLAYBACK, mode = 0, poll_fd_count = 1, poll_fd = 3, poll_events = 1, setup = -1, monotonic = 0, access = SND_PCM_ACCESS_MMAP_INTERLEAVED, format = SND_PCM_FORMAT_S32_LE, subformat = SND_PCM_SUBFORMAT_STD, channels = 2, rate = 44100, period_size = 1024, period_time = 23219, periods = {min = 0, max = 0, openmin = 0, openmax = 0, integer = 0, empty = 0}, tstamp_mode = SND_PCM_TSTAMP_NONE, period_step = 1, avail_min = 1024, period_event = 0, start_threshold = 1024, stop_threshold = 2147483647, silence_threshold = 0, silence_size = 0, boundary = 1073741824, info = 2151743747, msbits = 32, rate_num = 44100, rate_den = 1, hw_flags = 0, fifo_size = 0, buffer_size = 2048, buffer_time = {min = 0, max = 0, openmin = 0, openmax = 0, integer = 0, empty = 0}, sample_bits = 32, frame_bits = 64, appl = {master = 0x0, ptr = 0x84a7130, fd = -1, offset = 0, link_dst_count = 0, link_dst = 0x0, private_data = 0x0, changed = 0}, hw = { master = 0x0, ptr = 0x84a7138, fd = -1, offset = 0, link_dst_count = 0, link_dst = 0x0, private_data = 0x0, changed = 0}, min_align = 1, mmap_rw = 1, mmap_shadow = 0, donot_close = 0, mmap_channels = 0x84a54a0, running_areas = 0x84a51d0, stopped_areas = 0x0, ops = 0xb7c91860, fast_ops = 0xb7c918a0, op_arg = 0x84a71e0, fast_op_arg = 0x84a71e0, private_data = 0x84a7110, async_handlers = {next = 0x84a72e4, prev = 0x84a72e4}}
PCM dump: Plug PCM: Linear conversion PCM (S32_LE) Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 2048 period_size : 1024 period_time : 23219 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 2147483647 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Soft volume PCM Control: PCM Playback Volume min_dB: -51 max_dB: 0 resolution: 256 Its setup is: stream : PLAYBACK access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 32 buffer_size : 2048 period_size : 1024 period_time : 23219 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 2147483647 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Direct Stream Mixing PCM Its setup is: stream : PLAYBACK access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 32 buffer_size : 2048 period_size : 1024 period_time : 23219 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 2147483647 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Hardware PCM card 0 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 32 buffer_size : 8192 period_size : 1024 period_time : 23219 tstamp_mode : ENABLE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1 stop_threshold : 1073741824 silence_threshold: 0 silence_size : 1073741824 boundary : 1073741824 appl_ptr : 0 hw_ptr : 0
Thanks
Hi Adarsh,
On 02/19/2014 04:07 AM, Adarsh wrote:
Hi,
We are facing a issue where if gdb is attached to the our long running sound process and continued, jackhammer sound is played over and over again abusing the speakers.
Process details:
- We are using RW_INTERLEAVED access as described in write_loop method in
/test/pcm.c and opening 'default' device. 2. However to disable the underruns, the snd_pcm_sw_params_set_stop_threshold is set to 0x7fffffff which is greater than boundary value.
This doesn't prevent underruns. Underruns happen when the software doesn't keep up with the hardware.
What this disables is your ability to detect and recover from an underrun. So, what you're experiencing is expected behavior. If you never reach the stop threshold but userspace fails to write new audio to the buffer. Since you instructed the driver/hardware to *not* stop in the event of an underrun, it repeatedly cycles on the same, stale buffer. This gives you the "jackhammer" sound.
3.Period size is 1024 and buffer size is 2048. We try to write period_size no of frames using snd_pcm_writei. (I am attaching a sample program which shows this behaviour.)
Steps to Reproduce:
- Left process running for over 12 hours.
- gdb is attached to the process and continued inside gdb.
When gdb attached to the process, you certainly stopped writing new audio to the buffer because your process got a SIGSTOP... causing all threads to freeze.
- At this point, the thread went into a tight loop taking >100% cpu and
repeatedly playing same sound. It did not recover for a long time.
...I'm not sure what you're seeing here. Possibly dmix is trying to make the best of a bad situation.
Is it that setting of stop threshold to 0x7fffffff is triggering this and attaching gdb is making the buffer pointers misbehave? If so, what can be done if we do not want PCM to stop? I have tried by setting silence threshold and silence size to let alsa play silence by itself but still observed the issue.
It depends on what you're trying to solve by attaching with gdb.
GDB is good for a lot of things, but solving problems in real-time systems (like audio) has never been one of them. So, chances are that gdb is the wrong tool for what you're trying to do.
-gabe
Adarsh wrote:
- However to disable the underruns, the
snd_pcm_sw_params_set_stop_threshold is set to 0x7fffffff
This does not disable underruns; it just disables stopping the device when an underrun happens.
which is greater than boundary value.
Guess what happens on a 64-bit machine.
3.Period size is 1024 and buffer size is 2048.
Do you actually need that low a latency?
- gdb is attached to the process and continued inside gdb.
This will result in an underrun.
When an underrun happens, the buffer is reported to contain less than zero frames (i.e., the "avail" value is larger than the buffer size).
The program has to write samples _faster_ than normally to catch up. Alternately, it could advance the pointer by calling snd_pcm_forward (which might be a better idea because those samples won't be played at the correct time anyway.)
You should reconsider setting the stop threshold.
- At this point, the thread went into a tight loop taking >100% cpu
This is probably because of all the 'catch-up' samples that are being mixed.
Stack trace shows that 'snd_pcm_dmix_sync_area' in pcm_dmix.c is taking a very long time to return because the 'size' is a large unsinged interger and mix_areas is getting called with 'transfer' value of <=2048.
Hmmm, this might be a bug in the dmix plugin.
Regards, Clemens
Hi Gabe and Clemens, Thanks for your replies. There is still some confusion about why alsa is behaving in this manner.
GDB is good for a lot of things, but solving problems in real-time systems (like audio) has never been one of them. So, chances are that gdb is the wrong tool for what you're trying to do.
A brief background: Yes, you are right, gdb is not the proper tool. We are trying to solve random jackhammer issue happening on some production systems. One way to reproduce this is thread starvation for cpu which we are inducing by either SIGSTOP/SIGCONT or gdb or lowering the priority of sound thread.
This is probably because of all the 'catch-up' samples that are being mixed.
What we have observed is that when the process is freshly started and has not run for significant amount of time, sending SIGSTOP/SIGCONT to sound process properly stops and resumes sound.
However when run for over 12 hours, any starvation or stopping of sound thread sends dmix into a tight loop inside 'snd_pcm_dmix_sync_area' function of pcm_dmix.c. Dmix calculates the 'size' that needs to be transferred to be a huge value.
Here is the piece of code from snd_pcm_dmix_sync_area that starts the size calculations:
/* calculate the size to transfer */ /* check the available size in the local buffer * last_appl_ptr keeps the last updated position */ size = dmix->appl_ptr - dmix->last_appl_ptr; if (! size) return; if (size >= pcm->boundary / 2) size = pcm->boundary - size; . . . When the issue happens value of dmix->last_appl_ptr is greater than dmix->appl_ptr and subsequent calcualtion of size turns out to be very large number probably because of wrap around behavior of unsigned arithmetic. A typical case is: dmix->appl_ptr = 217121792, dmix->last_appl_ptr = 233882624 size = 1018295296 This causes 'mix_areas' to be called repeatedly until this size is exhausted.
When process is running for shorter duration, dmix->appl_ptr is always bigger than dmix->last_appl_ptr. For eg. dmix->appl_ptr = 26124288, dmix->last_appl_ptr = 26123264
The software boundary is 1 GB.
Hmmm, this might be a bug in the dmix plugin.
So, is it possible that the buffer wrap around could cause the pointer/size miscalcualtions inside dmix? Is it expected to have dmix->last_appl_ptr greater than dmix->appl_ptr? If so, why the issue seen only when the thread is stopped.
Please suggest. Thanks. Adarsh
On Thu, Feb 20, 2014 at 3:27 PM, Clemens Ladisch clemens@ladisch.de wrote:
Adarsh wrote:
- However to disable the underruns, the
snd_pcm_sw_params_set_stop_threshold is set to 0x7fffffff
This does not disable underruns; it just disables stopping the device when an underrun happens.
which is greater than boundary value.
Guess what happens on a 64-bit machine.
3.Period size is 1024 and buffer size is 2048.
Do you actually need that low a latency?
- gdb is attached to the process and continued inside gdb.
This will result in an underrun.
When an underrun happens, the buffer is reported to contain less than zero frames (i.e., the "avail" value is larger than the buffer size).
The program has to write samples _faster_ than normally to catch up. Alternately, it could advance the pointer by calling snd_pcm_forward (which might be a better idea because those samples won't be played at the correct time anyway.)
You should reconsider setting the stop threshold.
- At this point, the thread went into a tight loop taking >100% cpu
This is probably because of all the 'catch-up' samples that are being mixed.
Stack trace shows that 'snd_pcm_dmix_sync_area' in pcm_dmix.c is taking a very long time to return because the 'size' is a large unsinged interger and mix_areas is getting called with 'transfer' value of <=2048.
Hmmm, this might be a bug in the dmix plugin.
Regards, Clemens
participants (3)
-
Adarsh
-
Clemens Ladisch
-
Gabriel M. Beddingfield