Re: [alsa-devel] PROBLEM: SIS7019 stops recording after 42 min
Not trimming for the benefit of alsa-devel, cc'd. Please trim replies.
On Sat, 2010-06-19 at 16:13 +0200, Hans Schou wrote:
Hi
I have a problem with recording sound when using the sound chip SIS7019 with both kernel 2.6.26 and 2.6.34. After recording about 42 minutes it kind a stops recording, more precisely it is taking a pause of exactly 10 seconds between each reading.
As recorder I have tried several programs and all of them fails after 42 minutes. Some programs uses Alsa and some uses the old deprecated method. In this example I have logged sox rec.
Recording method in a script: strace -tt -o strace.log rec -c 1 -r 44100 -2 sox.wav & sleep 3000 kill $?
I think the answer is no, but to be sure -- are you talking directly to the hardware device, or are you going through pulseaudio?
While rec is running, can you capture the configuration using head -1000 /proc/asound/card0/pcm0c/sub0/* Once you have the information, there is no need to run it all the way out.
I think that's the right path, but I'm going from memory so you may have to look around under /proc/asound. I'm trying to get my MicroClient up and running again so I can try to reproduce this. This information will help me narrow my focus to the correct area of the driver.
Can you try using arecord? You can use options to tell it how to configure the PCM. Especially interesting will be to configure 2 periods per buffer vs whatever rec uses. 2 periods per buffer uses the hardware interrupts to clock out periods, where as 3+ uses a more complex mechanism. You can also use -M to use mmap vs not.
Recording start 17:41:29 and undtil 18:23:38.977278 it writes about 6-7 times per second to disk. Then it begin only writing every 10 seconds. At this point it had written 27222 times to disk and the amount of data written so far is 28684288 bytes (including wav-header).
WorkARound: When the recording stops, one can just exit 'rec' and start it again and record another 42 minutes. Reboot is not necessary. Apparently, the re-init of the sound chip fixes the problem.
On other systems with nVidia CK804 or emu10k I do not have the same problem.
I have looked at sound/pci/sis7019.c but I do not really know what I should look for. None of the numbers in my observations is in the power of 2 which I would expect it was. Like, I expected it stopped recording after 64MB or so, but that is not the case.
Linux version 2.6.34-norhtec-sis (root@norh) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #7 Wed Jun 16 00:32:56 CEST 2010 (the kernel used was compiled on this system.)
# scripts/ver_linux Gnu C 4.1.2 Gnu make 3.81 binutils 2.17 util-linux 2.12r mount 2.12r module-init-tools 3.3-pre2 e2fsprogs 1.40-WIP Linux C Library scripts/ver_linux: line 63: /proc/self/maps: No such file or directory Dynamic linker (ldd) 2.3.6 Net-tools 1.60 Console-tools 0.2.3 Sh-utils 5.97 udev 105
# lspci -vvv 00:01.4 Multimedia audio controller: Silicon Integrated Systems [SiS] SiS7019 Audio Accelerator Subsystem: Silicon Integrated Systems [SiS] SiS7019 Audio Accelerator Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64 (500ns min, 6000ns max) Interrupt: pin B routed to IRQ 10 Region 0: I/O ports at dc00 [size=256] Region 1: Memory at dfff8000 (32-bit, non-prefetchable)
[size=16K] Capabilities: [60] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: SiS7019 Kernel modules: snd-sis7019
Here is a snip from the log file when it goes from reading 7 times per second to every 10th second. Note two times "ioctl" within a second and then it goes over to a delay with 10 seconds between each "ioctl":
18:23:38.611877 ioctl(4, 0x800c4151, 0xbf96b1d0) = 0 18:23:38.784635 gettimeofday({1276446218, 785207}, NULL) = 0 18:23:38.786609 write(2, "\rTime: 42:08.19 [00:00.00] of 00:"..., 78)= 78 18:23:38.789354 write(3,"p\377q\377n\377k\377l\377p\377o\377o\377k\377n\377p\377q\377n\377o\377k\377l\377p"...,4096) = 4096 18:23:38.791590 write(3,"j\377l\377l\377j\377l\377k\377m\377k\377l\377n\377j\377l\377l\377n\377n\377g\377m"...,12288) = 12288 18:23:38.794425 gettimeofday({1276446218, 795024}, NULL) = 0 18:23:38.795762 ioctl(4, 0x800c4151, 0xbf96b1d0) = 0 18:23:38.970451 gettimeofday({1276446218, 971018}, NULL) = 0 18:23:38.972293 write(2, "\rTime: 42:08.37 [00:00.00] of 00:"..., 78)= 78 18:23:38.975134 write(3,"k\377m\377o\377k\377i\377j\377m\377o\377o\377m\377k\377m\377o\377k\377p\377n\377o"...,4096) = 4096 18:23:38.977278 write(3,"m\377j\377p\377n\377n\377r\377m\377o\377o\377o\377l\377h\377l\377l\377m\377o\377o"...,12288) = 12288 18:23:38.980176 gettimeofday({1276446218, 980773}, NULL) = 0 18:23:38.981516 ioctl(4, 0x800c4151, 0xbf96b1d0) = 0 18:23:48.983347 ioctl(4, 0x800c4151, 0xbf96b1d0) = 0 18:23:48.985710 gettimeofday({1276446228, 986317}, NULL) = 0 18:23:48.987784 write(2, "\rTime: 42:08.56 [00:00.00] of 00:"..., 78) = 78 18:23:48.990733 write(3,"j\377k\377l\377l\377l\377n\377l\377q\377q\377j\377l\377m\377m\377p\377o\377o\377q"...,4096) = 4096 18:23:48.992814 write(3,"p\377m\377k\377s\377r\377m\377o\377m\377k\377k\377j\377j\377l\377p\377o\377r\377l"...,12288) = 12288 18:23:48.995694 gettimeofday({1276446228, 996297}, NULL) = 0 18:23:48.997046 ioctl(4, 0x800c4151, 0xbf96b1d0) = 0 18:23:48.999574 gettimeofday({1276446229, 186}, NULL) = 0 18:23:49.003421 write(3,"l\377l\377m\377o\377q\377q\377n\377m\377q\377r\377s\377p\377n\377t\377r\377m\377n"...,4096) = 4096 18:23:49.005517 write(3,"o\377m\377l\377p\377m\377p\377o\377m\377p\377p\377q\377o\377j\377m\377r\377m\377r"...,12288) = 12288 18:23:49.008337 gettimeofday({1276446229, 8945}, NULL) = 0 18:23:49.009699 ioctl(4, 0x800c4151, 0xbf96b1d0) = 0 18:23:59.007878 ioctl(4, 0x800c4151, 0xbf96b1d0) = 0
Version of sox: SoX v14.0.1
Full log file is 780KB compressed: http://schou.dk/linux/sis7019/strace.sis7019.sox-rec.2010-06-13.log.bz2
Hardware system used is a Norhtec MicroClient Jr with SIS550 CPU: http://www.norhtec.com/products/mcjrmx/index.html
best regards/hans
2010/6/19 David Dillow dave@thedillows.org:
Not trimming for the benefit of alsa-devel, cc'd. Please trim replies.
ok
On Sat, 2010-06-19 at 16:13 +0200, Hans Schou wrote:
Hi
I have a problem with recording sound when using the sound chip SIS7019 with both kernel 2.6.26 and 2.6.34. After recording about 42 minutes it kind a stops recording, more precisely it is taking a pause of exactly 10 seconds between each reading.
As recorder I have tried several programs and all of them fails after 42 minutes. Some programs uses Alsa and some uses the old deprecated method. In this example I have logged sox rec.
Recording method in a script: strace -tt -o strace.log rec -c 1 -r 44100 -2 sox.wav & sleep 3000 kill $?
I think the answer is no, but to be sure -- are you talking directly to the hardware device, or are you going through pulseaudio?
Eh? No to what? Alsa? I am not really sure. In strace I can see 'rec' uses ioclt which could implies that it is talking directly to hardware.
While rec is running, can you capture the configuration using head -1000 /proc/asound/card0/pcm0c/sub0/*
Below was captured while running: arecord -c 1 -r 44100 -f S16 arec01.wav
==> /proc/asound/card0/pcm0c/sub0/hw_params <== access: RW_INTERLEAVED format: S16_LE subformat: STD channels: 1 rate: 44100 (44100/1) period_size: 5513 buffer_size: 22050
==> /proc/asound/card0/pcm0c/sub0/info <== card: 0 device: 0 subdevice: 0 stream: CAPTURE id: SiS7019 name: SiS7019 subname: subdevice #0 class: 0 subclass: 0 subdevices_count: 1 subdevices_avail: 0
==> /proc/asound/card0/pcm0c/sub0/prealloc <== 64
==> /proc/asound/card0/pcm0c/sub0/prealloc_max <== 128
==> /proc/asound/card0/pcm0c/sub0/status <== state: RUNNING owner_pid : 3112 trigger_time: 1277013037.939382815 tstamp : 1277013038.809174469 delay : 10640 avail : 10640 avail_max : 10920 ----- hw_ptr : 38368 appl_ptr : 27728
==> /proc/asound/card0/pcm0c/sub0/sw_params <== tstamp_mode: NONE period_step: 1 avail_min: 5513 start_threshold: 1 stop_threshold: 22050 silence_threshold: 0 silence_size: 0 boundary: 1944986400
I got a strange error message from arecord while recording at rate 44100: overrun!!! (at least 0.188 ms long) overrun!!! (at least 0.190 ms long) overrun!!! (at least 0.191 ms long)
Could this be a clue?
The error does occur with rate 8000 8bit (the default).
Can you try using arecord? You can use options to tell it how to configure the PCM. Especially interesting will be to configure 2 periods per buffer vs whatever rec uses. 2 periods per buffer uses the hardware interrupts to clock out periods, where as 3+ uses a more complex mechanism. You can also use -M to use mmap vs not.
Options like this? strace -tt -o arec.log arecord -c 1 -r 44100 -f S16 -M arec.wav
best regards/hans
Please keep me on the cc list, I'll see your message sooner in my Inbox than my mailing list folders.
On Sun, 2010-06-20 at 08:20 +0200, Hans Schou wrote:
On Sat, 2010-06-19 at 16:13 +0200, Hans Schou wrote:
Hi
I have a problem with recording sound when using the sound chip SIS7019 with both kernel 2.6.26 and 2.6.34. After recording about 42 minutes it kind a stops recording, more precisely it is taking a pause of exactly 10 seconds between each reading.
As recorder I have tried several programs and all of them fails after 42 minutes. Some programs uses Alsa and some uses the old deprecated method. In this example I have logged sox rec.
Recording method in a script: strace -tt -o strace.log rec -c 1 -r 44100 -2 sox.wav & sleep 3000 kill $?
I think the answer is no, but to be sure -- are you talking directly to the hardware device, or are you going through pulseaudio?
Eh? No to what? Alsa?
Pulseaudio.
I am not really sure. In strace I can see 'rec' uses ioclt which could implies that it is talking directly to hardware.
Probably, but I'm not familiar enough with the library/user-space side of alsa to know how it talks to plugins. I expect you are correct, though.
While rec is running, can you capture the configuration using head -1000 /proc/asound/card0/pcm0c/sub0/*
Below was captured while running: arecord -c 1 -r 44100 -f S16 arec01.wav
You caught the correct files, yes. Did that command produce 10 second pauses? If not, then I need to see the same information from the rec command you were using to reproduce the issue before. It may be easier to just run the rec command for a moment to collect the data rather than wait the 40+ minutes to see if arecord also demonstrates the issue.
I got a strange error message from arecord while recording at rate 44100: overrun!!! (at least 0.188 ms long) overrun!!! (at least 0.190 ms long) overrun!!! (at least 0.191 ms long)
Could this be a clue?
Perhaps; I'm guessing that rec (sox) is using the mmap interface, and I've not done much with that -- though perhaps there is a bug in sox's overrun handling. You could enable SND_PCM_XRUN_DEBUG to see if overruns are happening when sox starts taking 10 seconds.
Getting overruns on SiS 550 based devices isn't entirely surprising, as it doesn't have a huge amount of horsepower or memory. If you have too much going on in the background, it is very easy to not have enough time to processes all of the captured data, especially if you are running short period/buffer sizes.
The error does occur with rate 8000 8bit (the default).
Do you mean 'does not occur'? That may be more expected -- 8khz/8bit has approximately 9% of the data as 44.1khz/16bit.
Can you try using arecord? You can use options to tell it how to configure the PCM. Especially interesting will be to configure 2 periods per buffer vs whatever rec uses. 2 periods per buffer uses the hardware interrupts to clock out periods, where as 3+ uses a more complex mechanism. You can also use -M to use mmap vs not.
Options like this? strace -tt -o arec.log arecord -c 1 -r 44100 -f S16 -M arec.wav
Yes, for testing if the mmap interface is the problem. You can also use -v to have it print more information, including the source of the data (hw vs Pulseaudio). To check 1 period vs 2, you can use the --period-size/--buffer-size options. I seem to recall that there is a bug lurking around there somewhere, so you should check that you got 1 period using -v or the proc files. You may have to fallback to the -F/-B options and convert samples to microseconds.
Dave
2010/6/20 David Dillow dave@thedillows.org
Please keep me on the cc list, I'll see your message sooner in my Inbox than my mailing list folders.
On Sun, 2010-06-20 at 08:20 +0200, Hans Schou wrote:
On Sat, 2010-06-19 at 16:13 +0200, Hans Schou wrote:
Hi
I have a problem with recording sound when using the sound chip SIS7019 with both kernel 2.6.26 and 2.6.34. After recording about 42 minutes it kind a stops recording, more precisely it is taking a pause of exactly 10 seconds between each reading.
As recorder I have tried several programs and all of them fails after 42 minutes. Some programs uses Alsa and some uses the old deprecated method. In this example I have logged sox rec.
Recording method in a script: strace -tt -o strace.log rec -c 1 -r 44100 -2 sox.wav & sleep 3000 kill $?
I think the answer is no, but to be sure -- are you talking directly to the hardware device, or are you going through pulseaudio?
Eh? No to what? Alsa?
Pulseaudio.
what is the default device ? dsnoop, hw or pulse
Please use alsa "hw" device first , and then pulseaudio
if you are using pulseaudio , you need to provide a full pulseaudio.log ( pulseaudio -vvvvv ) and cc PA developer with your result
I am not really sure. In strace I can see 'rec' uses ioclt which could implies that it is talking directly to hardware.
Probably, but I'm not familiar enough with the library/user-space side of alsa to know how it talks to plugins. I expect you are correct, though.
While rec is running, can you capture the configuration using head -1000 /proc/asound/card0/pcm0c/sub0/*
Below was captured while running: arecord -c 1 -r 44100 -f S16 arec01.wav
Please specify -D hw:0,0 and -v to get output of snd_pcm_dump()
You caught the correct files, yes. Did that command produce 10 second pauses? If not, then I need to see the same information from the rec command you were using to reproduce the issue before. It may be easier to just run the rec command for a moment to collect the data rather than wait the 40+ minutes to see if arecord also demonstrates the issue.
I got a strange error message from arecord while recording at rate 44100: overrun!!! (at least 0.188 ms long) overrun!!! (at least 0.190 ms long) overrun!!! (at least 0.191 ms long)
Could this be a clue?
Are you getting this result when using "hw" device or "pulse" device ?
what buffer size and period size are arecord using ?
Perhaps; I'm guessing that rec (sox) is using the mmap interface, and I've not done much with that -- though perhaps there is a bug in sox's overrun handling. You could enable SND_PCM_XRUN_DEBUG to see if overruns are happening when sox starts taking 10 seconds.
Getting overruns on SiS 550 based devices isn't entirely surprising, as it doesn't have a huge amount of horsepower or memory. If you have too much going on in the background, it is very easy to not have enough time to processes all of the captured data, especially if you are running short period/buffer sizes.
The error does occur with rate 8000 8bit (the default).
Do you mean 'does not occur'? That may be more expected -- 8khz/8bit has approximately 9% of the data as 44.1khz/16bit.
Can you try using arecord? You can use options to tell it how to configure the PCM. Especially interesting will be to configure 2
periods
per buffer vs whatever rec uses. 2 periods per buffer uses the hardware interrupts to clock out periods, where as 3+ uses a more complex mechanism. You can also use -M to use mmap vs not.
Options like this? strace -tt -o arec.log arecord -c 1 -r 44100 -f S16 -M arec.wav
AFAIK , pulse device did not support MMAP
Yes, for testing if the mmap interface is the problem. You can also use -v to have it print more information, including the source of the data (hw vs Pulseaudio). To check 1 period vs 2, you can use the --period-size/--buffer-size options. I seem to recall that there is a bug lurking around there somewhere, so you should check that you got 1 period using -v or the proc files. You may have to fallback to the -F/-B options and convert samples to microseconds.
Dave
2010/6/20 David Dillow dave@thedillows.org:
You caught the correct files, yes. Did that command produce 10 second pauses? If not, then I need to see the same information from the rec command you were using to reproduce the issue before. It may be easier to just run the rec command for a moment to collect the data rather than wait the 40+ minutes to see if arecord also demonstrates the issue.
Over night I have been running arecord several times. Alle wav-files are much below the expected size 264600000 bytes (44100*2*60s*50min). The file size recorded
184928116 184939142 185170688 186030716 186989978 187166394 187221524 188555670 188654904 188798242 189503906 189900842
189900842/(44100*2*60) = 35.88
Only 35 minutes of recording but I was running in 50min.
It seems that arecord loses more sample than I have seen with sox-rec.
I'm guessing that rec (sox) is using the mmap interface, and I've not done much with that -- though perhaps there is a bug in sox's overrun handling. You could enable SND_PCM_XRUN_DEBUG to see if overruns are happening when sox starts taking 10 seconds.
How do I enable SND_PCM_XRUN_DEBUG with sox?
Getting overruns on SiS 550 based devices isn't entirely surprising, as it doesn't have a huge amount of horsepower or memory.
Well, I usually don't have problem with that. I have samba running but I don't access the recorded files while recording, so it is not a problem.
Right now uptime says load average: 0.19, 0.21, 0.18 but strace and top is the bad guys, not arecord.
The error does occur with rate 8000 8bit (the default).
Do you mean 'does not occur'? That may be more expected -- 8khz/8bit has approximately 9% of the data as 44.1khz/16bit.
Sorry, yes you are right.
Can you try using arecord? You can use options to tell it how to configure the PCM. Especially interesting will be to configure 2 periods per buffer vs whatever rec uses. 2 periods per buffer uses the hardware interrupts to clock out periods, where as 3+ uses a more complex mechanism. You can also use -M to use mmap vs not.
Options like this? strace -tt -o arec.log arecord -c 1 -r 44100 -f S16 -M arec.wav
I tried this one: arecord -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav but it did not change anything. Still the files are much too small.
Recording WAVE 'arec.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono Hardware PCM card 0 'SiS7019' device 0 subdevice 0 Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S16_LE subformat : STD channels : 1 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 22050 period_size : 5513 period_time : 125011 tick_time : 0 tstamp_mode : NONE period_step : 1 sleep_min : 0 avail_min : 5513 xfer_align : 5513 start_threshold : 1 stop_threshold : 22050 silence_threshold: 0 silence_size : 0 boundary : 1944986400 overrun!!! (at least 4.368 ms long) Status: state : XRUN trigger_time: 1277092780.321430383 tstamp : 1277092780.323362792 delay : 0 avail : 27562 avail_max : 27562
# grep avail arec-stdout2.log | sort | uniq -c 6 avail : 22053 13 avail : 22055 2 avail : 22058 1 avail : 22059 10 avail : 22060 4 avail : 22062 1 avail : 22063 2 avail : 22064 10975 avail : 27562 5 avail : 33075 2 avail : 38588 6 avail_max : 22053 13 avail_max : 22055 2 avail_max : 22058 1 avail_max : 22059 10 avail_max : 22060 4 avail_max : 22062 1 avail_max : 22063 2 avail_max : 22064 10975 avail_max : 27562 5 avail_max : 33075 2 avail_max : 38588 2 avail_min : 5513
So most often 'avail' is 27562 after an overrun when running arecord.
I think it would be better to test with sox-rec but which options should be used?
/hans
On Mon, 2010-06-21 at 07:56 +0200, Hans Schou wrote:
2010/6/20 David Dillow dave@thedillows.org:
You caught the correct files, yes. Did that command produce 10 second pauses? If not, then I need to see the same information from the rec command you were using to reproduce the issue before. It may be easier to just run the rec command for a moment to collect the data rather than wait the 40+ minutes to see if arecord also demonstrates the issue.
Over night I have been running arecord several times. Alle wav-files are much below the expected size 264600000 bytes (44100*2*60s*50min).
Only 35 minutes of recording but I was running in 50min.
Ok, cool, we see the problem with arecord as well, though you were getting overrun messages as well?
I'm guessing that rec (sox) is using the mmap interface, and I've not done much with that -- though perhaps there is a bug in sox's overrun handling. You could enable SND_PCM_XRUN_DEBUG to see if overruns are happening when sox starts taking 10 seconds.
How do I enable SND_PCM_XRUN_DEBUG with sox?
Sorry, that should be CONFIG_SND_PCM_XRUN_DEBUG in the kernel configuration, but if we can demonstrate this with arecord, there's no real reason to recompile your kernel at this point.
Getting overruns on SiS 550 based devices isn't entirely surprising, as it doesn't have a huge amount of horsepower or memory.
Well, I usually don't have problem with that. I have samba running but I don't access the recorded files while recording, so it is not a problem.
Right now uptime says load average: 0.19, 0.21, 0.18 but strace and top is the bad guys, not arecord.
An overrun means that arecord didn't run for 500ms, and the load average won't really tell you much about that -- latency can happen with low loads. That said, I'm suspecting that you've found a problem in the driver, and I'd lay odds it is in the handling of multiple periods per buffer.
Can you try using arecord? You can use options to tell it how to configure the PCM. Especially interesting will be to configure 2 periods per buffer vs whatever rec uses. 2 periods per buffer uses the hardware interrupts to clock out periods, where as 3+ uses a more complex mechanism. You can also use -M to use mmap vs not.
Options like this? strace -tt -o arec.log arecord -c 1 -r 44100 -f S16 -M arec.wav
I tried this one: arecord -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav but it did not change anything. Still the files are much too small.
Ok, read/write vs mmap is not a differentiating factor, good deal.
So most often 'avail' is 27562 after an overrun when running arecord.
I think it would be better to test with sox-rec but which options should be used?
I don't know the options available on sox, but if you can use arecord to reproduce, then that is probably the best tool for the job. Can you set it up to use two periods per buffer and see if you still can reproduce? Options would look like -B 250000 -F 125000. A second test with -B 1000000 -F 500000 would be interesting, if the hw can handle buffers of that size -- I don't recall offhand.
I will hopefully have my hardware back up and running tonight; BTW, what distro are you using? I'm trying Fedora 13, but I'm expecting to run into trouble with the lack of cmov support on the processor.
Thanks, Dave
2010/6/21 David Dillow dave@thedillows.org:
Only 35 minutes of recording but I was running in 50min.
Ok, cool, we see the problem with arecord as well, though you were getting overrun messages as well?
Yes, a lot of overrun. At least one per second.
How do I enable SND_PCM_XRUN_DEBUG with sox?
Sorry, that should be CONFIG_SND_PCM_XRUN_DEBUG in the kernel configuration, but if we can demonstrate this with arecord, there's no real reason to recompile your kernel at this point.
OK, I'll skip that. It takes rather long time to compile.
Right now uptime says load average: 0.19, 0.21, 0.18 but strace and top is the bad guys, not arecord.
An overrun means that arecord didn't run for 500ms, and the load average won't really tell you much about that -- latency can happen with low
Well, I did not see that with sox. It was running fine for 42 min.
loads. That said, I'm suspecting that you've found a problem in the driver, and I'd lay odds it is in the handling of multiple periods per buffer.
I don't know the options available on sox, but if you can use arecord to reproduce, then that is probably the best tool for the job. Can you set it up to use two periods per buffer and see if you still can reproduce? Options would look like -B 250000 -F 125000. A second test with -B 1000000 -F 500000 would be interesting, if the hw can handle buffers of that size -- I don't recall offhand.
I have just started it with -B 250000 -F 125000 and get a lot of overrun. I skipped using strace to make less stress.
cmdline is now: arecord -B 250000 -F 125000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
overrun!!! (at least 0.201 ms long) Status: state : XRUN trigger_time: 1277140527.1826705 tstamp : 1277140527.2005279 delay : 0 avail : 16537 avail_max : 16537 overrun!!! (at least 0.207 ms long) Status: state : XRUN trigger_time: 1277140527.128608572 tstamp : 1277140527.128793445 delay : 0 avail : 16537 avail_max : 16537 overrun!!! (at least 0.167 ms long) Status: state : XRUN trigger_time: 1277140527.256925501 tstamp : 1277140527.257071819 delay : 0 avail : 16537 avail_max : 16537
I will hopefully have my hardware back up and running tonight; BTW, what distro are you using? I'm trying Fedora 13, but I'm expecting to run into trouble with the lack of cmov support on the processor.
Debian stable.
/hans
2010/6/21 Hans Schou linux@schou.dk:
2010/6/21 David Dillow dave@thedillows.org:
Only 35 minutes of recording but I was running in 50min.
Ok, cool, we see the problem with arecord as well, though you were getting overrun messages as well?
Yes, a lot of overrun. At least one per second.
How do I enable SND_PCM_XRUN_DEBUG with sox?
Sorry, that should be CONFIG_SND_PCM_XRUN_DEBUG in the kernel configuration, but if we can demonstrate this with arecord, there's no real reason to recompile your kernel at this point.
OK, I'll skip that. It takes rather long time to compile.
Right now uptime says load average: 0.19, 0.21, 0.18 but strace and top is the bad guys, not arecord.
An overrun means that arecord didn't run for 500ms, and the load average won't really tell you much about that -- latency can happen with low
Well, I did not see that with sox. It was running fine for 42 min.
loads. That said, I'm suspecting that you've found a problem in the driver, and I'd lay odds it is in the handling of multiple periods per buffer.
I don't know the options available on sox, but if you can use arecord to reproduce, then that is probably the best tool for the job. Can you set it up to use two periods per buffer and see if you still can reproduce? Options would look like -B 250000 -F 125000. A second test with -B 1000000 -F 500000 would be interesting, if the hw can handle buffers of that size -- I don't recall offhand.
I have just started it with -B 250000 -F 125000 and get a lot of overrun. I skipped using strace to make less stress.
cmdline is now: arecord -B 250000 -F 125000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
This gave a 191069598 bytes long file. What does this test actually show regarding the original problem with stopping after 42 min?
I have just started: arecord -B 1000000 -F 500000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav and I only got one overrun. What I did was that logged in on another ssh console and executed "ls".
Here is a complete screen dump after running 2 minutes: + arecord -B 1000000 -F 500000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav Recording WAVE 'arec.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono Hardware PCM card 0 'SiS7019' device 0 subdevice 0 Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S16_LE subformat : STD channels : 1 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 32768 period_size : 22050 period_time : 500000 tick_time : 0 tstamp_mode : NONE period_step : 1 sleep_min : 0 avail_min : 22050 xfer_align : 22050 start_threshold : 1 stop_threshold : 32768 silence_threshold: 0 silence_size : 0 boundary : 1445068800 overrun!!! (at least 826.140 ms long) Status: state : XRUN trigger_time: 1277144902.603264297 tstamp : 1277144903.428975238 delay : 0 avail : 44108 avail_max : 44108 overrun!!! (at least 0.077 ms long) Status: state : XRUN trigger_time: 1277144938.378978657 tstamp : 1277144938.379038604 delay : 0 avail : 42344 avail_max : 42344
/hans /hans
On Mon, 2010-06-21 at 20:49 +0200, Hans Schou wrote:
2010/6/21 Hans Schou linux@schou.dk:
2010/6/21 David Dillow dave@thedillows.org:
An overrun means that arecord didn't run for 500ms, and the load average won't really tell you much about that -- latency can happen with low
Well, I did not see that with sox. It was running fine for 42 min.
My thought is that you may have been somewhat fortunate and did not see an overrun for 42 minutes. I am trying to narrow down if this is an issue with sox's overrun handling, if there is a big with handing anything other than two periods per buffer, or some other generic bug in the driver.
I don't know the options available on sox, but if you can use arecord to reproduce, then that is probably the best tool for the job. Can you set it up to use two periods per buffer and see if you still can reproduce? Options would look like -B 250000 -F 125000. A second test with -B 1000000 -F 500000 would be interesting, if the hw can handle buffers of that size -- I don't recall offhand.
I have just started it with -B 250000 -F 125000 and get a lot of overrun. I skipped using strace to make less stress.
cmdline is now: arecord -B 250000 -F 125000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
This gave a 191069598 bytes long file. What does this test actually show regarding the original problem with stopping after 42 min?
I'm trying to have you reproduce the original problem using 2 periods per buffer so that I can localize the likely location of a driver bug. If it only happens when there is not 2 periods per buffer, then that points to one set of timing code. If it also happens with 2 periods per buffer, then that points to a more generic bug.
I have just started: arecord -B 1000000 -F 500000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav and I only got one overrun. What I did was that logged in on another ssh console and executed "ls".
That sounds like a scheduling latency issue, yes.
Here is a complete screen dump after running 2 minutes:
- arecord -B 1000000 -F 500000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
Recording WAVE 'arec.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono Hardware PCM card 0 'SiS7019' device 0 subdevice 0 Its setup is:
buffer_size : 32768 period_size : 22050
Ok, please play with the options until you can get buffer_size = 2 * period_size. That will eliminate the alternate timing code from the path. I expected that options you gave to do that, but apparently there is something else keeping that from happening.
Dave
2010/6/21 Hans Schou linux@schou.dk:
I have just started: arecord -B 1000000 -F 500000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
It gave a file size of 258822944 bytes. Almost 50 min: echo "258822944/(2*60*44100)" | bc -ql 48.90
This is the longest recording I have made on this hardware.
I have this running now: + arecord -B 200000 -F 100000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav Recording WAVE 'arec.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono Hardware PCM card 0 'SiS7019' device 0 subdevice 0 Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S16_LE subformat : STD channels : 1 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 8820 period_size : 4410 period_time : 100000 tick_time : 0 tstamp_mode : NONE period_step : 1 sleep_min : 0 avail_min : 4410 xfer_align : 4410 start_threshold : 1 stop_threshold : 8820 silence_threshold: 0 silence_size : 0 boundary : 1244678400
and no overrun within 2 min.
/hans
On Mon, 2010-06-21 at 22:10 +0200, Hans Schou wrote:
2010/6/21 Hans Schou linux@schou.dk:
I have just started: arecord -B 1000000 -F 500000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
It gave a file size of 258822944 bytes. Almost 50 min: echo "258822944/(2*60*44100)" | bc -ql 48.90
This is the longest recording I have made on this hardware.
I have this running now:
- arecord -B 200000 -F 100000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
Ok, it seems something is definately funky with the driver in general, and gets worse when we're using more than 2 periods per buffer.
If I use:
strace -tt -T -o record.log arecord -f S16 -r 44100 -c 1 -d 3600 \ --buffer-size=32768 --period-size=16384 -v -t raw - > /dev/null
I don't have any overruns, nor does it go into the 10-second pause mode. I do see an odd alternating pattern of one period taking 1.4ms to capture, and the next taking 732ms. The period should be about 371ms, so the lumpyness of the timing is likely making it easier to hit overruns.
Then, using:
strace -tt -T -o record.log arecord -f S16 -r 44100 -c 1 -d 3600 \ --buffer-size=32768 --period-size=8192 -v -t raw - > /dev/null
I was able to push it into the 10-second pause failure mode, without hitting any overruns. Initially it has a similar alternating pattern of 0.6ms then 363ms capture times until it drops into a 10 second pause just after 42 minutes.
Now that I finally got my hardware going and can reproduce this, I'll dig into it over the next few evenings.
Thanks for the report and help narrowing down the focus area, Dave
On Tue, 2010-06-22 at 23:00 -0400, David Dillow wrote:
On Mon, 2010-06-21 at 22:10 +0200, Hans Schou wrote:
2010/6/21 Hans Schou linux@schou.dk:
I have just started: arecord -B 1000000 -F 500000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
It gave a file size of 258822944 bytes. Almost 50 min: echo "258822944/(2*60*44100)" | bc -ql 48.90
This is the longest recording I have made on this hardware.
I have this running now:
- arecord -B 200000 -F 100000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
Ok, it seems something is definately funky with the driver in general, and gets worse when we're using more than 2 periods per buffer.
The driver's funkiness may be limited to the more than 2 periods per buffer/10 second pause issue.
I don't have any overruns, nor does it go into the 10-second pause mode. I do see an odd alternating pattern of one period taking 1.4ms to capture, and the next taking 732ms. The period should be about 371ms, so the lumpyness of the timing is likely making it easier to hit overruns.
I can reproduce the alternating pattern of periods using a different device, so this may be something in the core code of ALSA.
Hardware PCM card 0 'HDA Intel' device 0 subdevice 0 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 32768 period_size : 16384 period_time : 371519 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 32768 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 0 hw_ptr : 0
strace -T gives: ioctl(4, 0x80184151, 0x7fff532cff70) = 0 <0.371672> write(1, ..., 65536) = 65536 <0.000035> ioctl(4, 0x80184151, 0x7fff532cff70) = 0 <0.742746> write(1, ..., 65536) = 65536 <0.000013> ioctl(4, 0x80184151, 0x7fff532cff70) = 0 <0.000027> write(1, ..., 65536) = 65536 <0.000016> ioctl(4, 0x80184151, 0x7fff532cff70) = 0 <0.742694> write(1, ..., 65536) = 65536 <0.000011> ioctl(4, 0x80184151, 0x7fff532cff70) = 0 <0.000038> write(1, ..., 65536) = 65536 <0.000024> ioctl(4, 0x80184151, 0x7fff532cff70) = 0 <0.742600>
The first period gets clocked out close to the expected time -- 371.672ms actual vs 371.519 ms expected -- but then we're either running almost a full buffer -- 742.746ms actual vs 743.038ms buffer -- or very, very short at 0.027ms.
Since this problem is not related to the SiS7019 driver, I'll leave it to later -- or for others to investigate. Using three periods per buffer sees a similar pattern, but one has an extra period time to work within.
Dave
2010/6/23 David Dillow dave@thedillows.org:
I can reproduce the alternating pattern of periods using a different device, so this may be something in the core code of ALSA.
Strange? The program I am actually using is one I have written myself. I was sure I was using the old method as I open(2). In my program I use int fd = open("/dev/dsp", O_RDWR);
This should not go through ALSA (I think). I gues one should use something with "snd_pcm_open()" to use ALSA.
Since this problem is not related to the SiS7019 driver, I'll leave it to later -- or for others to investigate. Using three periods per buffer sees a similar pattern, but one has an extra period time to work within.
I really don't understand why I do not see this problem on other hardware using the same compiled code.
If the problem is ALSA, how can I then turn it off?
Thanks for your investigation.
regards/hans
2010/6/23 Hans Schou linux@schou.dk:
If the problem is ALSA, how can I then turn it off?
I am quite sure it is turned off:
# dmesg | grep -A 1 ALSA ALSA device list: No soundcards found.
regards/hans
On Wed, 2010-06-23 at 20:22 +0200, Hans Schou wrote:
2010/6/23 David Dillow dave@thedillows.org:
I can reproduce the alternating pattern of periods using a different device, so this may be something in the core code of ALSA.
Strange? The program I am actually using is one I have written myself. I was sure I was using the old method as I open(2). In my program I use int fd = open("/dev/dsp", O_RDWR);
This should not go through ALSA (I think). I gues one should use something with "snd_pcm_open()" to use ALSA.
To use ALSA directly, yes. However, there is no OSS SIS7019 driver in the kernel, only the ALSA one I wrote. There is an OSS emulation layer, so you can use the OSS API to talk to the ALSA core. Unless you are using a vendor (SIS/MicroJr/etc.) supplied driver, you are using ALSA in some capacity.
Since this problem is not related to the SiS7019 driver, I'll leave it to later -- or for others to investigate. Using three periods per buffer sees a similar pattern, but one has an extra period time to work within.
I really don't understand why I do not see this problem on other hardware using the same compiled code.
You mean the 10 second pauses during recording after 42 minutes? That is a driver bug, so it is specific to this card.
If you mean the alternating long/sort period times, then it is present on other hardware, but you likely don't see it either due to the defaults giving you more periods to work in so you don't see overruns, faster hardware than the SIS550 CPU, or a combination of the two. The bug that gives rise to the 10 second pauses can also shorten the periods I think, making overruns more likely to occur.
Dave
2010/6/23 David Dillow dave@thedillows.org:
On Wed, 2010-06-23 at 20:22 +0200, Hans Schou wrote:
int fd = open("/dev/dsp", O_RDWR);
This should not go through ALSA (I think). I gues one should use something with "snd_pcm_open()" to use ALSA.
To use ALSA directly, yes. However, there is no OSS SIS7019 driver in the kernel, only the ALSA one I wrote. There is an OSS emulation layer, so you can use the OSS API to talk to the ALSA core. Unless you are using a vendor (SIS/MicroJr/etc.) supplied driver, you are using ALSA in some capacity.
OK. Thanks for clearing it out.
I guess I should have programmed against ALSA when I started.
I really don't understand why I do not see this problem on other hardware using the same compiled code.
You mean the 10 second pauses during recording after 42 minutes? That is a driver bug, so it is specific to this card.
OK. And 42 min is also the only problem I see. I have no problems with the recorded sounds.
If you mean the alternating long/sort period times, then it is present on other hardware, but you likely don't see it either due to the defaults giving you more periods to work in so you don't see overruns, faster hardware than the SIS550 CPU, or a combination of the two. The bug that gives rise to the 10 second pauses can also shorten the periods I think, making overruns more likely to occur.
OK.
/hans
On Mon, 2010-06-21 at 19:42 +0200, Hans Schou wrote:
2010/6/21 David Dillow dave@thedillows.org:
I don't know the options available on sox, but if you can use arecord to reproduce, then that is probably the best tool for the job. Can you set it up to use two periods per buffer and see if you still can reproduce? Options would look like -B 250000 -F 125000. A second test with -B 1000000 -F 500000 would be interesting, if the hw can handle buffers of that size -- I don't recall offhand.
I have just started it with -B 250000 -F 125000 and get a lot of overrun. I skipped using strace to make less stress.
cmdline is now: arecord -B 250000 -F 125000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
overrun!!! (at least 0.201 ms long) Status: state : XRUN trigger_time: 1277140527.1826705 tstamp : 1277140527.2005279 delay : 0 avail : 16537 avail_max : 16537
This really makes me think the system just cannot keep up. Can you post the initial output to verify that it is getting two periods per buffer, please? If you are getting overruns and small files in that configuration, there is not much I think I do to help -- that uses the hardware's built-in support for generating the appropriate interrupts.
Also, does it do the same without the -M ? And you may wish to try longer and shorter periods to see if you get so many overruns.
Also, what storage are you using? Does it support DMA? PIO modes can be very slow...
I will hopefully have my hardware back up and running tonight; BTW, what distro are you using? I'm trying Fedora 13, but I'm expecting to run into trouble with the lack of cmov support on the processor.
Debian stable.
Thanks, I'll fall back to that if getting Fedora going looks to be more trouble than it is worth.
Dave
2010/6/21 David Dillow dave@thedillows.org:
On Mon, 2010-06-21 at 19:42 +0200, Hans Schou wrote:
cmdline is now: arecord -B 250000 -F 125000 -c 1 -r 44100 -f S16 -M -D hw:0,0 -v arec.wav
This really makes me think the system just cannot keep up. Can you post the initial output to verify that it is getting two periods per buffer, please?
Then I have to run it again. I do not have it on the screen anymore. I will do that after the test I'm running now - as mentioned before.
I have turned on my signal generator with 400Hz. I will give it a try with listen to it, to hear any clicks when the overrun occur.
Also, does it do the same without the -M ? And you may wish to try longer and shorter periods to see if you get so many overruns.
Also, what storage are you using? Does it support DMA? PIO modes can be very slow...
On a daily basis I use USB flash memory stick and have done this more than a year but always less than 42 min. Right now I'm using old 4GB external hard disk connected through USB.
/hans
participants (3)
-
David Dillow
-
Hans Schou
-
Raymond Yau