[alsa-devel] report bug in kernel sound driver

27 Oct 2010


Dear all,

I want to report a bug in the DaVinci sound driver (sound/soc/davinci/davinci-pcm.c).
The bug is caused by a DMA copy overflow. It makes the kernel oops with a lot of unusual messages.
The problem still seems to be present in the latest stable kernel (version 2.6.35.7).
The bug symptoms are at the end of this mail.
Here is my analysis of the bug:
The device calls the function davinci_pcm_new.
    This function allocates a large buffer of contiguous pages (typically 128K) for each of playback and capture.
    These two buffers are used for the DMA copies.
When someone records sound data, the driver uses DMA to copy the register data into the capture buffer that was allocated in davinci_pcm_new.
Each time a DMA copy finishes, the callback function davinci_pcm_dma_irq runs and calls davinci_pcm_enqueue_dma, which sets up the DMA copy parameters again. This is where the problem is.
It sets the DMA parameters:
    src      = address of the 32-bit sound recorder register
    dst      = prtd->period * period_size
    src_bidx = 0            // src does not change after each copy
    dst_bidx = data_type    // data_type = 2, because only the high 16 bits are sound data
    acnt     = 4
    bcnt     = 2048
    cnt      = 1
With these parameters, the DMA engine internally works like this:
    for (c = 0; c < cnt; c++) {
        for (b = 0; b < bcnt; b++) {
            memcpy(dst, src, 4);   // acnt = 4
            src += src_bidx;       // src_bidx = 0
            dst += dst_bidx;       // dst_bidx = data_type = 2, 16-bit sound data
        }
    }
This copy fills the whole dst buffer with the high 16 bits of the source, but it also causes a 2-byte overflow, because each element is written as 4 bytes while dst advances by only 2.
Every time a DMA copy finishes, it has changed 4K bytes + 2 bytes. The extra 2 bytes are the DMA copy overflow.
No error shows up until the copy reaches the last period, because by then the copy covers 128K + 2 bytes in total while only 128K bytes were allocated.
The other 2 bytes are kernel memory that the kernel may be using for anything. Since those 2 bytes are written by DMA, the kernel does not know anything about this out-of-bounds write.
The easy way to fix the problem is to change:
    if (unlikely(prtd->period >= runtime->periods))
        prtd->period = 0;
in function davinci_pcm_enqueue_dma to:
    if (unlikely(prtd->period >= (runtime->periods - 1)))
        prtd->period = 0;
The symptoms are listed below:
Symptom 1:
Bad pte = 04040202, process = sleep, vm_flags = 1875, vaddr = 1b000
VM: killing process sleep
Bad pte = 04040601, process = ???, vm_flags = 1875, vaddr = 17000
Bad pte = ffffffff, process = ???, vm_flags = 1875, vaddr = 43000
Bad pte = 00000001, process = ???, vm_flags = 1875, vaddr = 44000
…………..
Bad pte = 00000001, process = ???, vm_flags = 1875, vaddr = 88000
Symptom 2:
Unhandled fault: page domain fault (0x8fb) at 0x00011008
Internal error: : 8fb [#1]
Modules linked in: tlv320aic24 dm365_gpio dm365_pwm davinci_vpbe davinci_capture dm365_imp dm365mmap edmak irqk cmemk
CPU: 0
PC is at __copy_to_user+0x54/0x3a8
LR is at 0x5eff968
pc : [<c0117568>]    lr : [<05eff968>]    Not tainted
sp : c436befc  ip : e4640f80  fp : c436bf4c
r10: 00000000  r9 : c436a000  r8 : dcfd0362
r7 : 0ee2fab7  r6 : f7a60e69  r5 : fe9cf7d3  r4 : 026603c7
r3 : 0b7de3b1  r2 : 00000760  r1 : c5056020  r0 : 00011008
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  Segment user
Control: 5317F
Table: 843C0000  DAC: 00000015
,,,,,,,,,,,,,,,,,,,,,,,,
page:c0363be0 flags:0x00000068 mapping:c4273d18 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Symptom 3:
159.99.249.249 login: VM: killing process video_test
Bad pte = 00000003, process = ???, vm_flags = 1875, vaddr = 9000
Bad pte = 00000005, process = ???, vm_flags = 1875, vaddr = b000
,,,,,,,,,,,,,,,,,,,,,
Bad pte = 00000001, process = ???, vm_flags = 100077, vaddr = 31000
Bad page state in process 'desched/0'
page:c035e3e0 flags:0x0000006c mapping:c06ecec8 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Symptom 4:
159.99.249.249 login: Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Stopping interneBad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
t superserver: iBad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
netdBad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Symptom 5:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
done.
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: tlv320aic24 dm365_gpio dm365_pwm davinci_vpbe davinci_capture dm365_imp dm365mmap edmak irqk cmemk
CPU: 0
PC is at __free_pages+0x18/0x58
LR is at __init_begin+0x3fff8000/0x30
pc : [<c007626c>]    lr : [<00000000>]    Not tainted
sp : c03cdf50  ip : c03cdf60  fp : c03cdf5c
r10: c02de000  r9 : 00000002  r8 : c02ca460
r7 : 00000000  r6 : 843cffd0  r5 : c43c0000  r4 : c03cc000
r3 : 00000000  r2 : c02ca444  r1 : 00000000  r0 : c03659e0
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
Control: 5317F
Table: 805BC000  DAC: 00000017
Process desched/0 (pid: 11, stack limit = 0xc03cc258)
Stack: (0xc03cdf50 to 0xc03ce000)
df40:                                     c03cdf84 c03cdf60 c003ad7c c0076264 
df60: c002b6c0 c002b6c0 00000000 c02c2990 00000001 c02c2998 c03cdf9c c03cdf88 
df80: c0045d54 c003ac64 c03b3f18 c03cc000 c03cdfcc c03cdfa0 c0047b2c c0045d38 
dfa0: 00000000 00000000 c03cc000 c0047a7c c03b3f18 00000000 00000000 00000000 
dfc0: c03cdff4 c03cdfd0 c005eca8 c0047a8c ffffffff ffffffff 00000000 00000000 
dfe0: 00000000 00000000 00000000 c03cdff8 c004ba28 c005ebd0 00000000 00000000 
Backtrace: 
[<c0076254>] (__free_pages+0x0/0x58) from [<c003ad7c>] (free_pgd_slow+0x128/0x148)
[<c003ac54>] (free_pgd_slow+0x0/0x148) from [<c0045d54>] (__mmdrop+0x2c/0x48)
[<c0045d28>] (__mmdrop+0x0/0x48) from [<c0047b2c>] (desched_thread+0xb0/0x130)
 r4 = C03CC000 
[<c0047a7c>] (desched_thread+0x0/0x130) from [<c005eca8>] (kthread+0xe8/0x128)
[<c005ebc0>] (kthread+0x0/0x128) from [<c004ba28>] (do_exit+0x0/0x9cc)
 r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000
Code: e24cb004 e5903004 e1a0e001 e3530000 (05833000) 
 prev->state: 2 != TASK_RUNNING??
desched/0/11[CPU#0]: BUG in __schedule at kernel/sched.c:3826
Symptom 6:
VM: killing process sys_monitor
Bad pte = e1a0c00d, process = ???, vm_flags = 100077, vaddr = 12000
Bad pte = e1a0c00d, process = ???, vm_flags = 100077, vaddr = 17000
Bad pte = e1a00001, process = ???, vm_flags = 100077, vaddr = 1a000
Bad pte = e3a00001, process = ???, vm_flags = 100077, vaddr = 22000
Bad pte = e1a0c00d, process = ???, vm_flags = 100077, vaddr = 24000
Bad pte = e1a04003, process = ???, vm_flags = 100077, vaddr = 29000
Bad pte = e0821001, process = ???, vm_flags = 100077, vaddr = 2a000
Bad pte = 979ff101, process = ???, vm_flags = 100077, vaddr = 2c000
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: tlv320aic24 dm365_gpio dm365_pwm davinci_vpbe davinci_capture dm365_imp dm365mmap edmak irqk cmemk
CPU: 0
PC is at __free_pages+0x18/0x58
LR is at __init_begin+0x3fff8000/0x30
pc : [<c007626c>]    lr : [<00000000>]    Not tainted
sp : c434de98  ip : c434dea8  fp : c434dea4
r10: c02de000  r9 : c40b26e0  r8 : c02ca460
r7 : 00000000  r6 : 8434ffd1  r5 : c43c0000  r4 : c434c000
r3 : 00000000  r2 : c02ca444  r1 : 00000000  r0 : c03649e0
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  Segment user
Control: 5317F
Table: 8437C000  DAC: 00000015
Process sys_monitor (pid: 581, stack limit = 0xc434c258)
Stack: (0xc434de98 to 0xc434e000)
de80:                                                       c434decc c434dea8 
dea0: c003ad7c c0076264 c40b26e0 c40b26e0 c40b2714 c0495ac0 00000009 00008fa0 
dec0: c434dee4 c434ded0 c0045d54 c003ac64 c0495ac0 c40b26e0 c434defc c434dee8 
dee0: c0045e40 c0045d38 c0063250 c434c000 c434df1c c434df00 c004a28c c0045d80 
df00: c434c000 c0495ac0 c0495ac0 00000001 c434df3c c434df20 c004bbd8 c004a180 
df20: c434df84 c434df40 c00398ec c0049190 c434df84 c434df40 c00398f4 c004ba38 
df40: 00000001 00000000 be90fb28 00000000 c434dfb0 00000000 c434de58 ffffffff 
df60: 00000000 be90fb28 00000000 be90fba8 00000003 be90fc84 c434df9c c434df88 
df80: c00399fc c0039744 0000008e ffffffff c434dfac c434dfa0 c0039aac c00399f0 
dfa0: 00000000 c434dfb0 c0032d88 c0039aa4 00000000 be90fb28 00000000 00000000 
dfc0: be90fc90 00000000 be90fb28 00000000 be90fba8 00000003 be90fc84 00000004 
dfe0: 00000000 be90fb08 00008fa0 00008fa0 00000010 ffffffff 00000000 00000000
Backtrace: 
[<c0076254>] (__free_pages+0x0/0x58) from [<c003ad7c>] (free_pgd_slow+0x128/0x148)
[<c003ac54>] (free_pgd_slow+0x0/0x148) from [<c0045d54>] (__mmdrop+0x2c/0x48)
[<c0045d28>] (__mmdrop+0x0/0x48) from [<c0045e40>] (mmput+0xd0/0xdc)
 r4 = C40B26E0 
[<c0045d70>] (mmput+0x0/0xdc) from [<c004a28c>] (exit_mm+0x11c/0x120)
 r4 = C434C000 
[<c004a170>] (exit_mm+0x0/0x120) from [<c004bbd8>] (do_exit+0x1b0/0x9cc)
 r7 = 00000001  r6 = C0495AC0  r5 = C0495AC0  r4 = C434C000
[<c004ba28>] (do_exit+0x0/0x9cc) from [<c00398f4>] (do_page_fault+0x1c0/0x228)
[<c0039734>] (do_page_fault+0x0/0x228) from [<c00399fc>] (do_translation_fault+0x1c/0xb4)
[<c00399e0>] (do_translation_fault+0x0/0xb4) from [<c0039aac>] (do_PrefetchAbort+0x18/0x1c)
 r4 = FFFFFFFF 
[<c0039a94>] (do_PrefetchAbort+0x0/0x1c) from [<c0032d88>] (ret_from_exception+0x0/0x10)
Code: e24cb004 e5903004 e1a0e001 e3530000 (05833000) 
 <1>Fixing recursive fault but reboot is needed!
Thanks and Best Regards
Honeywell
Ivan Zhang(wenjie.zhang@honeywell.com)
Firmware Engineer - Honeywell Security R&D - Asia Pacific 
No.430 Li Bing Road, Zhang Jiang Hi-Tech.Park,
Pudong New Area,Shanghai, China(201203)
Tel：(8621)-28942292
