Dear all,
I want to report a bug in the DaVinci sound device driver (sound/soc/davinci/davinci-pcm.c).
The bug is caused by a DMA copy overflow. It leads to kernel oopses with a wide variety of unusual output.
The problem still seems to be present in the latest stable kernel (version 2.6.35.7).
The symptoms are listed at the end of this mail.
Here is my analysis of the bug:
The device setup calls davinci_pcm_new. This function allocates a large buffer of contiguous pages (typically 128K) for each of playback and capture. Both buffers are used for DMA copies.
When sound data is recorded, the driver uses DMA to copy data from the capture register into the capture buffer that was allocated in davinci_pcm_new.
Each time a DMA copy finishes, the callback davinci_pcm_dma_irq runs and davinci_pcm_enqueue_dma is invoked. That function sets up the DMA copy parameters again, and this is where the problem lies.
It sets the DMA parameters as follows:
    src      = address of the 32-bit sound capture register
    dst      = prtd->period * period_size
    src_bidx = 0           // src does not change after each element is copied
    dst_bidx = data_type   // data_type = 2, because only the high 16 bits are sound data
    acnt     = 4
    bcnt     = 2048
    cnt      = 1
With these parameters, the DMA engine internally works like this:
    for (c = 0; c < cnt; c++) {
        for (b = 0; b < bcnt; b++) {
            memcpy(dst, src, 4);   // acnt = 4 bytes per element
            src += src_bidx;       // src_bidx = 0
            dst += dst_bidx;       // dst_bidx = data_type = 2 (16-bit sound data)
        }
    }
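The loop above can be replayed in user space to see the overrun directly. The following is only a sketch of the copy pattern, not driver or EDMA code: struct dma_model, dma_model_overrun, the fake register contents, and the guard area are all names and values made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PERIOD_BYTES 4096u  /* one period: bcnt * dst_bidx = 2048 * 2 */
#define GUARD_BYTES  16u    /* hypothetical canary area placed after the period */

/* Hypothetical stand-in for the DMA parameters listed in the report. */
struct dma_model {
    size_t acnt;      /* bytes copied per element */
    size_t bcnt;      /* elements per frame */
    size_t cnt;       /* number of frames */
    size_t dst_bidx;  /* destination step per element, in bytes */
};

/*
 * Replays the copy loop into a zeroed scratch buffer and returns how many
 * guard bytes (bytes past the end of the period) were overwritten.
 */
static size_t dma_model_overrun(const struct dma_model *p)
{
    /* Fake register contents; every byte is nonzero so writes are visible. */
    static const uint8_t reg[4] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t buf[PERIOD_BYTES + GUARD_BYTES];
    size_t off = 0, overrun = 0;

    assert(p->acnt <= sizeof(reg));
    memset(buf, 0, sizeof(buf));

    for (size_t c = 0; c < p->cnt; c++) {
        for (size_t b = 0; b < p->bcnt; b++) {
            assert(off + p->acnt <= sizeof(buf)); /* stay inside the model */
            memcpy(buf + off, reg, p->acnt);      /* src_bidx = 0: same source */
            off += p->dst_bidx;
        }
    }

    /* Count the guard bytes that no longer hold their initial zero. */
    for (size_t i = PERIOD_BYTES; i < sizeof(buf); i++)
        if (buf[i] != 0)
            overrun++;
    return overrun;
}
```

With the reported values (acnt = 4, bcnt = 2048, cnt = 1, dst_bidx = 2), dma_model_overrun returns 2. If acnt is set to 2 in the model, it returns 0.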
This copy fills the dst buffer with the high 16 bits of the source data, but it also overflows by 2 bytes: the last element starts at offset 4094 and is 4 bytes long.
So every time a DMA copy finishes, it has changed 4K bytes + 2 bytes. The extra 2 bytes are the DMA copy overflow.
Nothing goes wrong until the copy into the last period, because until then the extra 2 bytes land at the start of the next period. On the last period, however, the copy ends at 128K + 2 bytes while only 128K bytes were allocated.
The other 2 bytes are kernel memory that may be in use by anything else in the kernel. Because those 2 bytes are written by DMA, the kernel knows nothing about the write and raises no fault when it happens.
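The arithmetic can be checked per period. This is a rough sketch with hypothetical helper names (dma_end_offset, dma_bytes_past_buffer). It assumes a 128K buffer split into 32 periods of 4096 bytes. That count is inferred from the sizes above and is not stated in the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Offset (from the start of the buffer) one past the last byte written
 * while filling the given period. Hypothetical helper for illustration.
 */
static size_t dma_end_offset(size_t period, size_t period_size,
                             size_t acnt, size_t bcnt, size_t dst_bidx)
{
    assert(bcnt > 0);
    /* The last element starts at (bcnt - 1) * dst_bidx and is acnt bytes long. */
    return period * period_size + (bcnt - 1) * dst_bidx + acnt;
}

/* Number of bytes written beyond a buffer of buffer_size bytes (0 if none). */
static size_t dma_bytes_past_buffer(size_t period, size_t period_size,
                                    size_t acnt, size_t bcnt, size_t dst_bidx,
                                    size_t buffer_size)
{
    size_t end = dma_end_offset(period, period_size, acnt, bcnt, dst_bidx);

    return end > buffer_size ? end - buffer_size : 0;
}
```

Under these assumptions, periods 0 to 30 stay inside the 131072-byte allocation (their extra 2 bytes fall into the next period), while period 31 ends at offset 131074, which is 2 bytes past the buffer.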
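The easy way to fix the problem is to change the following code in function davinci_pcm_enqueue_dma:
    if (unlikely(prtd->period >= runtime->periods))
        prtd->period = 0;
to:
    if (unlikely(prtd->period >= (runtime->periods - 1)))
        prtd->period = 0;
With this change the DMA never targets the last period, so the extra 2 bytes always stay inside the allocated buffer.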
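To compare the two wrap conditions, here is a small user-space sketch. The helper max_period_used is hypothetical, and the model assumes the period index is used, then incremented, then checked for wrap. That ordering is an assumption about davinci_pcm_enqueue_dma, not something shown in this mail.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Steps the period index through a number of enqueue calls and returns the
 * highest period index that was ever used as a DMA target.
 * proposed_fix = 0 models "period >= periods",
 * proposed_fix = 1 models "period >= periods - 1".
 */
static size_t max_period_used(size_t periods, int proposed_fix, size_t calls)
{
    size_t period = 0, max_used = 0;
    size_t limit;

    assert(periods >= 2);
    limit = proposed_fix ? periods - 1 : periods;

    for (size_t i = 0; i < calls; i++) {
        if (period > max_used)
            max_used = period;  /* this period is the DMA target */
        period++;               /* assumed: increment before the wrap check */
        if (period >= limit)
            period = 0;
    }
    return max_used;
}
```

With 32 periods, the original condition reaches period 31, whose copy ends 2 bytes past a 128K buffer. The changed condition stops at period 30, whose extra 2 bytes land in period 31 and stay inside the allocation.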
The symptoms are listed below:
Symptom 1:
Bad pte = 04040202, process = sleep, vm_flags = 1875, vaddr = 1b000
VM: killing process sleep
Bad pte = 04040601, process = ???, vm_flags = 1875, vaddr = 17000
Bad pte = ffffffff, process = ???, vm_flags = 1875, vaddr = 43000
Bad pte = 00000001, process = ???, vm_flags = 1875, vaddr = 44000
…………..
Bad pte = 00000001, process = ???, vm_flags = 1875, vaddr = 88000
Symptom 2:
Unhandled fault: page domain fault (0x8fb) at 0x00011008
Internal error: : 8fb [#1]
Modules linked in: tlv320aic24 dm365_gpio dm365_pwm davinci_vpbe davinci_capture dm365_imp dm365mmap edmak irqk cmemk
CPU: 0
PC is at __copy_to_user+0x54/0x3a8
LR is at 0x5eff968
pc : [<c0117568>] lr : [<05eff968>] Not tainted
sp : c436befc ip : e4640f80 fp : c436bf4c
r10: 00000000 r9 : c436a000 r8 : dcfd0362
r7 : 0ee2fab7 r6 : f7a60e69 r5 : fe9cf7d3 r4 : 026603c7
r3 : 0b7de3b1 r2 : 00000760 r1 : c5056020 r0 : 00011008
Flags: nzCv IRQs on FIQs on Mode SVC_32 Segment user
Control: 5317F Table: 843C0000 DAC: 00000015
,,,,,,,,,,,,,,,,,,,,,,,,
page:c0363be0 flags:0x00000068 mapping:c4273d18 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Symptom 3:
159.99.249.249 login: VM: killing process video_test
Bad pte = 00000003, process = ???, vm_flags = 1875, vaddr = 9000
Bad pte = 00000005, process = ???, vm_flags = 1875, vaddr = b000
,,,,,,,,,,,,,,,,,,,,,
Bad pte = 00000001, process = ???, vm_flags = 100077, vaddr = 31000
Bad page state in process 'desched/0'
page:c035e3e0 flags:0x0000006c mapping:c06ecec8 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Symptom 4:
159.99.249.249 login: Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Stopping interneBad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
t superserver: iBad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
netdBad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bad pte = ffb7ffb6, process = inetd, vm_flags = 100177, vaddr = bea82000
Symptom 5:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
done.
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: tlv320aic24 dm365_gpio dm365_pwm davinci_vpbe davinci_capture dm365_imp dm365mmap edmak irqk cmemk
CPU: 0
PC is at __free_pages+0x18/0x58
LR is at __init_begin+0x3fff8000/0x30
pc : [<c007626c>] lr : [<00000000>] Not tainted
sp : c03cdf50 ip : c03cdf60 fp : c03cdf5c
r10: c02de000 r9 : 00000002 r8 : c02ca460
r7 : 00000000 r6 : 843cffd0 r5 : c43c0000 r4 : c03cc000
r3 : 00000000 r2 : c02ca444 r1 : 00000000 r0 : c03659e0
Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment kernel
Control: 5317F Table: 805BC000 DAC: 00000017
Process desched/0 (pid: 11, stack limit = 0xc03cc258)
Stack: (0xc03cdf50 to 0xc03ce000)
df40:                                     c03cdf84 c03cdf60 c003ad7c c0076264
df60: c002b6c0 c002b6c0 00000000 c02c2990 00000001 c02c2998 c03cdf9c c03cdf88
df80: c0045d54 c003ac64 c03b3f18 c03cc000 c03cdfcc c03cdfa0 c0047b2c c0045d38
dfa0: 00000000 00000000 c03cc000 c0047a7c c03b3f18 00000000 00000000 00000000
dfc0: c03cdff4 c03cdfd0 c005eca8 c0047a8c ffffffff ffffffff 00000000 00000000
dfe0: 00000000 00000000 00000000 c03cdff8 c004ba28 c005ebd0 00000000 00000000
Backtrace:
[<c0076254>] (__free_pages+0x0/0x58) from [<c003ad7c>] (free_pgd_slow+0x128/0x148)
[<c003ac54>] (free_pgd_slow+0x0/0x148) from [<c0045d54>] (__mmdrop+0x2c/0x48)
[<c0045d28>] (__mmdrop+0x0/0x48) from [<c0047b2c>] (desched_thread+0xb0/0x130)
 r4 = C03CC000
[<c0047a7c>] (desched_thread+0x0/0x130) from [<c005eca8>] (kthread+0xe8/0x128)
[<c005ebc0>] (kthread+0x0/0x128) from [<c004ba28>] (do_exit+0x0/0x9cc)
 r7 = 00000000 r6 = 00000000 r5 = 00000000 r4 = 00000000
Code: e24cb004 e5903004 e1a0e001 e3530000 (05833000)
prev->state: 2 != TASK_RUNNING??
desched/0/11[CPU#0]: BUG in __schedule at kernel/sched.c:3826
Symptom 6:
VM: killing process sys_monitor
Bad pte = e1a0c00d, process = ???, vm_flags = 100077, vaddr = 12000
Bad pte = e1a0c00d, process = ???, vm_flags = 100077, vaddr = 17000
Bad pte = e1a00001, process = ???, vm_flags = 100077, vaddr = 1a000
Bad pte = e3a00001, process = ???, vm_flags = 100077, vaddr = 22000
Bad pte = e1a0c00d, process = ???, vm_flags = 100077, vaddr = 24000
Bad pte = e1a04003, process = ???, vm_flags = 100077, vaddr = 29000
Bad pte = e0821001, process = ???, vm_flags = 100077, vaddr = 2a000
Bad pte = 979ff101, process = ???, vm_flags = 100077, vaddr = 2c000
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: tlv320aic24 dm365_gpio dm365_pwm davinci_vpbe davinci_capture dm365_imp dm365mmap edmak irqk cmemk
CPU: 0
PC is at __free_pages+0x18/0x58
LR is at __init_begin+0x3fff8000/0x30
pc : [<c007626c>] lr : [<00000000>] Not tainted
sp : c434de98 ip : c434dea8 fp : c434dea4
r10: c02de000 r9 : c40b26e0 r8 : c02ca460
r7 : 00000000 r6 : 8434ffd1 r5 : c43c0000 r4 : c434c000
r3 : 00000000 r2 : c02ca444 r1 : 00000000 r0 : c03649e0
Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user
Control: 5317F Table: 8437C000 DAC: 00000015
Process sys_monitor (pid: 581, stack limit = 0xc434c258)
Stack: (0xc434de98 to 0xc434e000)
de80:                                                       c434decc c434dea8
dea0: c003ad7c c0076264 c40b26e0 c40b26e0 c40b2714 c0495ac0 00000009 00008fa0
dec0: c434dee4 c434ded0 c0045d54 c003ac64 c0495ac0 c40b26e0 c434defc c434dee8
dee0: c0045e40 c0045d38 c0063250 c434c000 c434df1c c434df00 c004a28c c0045d80
df00: c434c000 c0495ac0 c0495ac0 00000001 c434df3c c434df20 c004bbd8 c004a180
df20: c434df84 c434df40 c00398ec c0049190 c434df84 c434df40 c00398f4 c004ba38
df40: 00000001 00000000 be90fb28 00000000 c434dfb0 00000000 c434de58 ffffffff
df60: 00000000 be90fb28 00000000 be90fba8 00000003 be90fc84 c434df9c c434df88
df80: c00399fc c0039744 0000008e ffffffff c434dfac c434dfa0 c0039aac c00399f0
dfa0: 00000000 c434dfb0 c0032d88 c0039aa4 00000000 be90fb28 00000000 00000000
dfc0: be90fc90 00000000 be90fb28 00000000 be90fba8 00000003 be90fc84 00000004
dfe0: 00000000 be90fb08 00008fa0 00008fa0 00000010 ffffffff 00000000 00000000
Backtrace:
[<c0076254>] (__free_pages+0x0/0x58) from [<c003ad7c>] (free_pgd_slow+0x128/0x148)
[<c003ac54>] (free_pgd_slow+0x0/0x148) from [<c0045d54>] (__mmdrop+0x2c/0x48)
[<c0045d28>] (__mmdrop+0x0/0x48) from [<c0045e40>] (mmput+0xd0/0xdc)
 r4 = C40B26E0
[<c0045d70>] (mmput+0x0/0xdc) from [<c004a28c>] (exit_mm+0x11c/0x120)
 r4 = C434C000
[<c004a170>] (exit_mm+0x0/0x120) from [<c004bbd8>] (do_exit+0x1b0/0x9cc)
 r7 = 00000001 r6 = C0495AC0 r5 = C0495AC0 r4 = C434C000
[<c004ba28>] (do_exit+0x0/0x9cc) from [<c00398f4>] (do_page_fault+0x1c0/0x228)
[<c0039734>] (do_page_fault+0x0/0x228) from [<c00399fc>] (do_translation_fault+0x1c/0xb4)
[<c00399e0>] (do_translation_fault+0x0/0xb4) from [<c0039aac>] (do_PrefetchAbort+0x18/0x1c)
 r4 = FFFFFFFF
[<c0039a94>] (do_PrefetchAbort+0x0/0x1c) from [<c0032d88>] (ret_from_exception+0x0/0x10)
Code: e24cb004 e5903004 e1a0e001 e3530000 (05833000)
<1>Fixing recursive fault but reboot is needed!
Thanks and Best Regards,
Honeywell
Ivan Zhang (wenjie.zhang@honeywell.com)
Firmware Engineer - Honeywell Security R&D - Asia Pacific
No.430 Li Bing Road, Zhang Jiang Hi-Tech Park, Pudong New Area, Shanghai, China (201203)
Tel: (8621)-28942292