[mvapich-discuss] mvapich-1.0(gen2) panic our IA64 cluster, but mvapich-1.0(tcp) not

强 马 vera_wx_cn at yahoo.com.cn
Tue Sep 2 22:42:56 EDT 2008


Hello.
   
  I run my NAS programs with mvapich-1.0 (gen2) on an IA64 cluster. The kernel now panics every time, but the same programs run well with mvapich-1.0 (tcp).
   
  ibstat shows:
  CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.1.0
        Hardware version: a0
  Panic information:
   
  Kernel panic - not syncing: arch/ia64/hp/common/sba_iommu.c: I/O MMU @ c0000000fed01000 is out of mapping resources
  kernel BUG at kernel/panic.c:75!
ft.C.4[3367]: bugcheck! 0 [1]
Modules linked in: blcr(U) blcr_vmadump(U) blcr_imports(U) nfs(U) lockd(U) nfs_acl(U) osc(U) mgc(U) lustre(U) lov(U) lquota(U) mdc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_cm(U) netconsole(U) ib_addr(U) netdump(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) ib_ipoib(U) ds(U) yenta_socket(U) pcmcia_core(U) vfat(U) fat(U) dm_mirror(U) dm_multipath(U) dm_mod(U) button(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) tg3(U) ext3(U) jbd(U) mptscsih(U) mptfc(U) mptsas(U) mptspi(U) mptscsi(U) mptbase(U) usb_storage(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) sd_mod(U) scsi_mod(U)
  Pid: 3367, CPU 3, comm:               ft.C.4
psr : 0000101008122030 ifs : 8000000000000814 ip  : [<a000000100077410>]    Tainted: GF    
ip is at panic+0x5f0/0x6a0
unat: 0000000000000000 pfs : 0000000000000814 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : fa0166a6855a59a9
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100077410 b6  : a00000010025ebe0 b7  : a00000010025ebe0
f6  : 1003e00000000000000a0 f7  : 1003e0000000000000001
f8  : 1003e00000000000000a0 f9  : 10002a000000000000000
f10 : 0fffeb33333332fa80000 f11 : 1003e0000000000000000
r1  : a0000001009cc240 r2  : 000000000005bac7 r3  : a0000001007cc898
r8  : 0000000000000021 r9  : a0000001007df5b0 r10 : 0000000000000fff
r11 : 0000000000ffffff r12 : e00001006797fd40 r13 : e000010067978000
r14 : 0000000000004000 r15 : a000000100778bd8 r16 : 0000000000000001
r17 : a0000001007e0108 r18 : ffffffffffc66d68 r19 : a000000100611258
r20 : a000000100611248 r21 : a0000001007dbd68 r22 : e0000000066e0404
r23 : e0000000066e0380 r24 : 0000000000000002 r25 : 0000000000000002
r26 : e0000000066e03d4 r27 : 0000001008122030 r28 : e0000000066e03d4
r29 : a000000100669e28 r30 : 0000000000000000 r31 : a0000001007df588
  Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0
                                sp=e00001006797f8b0 bsp=e000010067979470
 [<a0000001000176b0>] show_regs+0x890/0x8c0
                                sp=e00001006797fa80 bsp=e000010067979428
 [<a00000010003e8f0>] die+0x150/0x240
                                sp=e00001006797faa0 bsp=e0000100679793e0
 [<a00000010003ea20>] die_if_kernel+0x40/0x60
                                sp=e00001006797faa0 bsp=e0000100679793b0
 [<a00000010003ebc0>] ia64_bad_break+0x180/0x600
                                sp=e00001006797faa0 bsp=e000010067979388
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e00001006797fb70 bsp=e000010067979388
 [<a000000100077410>] panic+0x5f0/0x6a0
                                sp=e00001006797fd40 bsp=e0000100679792e8
 [<a00000010045b5e0>] sba_alloc_range+0xa80/0x16e0
                                sp=e00001006797fda0 bsp=e000010067979278
 [<a00000010045d440>] sba_map_sg+0x380/0x760
                                sp=e00001006797fda0 bsp=e0000100679791e0
 [<a0000002002e74f0>] ib_umem_get+0x770/0xa80 [ib_uverbs]
                                sp=e00001006797fdb0 bsp=e000010067979120
 [<a0000002002de900>] ib_uverbs_reg_mr+0x2a0/0x9a0 [ib_uverbs]
                                sp=e00001006797fdb0 bsp=e0000100679790a8
 [<a0000002002da8b0>] ib_uverbs_write+0x210/0x280 [ib_uverbs]
                                sp=e00001006797fe10 bsp=e000010067979078
 [<a0000001001222d0>] vfs_write+0x290/0x360
                                sp=e00001006797fe20 bsp=e000010067979028
 [<a0000001001224f0>] sys_write+0x70/0xe0
                                sp=e00001006797fe20 bsp=e000010067978fa8
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00001006797fe30 bsp=e000010067978fa8
 [<a000000000010640>] 0xa000000000010640
                                sp=e000010067980000 bsp=e000010067978fa8
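
To make the path in the trace above easier to read, here is a minimal Python sketch that pulls the function and module names out of the frame lines. It is not part of the original report. The names `FRAME_RE`, `parse_frames` and `SAMPLE` are hypothetical, and the regular expression assumes the frame-line format seen in this particular oops.

```python
import re

# Frame lines in the oops above look like:
#  [<a00000010045b5e0>] sba_alloc_range+0xa80/0x16e0
#  [<a0000002002e74f0>] ib_umem_get+0x770/0xa80 [ib_uverbs]
# Assumption: address in [<...>], then symbol+offset/size, then an optional [module].
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\][ \t]+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def parse_frames(trace_text):
    """Return (function, module) pairs, innermost frame first.

    module is None for frames that carry no [module] suffix.
    """
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(trace_text)]


# Frames copied from the call trace in this report (sp/bsp lines omitted).
SAMPLE = """\
 [<a000000100077410>] panic+0x5f0/0x6a0
 [<a00000010045b5e0>] sba_alloc_range+0xa80/0x16e0
 [<a00000010045d440>] sba_map_sg+0x380/0x760
 [<a0000002002e74f0>] ib_umem_get+0x770/0xa80 [ib_uverbs]
 [<a0000002002de900>] ib_uverbs_reg_mr+0x2a0/0x9a0 [ib_uverbs]
 [<a0000002002da8b0>] ib_uverbs_write+0x210/0x280 [ib_uverbs]
 [<a0000001001222d0>] vfs_write+0x290/0x360
 [<a0000001001224f0>] sys_write+0x70/0xe0
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    # Print the outermost call first so the chain reads in call order.
    for func, module in reversed(frames):
        print(f"{func} [{module}]" if module else func)
```

Read in call order, the frames show the panic being reached from a write() on the uverbs device, through memory registration (ib_uverbs_reg_mr, ib_umem_get), into the sba_iommu mapping routines.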
  


       