[mvapich-discuss] MVAPICH job fails with 'Unexpected End-Of-File on file descriptor' error

Hari Subramoni subramoni.1 at osu.edu
Tue Aug 12 08:09:06 EDT 2014


Hi Chaitra,

Please set "ulimit -c unlimited" to prevent the core files from getting
truncated. You need to add this to your ~/.bashrc or ~/.bash_profile so that
it takes effect when the program is launched.
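
A minimal sketch of one way to do this, assuming bash is the shell on the
compute nodes. The helper name ensure_core_ulimit is hypothetical; it only
appends the line to a startup file if it is not already present, so it is
safe to run more than once:

```shell
# Hypothetical helper: add "ulimit -c unlimited" to a shell startup file
# exactly once. The file defaults to ~/.bashrc (an assumption of this sketch).
ensure_core_ulimit() {
    rc_file="${1:-$HOME/.bashrc}"
    line='ulimit -c unlimited'
    # Create the file if it does not exist yet.
    touch "$rc_file"
    # Append only when no identical whole line is already present.
    grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Usage (run on each node, or once if the home directory is shared):
#   ensure_core_ulimit              # edits ~/.bashrc
#   ensure_core_ulimit ~/.bash_profile
```

After this, "ulimit -c" in a fresh shell on a compute node should report
"unlimited", provided the hard limit on that node allows it. On some
distributions the default ~/.bashrc returns early for non-interactive
shells, in which case the line may need to sit near the top of the file.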

Regards,
Hari.


On Mon, Aug 11, 2014 at 10:14 PM, Chaitra Kumar <chaitragkumar at gmail.com>
wrote:

> Hi Team,
>
>
>
> I am trying to run Graph500 on MVAPICH2 over InfiniBand. It works for a
> smaller number of cores, but it crashes when I increase the number of
> cores.
>
> I have configured MVAPICH2 in debug mode.
>
> ./configure  --enable-cxx --enable-threads=multiple --with-device=ch3:mrail --with-rdma=gen2 --disable-fast --enable-g=all --enable-error-messages=all
>
>
>
> The command I am using is:
>
> mpirun_rsh -np 72 -hostfile hostfile MV2_DEBUG_CORESIZE=unlimited
> MV2_DEBUG_SHOW_BACKTRACE=1  MV2_ENABLE_AFFINITY=0
> ./graph500_mpi_custom_72 28
>
>
>
> But the generated core dump is truncated, so gdb cannot read it.
>
>
>
> gdb  graph500_mpi_custom_72 core.132419
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/padmanac/graph/mpi/graph500_mpi_custom_72...done.
> BFD: Warning: /home/padmanac/graph/mpi/core.132419 is truncated: expected core file size >= 2893844480, found: 1149763584.
>
> warning: core file may not match specified executable file.
> [New Thread 132419]
> Cannot access memory at address 0x7f12c350d760
> (gdb) bt
> #0  0x00007f12c27ce6ea in ?? ()
> Cannot access memory at address 0x7fff4ff06390
>
>
>
>
>
> The logs have the following information:
>
> [polaris-1:mpispawn_0][child_handler] MPI process (rank: 1, pid: 136169) terminated with signal 11 -> abort job
> [polaris-1:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 15. MPI process died?
> [polaris-1:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [polaris-1:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node polaris-1 aborted: MPI process error (1)
>
>
>
>
>
> ibstat output:
>
> ibstat
> CA 'mlx4_0'
>         CA type: MT26428
>         Number of ports: 2
>         Firmware version: 2.9.1200
>         Hardware version: b0
>         Node GUID: 0x0002c9030028078c
>         System image GUID: 0x0002c9030028078f
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 40
>                 Base lid: 2
>                 LMC: 0
>                 SM lid: 1
>                 Capability mask: 0x0251086a
>                 Port GUID: 0x0002c9030028078d
>                 Link layer: InfiniBand
>         Port 2:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 40
>                 Base lid: 3
>                 LMC: 0
>                 SM lid: 1
>                 Capability mask: 0x02510868
>                 Port GUID: 0x0002c9030028078e
>                 Link layer: InfiniBand
>
>
>
>
>
>
> Please help me in resolving this problem.  Thanks in advance.
>
> Regards,
> Chaitra
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>