distributed processing too many files?

Discussion in 'Cadence' started by Stefano Zanella, Apr 25, 2006.

  1. Hi,

    Has anybody experienced problems with LBS when the number of files in
    the netlist directory is around 1000? I haven't exactly nailed down the
    number of files that causes problems, but it is certainly less than 1024
    (which suggests a 2^10-related limit). The symptoms are the following:
    not a single file is copied into the jobXXX directories, and spectre
    fails (no input file found). I am running in OCEAN, basically doing

    foreach( job jobList    ; jobList stands in for the actual job list
        ; do some stuff
        run()
    )

    wait()

    It works beautifully for run directories that contain fewer than circa
    1000 files and fails for more. The problem seems to be independent of
    the design (I tried different designs) and of the size of the run
    directory. I have all the log levels set to the maximum, but they don't
    say anything.
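
    For reference, I count the files like this (./netlist stands in for the
    actual netlist directory inside the run directory):

    $ ls -1 ./netlist | wc -l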

    Thanks in advance,
    Stefano
     
    Stefano Zanella, Apr 25, 2006
    #1
  2. raman Guest

    Hi Stefano,

    The issue is probably filesystem-related; here is the relevant tuning
    for Linux:

    The value in file-max denotes the maximum number of file handles
    that the Linux kernel will allocate. When you get a lot of error
    messages about running out of file handles, you might want to raise
    this limit. The default value is 4096. To change it, just write the
    new number into the file:

    # cat /proc/sys/fs/file-max
    4096
    # echo 8192 > /proc/sys/fs/file-max
    # cat /proc/sys/fs/file-max
    8192
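
    To make the change persist across reboots, the same value can also be
    set via sysctl (8192 is just the example number from above):

    # sysctl -w fs.file-max=8192
    # echo "fs.file-max = 8192" >> /etc/sysctl.conf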

    [...]

    The value in inode-max denotes the maximum number of inode
    handlers. This value should be 3 to 4 times larger than the value
    in file-max, since stdin, stdout, and network sockets also need an
    inode struct to handle them. If you regularly run out of inodes,
    you should increase this value.
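
    Note that inode-max only exists on older kernels; 2.4 and later size
    the inode tables dynamically and dropped the file. Where it exists, it
    is raised the same way, e.g. to four times the 8192 used above:

    # echo 32768 > /proc/sys/fs/inode-max
    # cat /proc/sys/fs/inode-max
    32768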

    Regards
    Raman
     
    raman, Apr 25, 2006
    #2
  3. Hi Raman,

    Thanks a lot. Unfortunately that does not seem to be the case:

    sh-2.05a$ cat /proc/sys/fs/file-max
    104802

    I checked it on the LBS server and on all client machines. I did not get
    any error messages at all from LBS, which is the worrisome part. I am
    wondering whether there is a hard-coded limit somewhere.

    Regards,
    Stefano
     
    Stefano Zanella, Apr 25, 2006
    #3
  4. raman Guest

    Hi Stefano,

    You might want to check whether a disk quota is set for the user
    account, and also check the remote /tmp directories (in case there are
    issues there). You could also try running the job as root to rule out
    any user-specific limits (even though that is not advised).
    The following .cdsenv setting might be helpful:

    asimenv.distributed copyMode boolean nil

    I do suspect that it could be a system issue, especially when the logs
    become useless.
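
    For the quota and /tmp checks, something along these lines should do,
    run on each client machine:

    $ quota -v
    $ df -k /tmp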

    Regards
    Raman
     
    raman, Apr 25, 2006
    #4
  5. Hi Raman,

    Thanks a lot (again!). It is not a size issue: I can use a test case
    that is 100 times as big (in terms of data size) but with fewer files,
    and everything is fine. I can't try the root option (my IT department
    will never allow it).

    asimenv.distributed copyMode is already nil. I guess the next step is
    Cadence support.

    Regards,
    Stefano
     
    Stefano Zanella, Apr 25, 2006
    #5
  6. Nope, just a few jobs.
    Stefano
     
    Stefano Zanella, Apr 26, 2006
    #6
  7. satya Guest

    Stefano

    There are a zillion things with limits in a UNIX environment. In your
    case a likely suspect is the number of open files:

    /usr/sbin/lsof -p `pgrep icfb.exe` | wc -l

    will tell you how many files icfb.exe has open.

    You can check this against the per-process limit: ulimit -n

    To find out all the limits, type ulimit -a. Mind you, these are
    per-process limits. There are also per-system limits, as Raman points
    out.
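
    Putting the two together, a quick sanity check (this assumes icfb.exe
    inherited the same limit as your shell):

    #!/bin/sh
    # compare icfb.exe's open-file count against the per-process limit
    pid=`pgrep icfb.exe`
    open=`/usr/sbin/lsof -p $pid | wc -l`
    echo "open files: $open (limit: `ulimit -n`)"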

    Hope this helps.

    Satya
     
    satya, Apr 26, 2006
    #7