<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <br>
    <div class="moz-forward-container"><br>
      <br>
      -------- Forwarded Message --------
      <table class="moz-email-headers-table" cellspacing="0" cellpadding="0" border="0">
        <tbody>
          <tr>
            <th valign="BASELINE" nowrap="nowrap" align="RIGHT">Subject:
            </th>
            <td>Re: [AstroIT #17096] presto:/data4</td>
          </tr>
          <tr>
            <th valign="BASELINE" nowrap="nowrap" align="RIGHT">Date: </th>
            <td>Tue, 1 Dec 2020 12:06:35 -0800</td>
          </tr>
          <tr>
            <th valign="BASELINE" nowrap="nowrap" align="RIGHT">From: </th>
            <td>Tim Pearson via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a></td>
          </tr>
          <tr>
            <th valign="BASELINE" nowrap="nowrap" align="RIGHT">Reply-To:
            </th>
            <td><a class="moz-txt-link-abbreviated" href="mailto:help@astro.caltech.edu">help@astro.caltech.edu</a></td>
          </tr>
          <tr>
            <th valign="BASELINE" nowrap="nowrap" align="RIGHT">To: </th>
            <td><a class="moz-txt-link-abbreviated" href="mailto:carlsmay@astro.caltech.edu">carlsmay@astro.caltech.edu</a>, <a class="moz-txt-link-abbreviated" href="mailto:pls@astro.caltech.edu">pls@astro.caltech.edu</a>,
              <a class="moz-txt-link-abbreviated" href="mailto:vam@astro.caltech.edu">vam@astro.caltech.edu</a></td>
          </tr>
        </tbody>
      </table>
      <br>
      <br>
      Hi Patrick,<br>
      <br>
      I was expecting it might be slow; this is a test to find out how
      slow. But I wasn't expecting the NFS mount to crash!<br>
      <br>
      Tim<br>
      <br>
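      A rough way to put a number on "how slow", separate from running the full pipeline, is to time a large sequential write into the mounted directory. The sketch below is only an illustration and is not what was run in this thread; the function name write_throughput_mb_per_s and its parameters total_mb and chunk_mb are made up for the example, and it measures write speed only.<br>
      <br>
<pre>
```python
import os
import tempfile
import time


def write_throughput_mb_per_s(directory, total_mb=64, chunk_mb=4):
    """Write about total_mb of zeros to a temporary file in directory; return MB/s.

    Hypothetical helper for illustration; not part of the COMAP pipeline.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = total_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory, suffix=".throughput")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(n_chunks):
                handle.write(chunk)
            handle.flush()
            # Ask the OS to push the data to storage before stopping the clock,
            # so the figure is not just the speed of the local page cache.
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the scratch file
    return (n_chunks * chunk_mb) / elapsed


# Runs against the local temp directory so the example works anywhere.
print(write_throughput_mb_per_s(tempfile.gettempdir(), total_mb=8))
```
</pre>
      Pointing the directory argument at the NFS-mounted path (for example /comapdata4 on allegro) and then at the same disk on presto would give two figures to compare, assuming the account can write there.<br>
      <br>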
      <blockquote type="cite">On Dec 1, 2020, at 11:21 AM, Patrick
        Shopbell via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a> wrote:<br>
        <br>
        <br>
        Hi Tim,<br>
        Have you ever tried that before?<br>
        <br>
        The NFS mount will not be fast, even over 10 Gbit. And it could
        be that the I/O speeds on presto would be a limit too... I am not
        sure if data4 and data8 use different RAID controllers, for example.<br>
        --<br>
        Patrick<br>
        <br>
        <br>
        On 12/1/20 11:17 AM, Tim Pearson via RT wrote:<br>
        <blockquote type="cite">I guess I was pounding on the link.
          Usually I run our COMAP pipeline on the machine that is
          directly connected to the disk, but today I have been running
          it on the other machine.<br>
          <br>
          This is to see if we can run two pipelines in parallel, one on
          allegro using /data4 and the other on presto using /data8. If
          this is giving problems, then we will need to rethink our
          strategy.<br>
          <br>
          Thanks<br>
          <br>
          Tim<br>
          <br>
          <blockquote type="cite">On Dec 1, 2020, at 11:04 AM, Patrick
            Shopbell via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a> wrote:<br>
            <br>
            <br>
            Well, there are a lot of network timeouts this morning on
            the 10 Gbit link between allegro and presto:<br>
            <br>
            Dec 1 09:45:18 allegro kernel: nfs: server presto-fast not responding, still trying<br>
            <br>
            Mostly these were between 9:45 and 10:25 or so. Is it
            possible that multiple users were pounding on the link
            heavily during that time?<br>
            <br>
            I don't see any errors in the interfaces, and they are
            synced at 10 Gbit speeds.<br>
            <br>
            It seems to be a very transient thing; there are no such
            messages in the log files for the entire month of November.<br>
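            <br>
            As a sketch of how such messages could be tallied from a log file, the snippet below counts "not responding" lines per hour and server. The regular expression is an assumption based only on the single kernel line quoted above, the function name count_nfs_timeouts is made up, and every sample line except the first is invented for illustration.<br>
            <br>
<pre>
```python
import re
from collections import Counter

# Pattern inferred from one sample line; real logs may differ:
#   Dec 1 09:45:18 allegro kernel: nfs: server presto-fast not responding, still trying
TIMEOUT_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):\d{2}\s+"
    r"\S+\s+kernel:\s+nfs: server (\S+) not responding"
)


def count_nfs_timeouts(lines):
    """Count timeout messages per (month, day, hour, server)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            month, day, hour, _minute, server = match.groups()
            counts[(month, int(day), int(hour), server)] += 1
    return counts


# Only the first line comes from the thread; the rest are invented samples.
sample = [
    "Dec 1 09:45:18 allegro kernel: nfs: server presto-fast not responding, still trying",
    "Dec 1 09:50:02 allegro kernel: nfs: server presto-fast not responding, still trying",
    "Dec 1 10:12:40 allegro kernel: nfs: server presto-fast not responding, still trying",
    "Dec 1 10:30:00 allegro kernel: nfs: server presto-fast OK",
]

for key, n in sorted(count_nfs_timeouts(sample).items()):
    print(key, n)
```
</pre>
            The same function accepts an open file object, so the lines of a system log could be passed in directly; which log file holds these kernel messages depends on the machine's syslog setup.<br>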
            --<br>
            Patrick<br>
            <br>
            <br>
            <br>
            On 12/1/20 10:51 AM, Tim Pearson via RT wrote:<br>
            <blockquote type="cite">Hi Patrick<br>
              <br>
              Thanks! Do you know why it went away in the middle of my
              job?<br>
              <br>
              Tim<br>
              <br>
              <blockquote type="cite">On Dec 1, 2020, at 10:30 AM,
                Patrick Shopbell via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a>
                wrote:<br>
                <br>
                Hi all,<br>
                I have reset the presto mount on allegro, so I think
                this should be working now.<br>
                --<br>
                Patrick<br>
                <br>
                <br>
                On Tue Dec 01 10:21:05 2020, <a class="moz-txt-link-abbreviated" href="mailto:rh@ovro.caltech.edu">rh@ovro.caltech.edu</a> wrote:<br>
                <blockquote type="cite">Hi Anu and Patrick,<br>
                  <br>
                  Related to this, the URL for data4 can no longer be
                  accessed by the comap account or by me (/home/rh).<br>
                  <br>
                  The URL is used by COMAP's data viewer. All other
                  similar URLs work just fine.<br>
                  <br>
                  (base) [comap@presto backupScripts]$ curl <a class="moz-txt-link-freetext" href="http://presto.caltech.edu:88/static_pd4/">http://presto.caltech.edu:88/static_pd4/</a><br>
                  &lt;html&gt;<br>
                  &lt;head&gt;&lt;title&gt;403 Forbidden&lt;/title&gt;&lt;/head&gt;<br>
                  &lt;body&gt;<br>
                  &lt;center&gt;&lt;h1&gt;403 Forbidden&lt;/h1&gt;&lt;/center&gt;<br>
                  &lt;hr&gt;&lt;center&gt;nginx/1.18.0&lt;/center&gt;<br>
                  &lt;/body&gt;<br>
                  &lt;/html&gt;<br>
                  <br>
                  <br>
                  -- rick<br>
                  <br>
                  On Tue, Dec 01, 2020 at 10:17:08AM -0800, Tim Pearson
                  wrote:<br>
                  <blockquote type="cite">Dear Anu and Patrick,<br>
                    <br>
                    I just ran into a problem on /data4:<br>
                    <br>
                    Traceback (most recent call last):<br>
                    File "run_level1.py", line 102, in &lt;module&gt;<br>
                    run_level1(platform=args.platform, disk=args.disk, month=args.month, outdisk=args.outdisk)<br>
                    File "run_level1.py", line 90, in run_level1<br>
                    create_level1(dada, level1_dir, arc_dir, 'reglist.txt', plotdir=plot_dir, database=True)<br>
                    File "/home/comap/tjp/level1/create_level1.py", line 103, in create_level1<br>
                    (status, level1) = dada_to_level1(dada_files, attrib, output=output, check=check, verbose=verbose, logfile=logfile)<br>
                    File "/home/comap/tjp/level1/dada_tools.py", line 585, in dada_to_level1<br>
                    hdf.close()<br>
                    File "/home/comap/tjp/level1p2/lib/python2.7/site-packages/h5py/_hl/files.py", line 443, in close<br>
                    h5i.dec_ref(id_)<br>
                    File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper<br>
                    File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper<br>
                    File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref<br>
                    RuntimeError: Can't decrement id ref count (unable to close file, errno = 5, error message = 'Input/output error')<br>
                    ./run_level1_data4.sh: line 15: 10496 Segmentation fault (core dumped) python run_level1.py --disk /comapdata4 --outdisk /comapdata4 --month 2019-01<br>
                    <br>
                    I think this is an NFS problem as I am running the
                    code on allegro but /data4 is on presto. I can no
                    longer see the /data4 files from allegro.<br>
                    <br>
                    [comap@allegro level1]$ ls /comapdata4/pathfinder/Backend/2019-01/*_0000000000000000*.dada<br>
                    ls: cannot access /comapdata4/pathfinder/Backend/2019-01/*_0000000000000000*.dada: No such file or directory<br>
                    <br>
                    The same command works on presto.<br>
                    <br>
                    Tim<br>
                  </blockquote>
                </blockquote>
              </blockquote>
            </blockquote>
            <br>
            --
            *--------------------------------------------------------------------*<br>
            | Patrick Shopbell Department of Astronomy |<br>
            | <a class="moz-txt-link-abbreviated" href="mailto:pls@astro.caltech.edu">pls@astro.caltech.edu</a> Mail Code 249-17 |<br>
            | (626) 395-4097 California Institute of Technology |<br>
            | (626) 568-9352 (FAX) Pasadena, CA 91125 |<br>
            | WWW: <a class="moz-txt-link-freetext" href="http://www.astro.caltech.edu/~pls/">http://www.astro.caltech.edu/~pls/</a> |<br>
*--------------------------------------------------------------------*<br>
            <br>
            <br>
          </blockquote>
          <br>
        </blockquote>
        <br>
        <br>
        <br>
        <br>
      </blockquote>
      <br>
      <br>
    </div>
  </body>
</html>