<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<br>
<div class="moz-forward-container"><br>
<br>
-------- Forwarded Message --------
<table class="moz-email-headers-table" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">Subject:
</th>
<td>Re: [AstroIT #17096] presto:/data4</td>
</tr>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">Date: </th>
<td>Tue, 1 Dec 2020 12:06:35 -0800</td>
</tr>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">From: </th>
<td>Tim Pearson via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a></td>
</tr>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">Reply-To:
</th>
<td><a class="moz-txt-link-abbreviated" href="mailto:help@astro.caltech.edu">help@astro.caltech.edu</a></td>
</tr>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">To: </th>
<td><a class="moz-txt-link-abbreviated" href="mailto:carlsmay@astro.caltech.edu">carlsmay@astro.caltech.edu</a>, <a class="moz-txt-link-abbreviated" href="mailto:pls@astro.caltech.edu">pls@astro.caltech.edu</a>,
<a class="moz-txt-link-abbreviated" href="mailto:vam@astro.caltech.edu">vam@astro.caltech.edu</a></td>
</tr>
</tbody>
</table>
<br>
<br>
Hi Patrick,<br>
<br>
I was expecting it might be slow; this is a test to find out how
slow. But I wasn't expecting the NFS mount to crash!<br>
<br>
Tim<br>
<br>
<blockquote type="cite">On Dec 1, 2020, at 11:21 AM, Patrick
Shopbell via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a> wrote:<br>
<br>
<br>
Hi Tim,<br>
Have you ever tried that before?<br>
<br>
The NFS mount will not be fast, even over 10 Gbit. And it could be<br>
that the I/O speeds on presto would be a limit too... I am not sure<br>
if data4 and data8 use different RAID controllers, for example.<br>
--<br>
Patrick<br>
<br>
<br>
On 12/1/20 11:17 AM, Tim Pearson via RT wrote:<br>
<blockquote type="cite">I guess I was pounding on the link.
Usually I run our COMAP pipeline on the machine that is
directly connected to the disk, but today I have been running
it on the other machine.<br>
<br>
This is to see if we can run two pipelines in parallel, one on
allegro using /data4 and the other on presto using /data8. If
this is giving problems, then we will need to rethink our
strategy.<br>
<br>
Thanks<br>
<br>
Tim<br>
<br>
<blockquote type="cite">On Dec 1, 2020, at 11:04 AM, Patrick
Shopbell via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a> wrote:<br>
<br>
<br>
Well, there are a lot of network timeouts this morning on the<br>
10 Gbit link between allegro and presto:<br>
<br>
Dec 1 09:45:18 allegro kernel: nfs: server presto-fast not responding, still trying<br>
<br>
Mostly these were between 9:45 and 10:25 or so. Is it possible<br>
that multiple users were pounding on the link heavily during that<br>
time?<br>
<br>
I don't see any errors in the interfaces, and they are synced at<br>
10 Gbit speeds.<br>
<br>
It seems to be a very transient thing; there are no such messages<br>
in the log files for the entire month of November.<br>
--<br>
Patrick<br>
<br>
<br>
<br>
On 12/1/20 10:51 AM, Tim Pearson via RT wrote:<br>
<blockquote type="cite">Hi Patrick<br>
<br>
Thanks! Do you know why it went away in the middle of my
job?<br>
<br>
Tim<br>
<br>
<blockquote type="cite">On Dec 1, 2020, at 10:30 AM,
Patrick Shopbell via RT <a class="moz-txt-link-rfc2396E" href="mailto:help@astro.caltech.edu">&lt;help@astro.caltech.edu&gt;</a>
wrote:<br>
<br>
Hi all,<br>
I have reset the presto mount on allegro, so I think this<br>
should be working now.<br>
--<br>
Patrick<br>
<br>
<br>
On Tue Dec 01 10:21:05 2020, <a class="moz-txt-link-abbreviated" href="mailto:rh@ovro.caltech.edu">rh@ovro.caltech.edu</a> wrote:<br>
<blockquote type="cite">Hi Anu and Patrick,<br>
<br>
Related to this is the URL for data4, which the comap account
and I (/home/rh) can no longer access.<br>
<br>
The URL is used by COMAP's data viewer. All other similar URLs
work just fine:<br>
<br>
(base) [comap@presto backupScripts]$ curl<br>
<a class="moz-txt-link-freetext" href="http://presto.caltech.edu:88/static_pd4/">http://presto.caltech.edu:88/static_pd4/</a><br>
&lt;html&gt;<br>
&lt;head&gt;&lt;title&gt;403 Forbidden&lt;/title&gt;&lt;/head&gt;<br>
&lt;body&gt;<br>
&lt;center&gt;&lt;h1&gt;403 Forbidden&lt;/h1&gt;&lt;/center&gt;<br>
&lt;hr&gt;&lt;center&gt;nginx/1.18.0&lt;/center&gt;<br>
&lt;/body&gt;<br>
&lt;/html&gt;<br>
<br>
<br>
-- rick<br>
<br>
On Tue, Dec 01, 2020 at 10:17:08AM -0800, Tim Pearson
wrote:<br>
<blockquote type="cite">Dear Anu and Patrick,<br>
<br>
I just ran into a problem on /data4:<br>
<br>
Traceback (most recent call last):<br>
File "run_level1.py", line 102, in &lt;module&gt;<br>
run_level1(platform=args.platform, disk=args.disk, month=args.month, outdisk=args.outdisk)<br>
File "run_level1.py", line 90, in run_level1<br>
create_level1(dada, level1_dir, arc_dir, 'reglist.txt', plotdir=plot_dir, database=True)<br>
File "/home/comap/tjp/level1/create_level1.py", line 103, in create_level1<br>
(status, level1) = dada_to_level1(dada_files, attrib, output=output, check=check, verbose=verbose, logfile=logfile)<br>
File "/home/comap/tjp/level1/dada_tools.py", line 585, in dada_to_level1<br>
hdf.close()<br>
File "/home/comap/tjp/level1p2/lib/python2.7/site-packages/h5py/_hl/files.py", line 443, in close<br>
h5i.dec_ref(id_)<br>
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper<br>
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper<br>
File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref<br>
RuntimeError: Can't decrement id ref count (unable to close file, errno = 5, error message = 'Input/output error')<br>
./run_level1_data4.sh: line 15: 10496 Segmentation fault (core dumped) python run_level1.py --disk /comapdata4 --outdisk /comapdata4 --month 2019-01<br>
<br>
I think this is an NFS problem as I am running the code on allegro
but /data4 is on presto. I can no longer see the /data4 files from
allegro.<br>
<br>
[comap@allegro level1]$ ls /comapdata4/pathfinder/Backend/2019-01/*_0000000000000000*.dada<br>
ls: cannot access /comapdata4/pathfinder/Backend/2019-01/*_0000000000000000*.dada: No such file or directory<br>
<br>
The same command works on presto.<br>
<br>
Tim<br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<br>
--
*--------------------------------------------------------------------*<br>
| Patrick Shopbell Department of Astronomy |<br>
| <a class="moz-txt-link-abbreviated" href="mailto:pls@astro.caltech.edu">pls@astro.caltech.edu</a> Mail Code 249-17 |<br>
| (626) 395-4097 California Institute of Technology |<br>
| (626) 568-9352 (FAX) Pasadena, CA 91125 |<br>
| WWW: <a class="moz-txt-link-freetext" href="http://www.astro.caltech.edu/~pls/">http://www.astro.caltech.edu/~pls/</a> |<br>
*--------------------------------------------------------------------*<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
<br>
<br>
</blockquote>
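The log check described in the thread (looking for "nfs: server ... not responding" kernel messages and seeing when they cluster) can be sketched in Python. This is only an illustration, not what was actually run on allegro: the function name count_nfs_timeouts and the regular expression are assumptions based on the single log line quoted in the thread, and the second sample line is made up for contrast.<br>
<pre>
```python
import re
from collections import Counter

# Assumed syslog layout, modelled on the one line quoted in the thread.
# Positional groups: month, day, hour, minute, client host, NFS server.
NFS_TIMEOUT = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):\d{2}\s+(\S+)\s+kernel:\s+"
    r"nfs: server (\S+) not responding"
)


def count_nfs_timeouts(lines):
    """Count 'not responding' messages per (month, day, hour, server)."""
    counts = Counter()
    for line in lines:
        match = NFS_TIMEOUT.match(line)
        if match:
            month, day, hour, _minute, _host, server = match.groups()
            counts[(month, int(day), int(hour), server)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        # Quoted in the thread:
        "Dec 1 09:45:18 allegro kernel: nfs: server presto-fast not responding, still trying",
        # Hypothetical non-matching line, for contrast only:
        "Dec 1 09:46:02 allegro kernel: some unrelated message",
    ]
    for key, n in sorted(count_nfs_timeouts(sample).items()):
        print(key, n)
```
</pre>
Feeding it the lines of a syslog file (the file's location is site-specific and is not given in the thread) would show whether the timeouts are confined to a short window, as they appeared to be here.<br>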
<br>
<br>
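Since the failure first showed up as input files vanishing from allegro's view of /comapdata4, a pipeline wrapper could check that the inputs are visible before starting. The following is a minimal sketch under assumptions: the helper names visible_inputs and preflight are hypothetical and are not part of the COMAP pipeline; the glob pattern is the one from the ls command in the thread. It would not catch a mount that drops mid-run, which is what happened here.<br>
<pre>
```python
import glob
import os


def visible_inputs(pattern):
    """Return the sorted paths matching pattern that are readable right now."""
    return sorted(p for p in glob.glob(pattern) if os.access(p, os.R_OK))


def preflight(pattern):
    """Raise RuntimeError if no readable input matches; else return the paths."""
    files = visible_inputs(pattern)
    if not files:
        raise RuntimeError(
            "no readable files match %r; is the NFS mount up?" % pattern
        )
    return files


if __name__ == "__main__":
    # Pattern taken from the ls command quoted in the thread.
    pattern = "/comapdata4/pathfinder/Backend/2019-01/*_0000000000000000*.dada"
    try:
        print("%d input file(s) visible" % len(preflight(pattern)))
    except RuntimeError as err:
        print("preflight failed:", err)
```
</pre>
Failing early like this only turns a missing mount into a clear message at start-up; an I/O error in the middle of a write, like the one in the traceback, would still need handling inside the pipeline itself.<br>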
</div>
</body>
</html>