I had upload problems earlier, likely attributable to this, but I also seem to be getting some download issues: 3 different hosts on my network have had computation errors, as noted below:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
The name limit for the local computer network adapter card was exceeded.
(0x44) - exit code 68 (0x44)</message>
<stderr_txt>
04:18:59 (8544): [normal]: This Einstein@home App was built at: Jul 26 2017 09:32:43
04:18:59 (8544): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe'.
04:18:59 (8544): [debug]: 2.1e+015 fp, 5.8e+009 fp/s, 363852 s, 101h04m12s02
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe --inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 --alpha 2.1039176188 --delta -0.9808959836 --skyRadius 0.001361356817 --ldiBins 15 --f0start 1080 --f0Band 16 --firstSkyPoint 304500 --numSkyPoints 58 --f1dot -1.0e-13 --f1dotBand 1.0e-13 --df1dot 1.344493449e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 4194304.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 56757.0 --f0orbit 0.005 --freeRadiusFactor 2 --mismatch 0.15 --debug 0 -o LATeah1075F_1096.0_304500_0.0_1_0.out
output files: 'LATeah1075F_1096.0_304500_0.0_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1096.0_304500_0.0_1_0' 'LATeah1075F_1096.0_304500_0.0_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1096.0_304500_0.0_1_1'
04:18:59 (8544): [debug]: Flags: i386 SSE GNUC X86 GNUX86
04:18:59 (8544): [debug]: Set up communication with graphics process.
Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.
04:18:59 (8544): [CRITICAL]: ERROR: MAIN() returned with error '4'
FPU status flags: PRECISION
04:19:10 (8544): [normal]: done. calling boinc_finish(68).
04:19:10 (8544): called boinc_finish
Welcome to E@H !
Could be you're getting extra load since GPUGRID is temporarily shut down.
... that is what I already said in message 181767 ...
.. fine, but we are not getting any new tasks ...
Downloads for new hosts seem to be problematic as well...
@Bernd Is it worthwhile giving the BRP4's their own upload server? Uploads are fairly small and it would take some load off the main upload server.
MarksRpiCluster
The uploads issue seems to have been fixed for the moment since my problem detection script is no longer reporting long lists of machines that have stuck uploads. A quick look at a couple of individual hosts shows them clear of these failures.
Nicely in time for the control script to whip around the whole fleet and prod them into syncing their data files and then refilling their work caches before something decides to stop the uploads once again :-).
Cheers,
Gary.
I had upload problems earlier, likely attributable to this, but I also seem to be getting some download issues: 3 different hosts on my network have had computation errors, as noted below:
Validation state: Invalid
Granted credit: 0
Application: Gamma-ray pulsar search #5 v1.08 (FGRPSSE)
windows_intelx86
Stderr output
04:18:59 (8544): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe'.
04:18:59 (8544): [debug]: 2.1e+015 fp, 5.8e+009 fp/s, 363852 s, 101h04m12s02
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe --inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 --alpha 2.1039176188 --delta -0.9808959836 --skyRadius 0.001361356817 --ldiBins 15 --f0start 1080 --f0Band 16 --firstSkyPoint 304500 --numSkyPoints 58 --f1dot -1.0e-13 --f1dotBand 1.0e-13 --df1dot 1.344493449e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 4194304.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 56757.0 --f0orbit 0.005 --freeRadiusFactor 2 --mismatch 0.15 --debug 0 -o LATeah1075F_1096.0_304500_0.0_1_0.out
output files: 'LATeah1075F_1096.0_304500_0.0_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1096.0_304500_0.0_1_0' 'LATeah1075F_1096.0_304500_0.0_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1096.0_304500_0.0_1_1'
04:18:59 (8544): [debug]: Flags: i386 SSE GNUC X86 GNUX86
04:18:59 (8544): [debug]: Set up communication with graphics process.
Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.
04:18:59 (8544): [CRITICAL]: ERROR: MAIN() returned with error '4'
FPU status flags: PRECISION
04:19:10 (8544): [normal]: done. calling boinc_finish(68).
04:19:10 (8544): called boinc_finish
</stderr_txt>
]]>
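For anyone wanting to check their own failed tasks for the same pattern, here is a minimal Python sketch that scans a task's stderr text for the "seems to be damaged" line and the boinc_finish exit code. The two patterns are copied from the log quoted above; the function name `summarize_stderr` and the returned field names are made up for illustration and are not part of BOINC or the Einstein@Home app.

```python
import re

# Patterns taken from the stderr output quoted above. The function and
# field names are hypothetical, chosen only for this illustration.
DAMAGED_RE = re.compile(r"^Line (\d+) in inputfile (\S+) seems to be damaged\.")
FINISH_RE = re.compile(r"calling boinc_finish\((\d+)\)")


def summarize_stderr(text):
    """Return damaged input files and the boinc_finish exit code, if present."""
    damaged = []      # list of (file path, line number) pairs
    exit_code = None  # stays None if no boinc_finish(...) line is found
    for line in text.splitlines():
        line = line.strip()
        m = DAMAGED_RE.match(line)
        if m:
            damaged.append((m.group(2), int(m.group(1))))
            continue
        m = FINISH_RE.search(line)
        if m:
            exit_code = int(m.group(1))
    return {"damaged_inputs": damaged, "exit_code": exit_code}


# A few lines copied from the log above, used as sample input.
sample = """\
04:18:59 (8544): [debug]: Set up communication with graphics process.
Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.
04:18:59 (8544): [CRITICAL]: ERROR: MAIN() returned with error '4'
04:19:10 (8544): [normal]: done. calling boinc_finish(68).
"""

print(summarize_stderr(sample))
```

If the same input file shows up as damaged on several hosts, that at least narrows things down to the file itself rather than the individual tasks.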
Mikie Tim T wrote: Had
See:
https://einsteinathome.org/content/cpu-tasks-error-out-after-12-seconds
GPUGrid has reached the end of its current research project, so some of us are back - and hammering the upload server again...