Suggest you set resources to 0.0 for Einstein. If too many tasks error out, you will get banned for 24 hours.
Been there, done that.
Thank you for the suggestion. I have unticked this particular application within Einstein. I've since returned valid gamma-ray GPU WU results. If the developers manage to track down and resolve these issues in the coming weeks, I'll re-enable it later on.
Gravitational Wave work on GPUs needs a full CPU core to support it for efficient running, so make sure your settings don't overload your CPU with work. Try reducing "Use at most: XX% of the processors" to free up cores/threads for GPU support.
I am not crunching anything on the CPU - the machine is about 97% idle without BOINC. From a quick glance, O2MDF uses between 10% and 30% of a single core - it is hardly CPU bound, so the GPU kernel must be in error.
I've seen a lot of recent messages to this effect when looking around in the forum and just wanted to confirm that it is hitting many platforms and devices.
Holmis wrote:
It's quite normal to have pending tasks.
The Gravitational Wave search uses something called locality scheduling to reduce the amount of data one needs to download, by trying to send tasks that can make use of already-downloaded data files, or that need a minimum of extra downloads. This sometimes leads to a situation where only a few participants get tasks for a given data file set, which increases the time before the second task gets sent out and processed by a wingman.
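Holmis's description of locality scheduling can be sketched like this. This is a toy model, not the actual Einstein@Home scheduler code; the class and function names are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    files: set = field(default_factory=set)  # data files already downloaded

@dataclass
class Task:
    data_file: str  # the large data file this task needs

def pick_host(task, hosts):
    # Locality scheduling (toy version): prefer a host that already
    # holds the task's data file, so nothing extra needs downloading.
    for h in hosts:
        if task.data_file in h.files:
            return h
    # Otherwise fall back to any host, which must then download the file.
    return hosts[0]
```

The side effect Holmis describes follows directly: tasks for a given data file cluster on the few hosts that hold it, so a wingman's second copy can take a long time to be issued and returned.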
At the moment of writing this I have 100 tasks pending validation.
Just be patient and the second task will get sent and processed and the WU validated.
Those are some great insights! I can wait some more and proceed to crunch other Einstein applications until the pending ones get validated eventually.
e2bc wrote:
I am not crunching anything on the CPU - the machine is about 97% idle without BOINC. From a quick glance, O2MDF uses between 10-30% of a single core - it is hardly CPU bound, so the GPU kernel must be in error.
I've seen a lot of recent messages to this effect when looking around in the forum and just wanted to confirm that it is hitting many platforms and devices.
As I understand it, it's not a question of the CPU being loaded with work all the time while supporting a GPU, but rather that it has to be available to support the GPU when needed to get the best performance. If the CPU is tied up with other work, then the (probably short) delay will have a substantial impact on GPU performance.
As you're not crunching on the CPU, this will not apply in your situation.
As to what is causing you to be unable to run Gravitational Wave tasks on the GPU, I've run out of ideas; you will have to provide more info about your setup and conditions if you want advice on how to run these tasks.
e2bc wrote:
I am not crunching anything on the CPU - the machine is about 97% idle without BOINC. From a quick glance, O2MDF uses between 10-30% of a single core - it is hardly CPU bound, so the GPU kernel must be in error.
You are obviously looking at the wrong data. If you're on Windows, you must change the displayed GPU/CPU utilization graphs from the default.
The current O2MDFV2 GPU tasks require between 100% and 110% of a CPU core to support the GPU task. So each GPU task requires at least 2 CPU threads to properly service it and let it operate at full speed.
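The "2 CPU threads per GPU task" rule can be enforced with an `app_config.xml` in the project directory. This is a sketch only: the `<name>` value below is an assumption, so check `client_state.xml` on your own host for the exact app name before using it.

```xml
<!-- Sketch: reserve 2 CPU threads per GW GPU task.
     The app name below is an assumption; check client_state.xml
     for the exact name used on your host. -->
<app_config>
  <app>
    <name>einstein_O2MD1</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>2.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving the file, use the BOINC Manager's "read config files" option (or restart the client) for it to take effect.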
Keith Myers wrote:
You are obviously looking at the wrong data. If in Windows, you must change the displayed gpu/cpu utilization graphs from default.
The current O2MDFV2 gpu tasks require between 100-110% of a cpu core to support the gpu task. So each gpu task requires at least 2 cpu threads to properly service that task and let it operate at full speed.
E2BC is new here, so there's a lot of stuff to experience and learn that we tend to take for granted :-).
First of all, the user's computers are hidden, so we're all stumbling around in the dark to some extent. All we know is that the machine is "Apple", the error is "EXIT_TIME_LIMIT_EXCEEDED" and the GPU is a "Radeon Pro 560X" - whatever that is. We have no ability to look at hardware details and lists of tasks returned to the project without the user giving us a link to the host concerned. With what we know, there is no point talking about Windows - since it's probably macOS - or the CPU support requirements for Nvidia GPUs - since it's not Nvidia :-). Yes, CPU support is important since a lot of the calculations are not yet done on the GPU, but it's quite likely that well less than a full core is required for CPU support with this type of GPU. The more important factor is the immediacy of the support, so if any other normal work could use a lot of CPU, that might be contributing to a slow crunch time.
I checked here to find out the basic specs for the GPU. It turns out to be pretty much a cut-down version of an RX 560 by the look of things - in other words, it won't be very fast, probably around a third to a half the performance of an RX 570, which many of us are familiar with as a proficient mid-range device.
My guess is that the server thinks this type of GPU is a lot better than it really is, so it is setting an unrealistically low limit for when to pull the plug on the calculations. This is coupled with the fact that we are now doing tasks based on searching for emissions from the VelaJr pulsar. On past performance, crunch times for tasks of this type take at least double the time we see for other pulsars, and Bernd has commented on this previously - along the lines that the slow performance was unexpected - so it was probably not allowed for when setting time limits.
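The time-limit mechanism described above can be illustrated with made-up numbers. The BOINC client aborts a task once it exceeds roughly the server's FLOP bound divided by the device's estimated speed, so an overestimated GPU gets a too-short allowance. Every number below is hypothetical:

```python
# Hypothetical numbers illustrating EXIT_TIME_LIMIT_EXCEEDED.
# The client allows roughly rsc_fpops_bound / estimated_flops seconds;
# if the server overestimates the GPU, the allowance is too short.
rsc_fpops_bound = 1e15        # server's upper bound on the task's FLOPs
assumed_gpu_flops = 2e12      # what the server thinks the GPU can do
actual_gpu_flops = 5e11       # what a cut-down Radeon Pro 560X might manage

allowed_s = rsc_fpops_bound / assumed_gpu_flops   # time limit granted
needed_s = rsc_fpops_bound / actual_gpu_flops     # time actually required
# With a 4x speed overestimate the task needs 4x the allowed time,
# so it is killed long before it can finish.
```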
We have all observed that actual crunch times are a lot longer than what the estimates would suggest and the client has to significantly increase the duration correction factor (DCF) to eventually end up with more accurate estimates. All these things add up to making it more likely to have this error when crunching VelaJr stuff.
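The DCF behaviour can be sketched as follows. This is a simplification of the client's actual logic, which raises DCF quickly when tasks overrun their estimate and lowers it only slowly when they finish early:

```python
def update_dcf(dcf: float, estimated_s: float, actual_s: float) -> float:
    """Simplified sketch of BOINC's duration correction factor update.

    The client multiplies raw runtime estimates by DCF, jumping DCF up
    immediately on an overrun but decaying it slowly on early finishes.
    """
    ratio = actual_s / estimated_s
    if ratio > dcf:
        return ratio                      # overrun: jump up at once
    return dcf + 0.1 * (ratio - dcf)      # early finish: decay slowly

# A task estimated at 2 hours that actually takes 6 triples DCF in one
# step, so subsequent estimates become roughly realistic straight away.
dcf = update_dcf(1.0, 7200, 21600)
```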
Since the GW search has the highest scientific interest, it's unfortunate that the app is not more efficient than it currently is. It will hopefully be improved over time. Perhaps it's just not possible to get more of the work done by the GPU so perhaps it will never be able to reach the efficiency levels of the FGRPB1G app. Hopefully, something will eventually be done to get better time estimates for the work content of GW tasks so that tasks have a more forgiving time limit for crunching.
Cheers,
Gary.
Okay guys, good news: today the machine successfully processed a new GW GPU task!
So maybe the erroneous WUs were from an earlier-generated batch in my buffer. Crunching on!