Channel: Ask the FireCloud Team — GATK-Forum

Google JES backend: task reported as running, but its logs have stopped. Is it running or not?


I ran a submission on FireCloud (submission ID 61517c58-7cb4-4e53-a782-9b31b3359136).

In that submission I have a task that is reported as running but does not appear to be doing anything.

More specifically, I am referring to task polysolverWorkflow.polysolverMut with operations ID operations/EP2ty8mbKxi7sfqg_s61xpABIPCeybKCFioPcHJvZHVjdGlvblF1ZXVl
(see attached screenshot).

The log files in the bucket suggest that the task is NOT running: all of the logs
stop on Jan 20 at around 03:52 or 03:53 (I think the time is UTC). Today is Jan 23,
so the logs apparently have not grown for about three days.

See the terminal output below, which shows the tails of the logs.

wm8b1-75c:~ esalinas$ for FILE in `gsutil ls gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut*` ; do echo -ne "\n\nNow looking at file $FILE \n\n" ; gsutil cat $FILE | tail ; echo -ne "\n\n" ;  done ;


Now looking at file gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut-stderr.log 

[Fri Jan 20 03:52:07 UTC 2017] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 2.02 minutes.
Runtime.totalMemory()=58851328
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp
[Fri Jan 20 03:52:07 UTC 2017] picard.sam.AddOrReplaceReadGroups INPUT=/cromwell_root/hla_mut_out/nv.complete.chr6region.tumor.R0k6.csorted.nodup.bam OUTPUT=/cromwell_root/hla_mut_out/nv.complete.chr6region.tumor.R0k6.csorted.nodup.RG.bam RGID=foo RGLB=foo RGPL=foo RGPU=foo RGSM=foo    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Fri Jan 20 03:52:07 UTC 2017] Executing as root@5fb29d007e4d on Linux 3.16.0-0.bpo.4-amd64 amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14; Picard version: 1.731(c12a4a98bc3c4f8333565557910d96b939985895_1402405046) JdkDeflater
INFO    2017-01-20 03:52:08 AddOrReplaceReadGroups  Created read group ID=foo PL=foo LB=foo SM=foo

INFO    2017-01-20 03:52:27 AddOrReplaceReadGroups  Processed     1,000,000 records.  Elapsed time: 00:00:18s.  Time for last 1,000,000:   18s.  Last read position: hla_a_03_13:3,065
INFO    2017-01-20 03:52:44 AddOrReplaceReadGroups  Processed     2,000,000 records.  Elapsed time: 00:00:35s.  Time for last 1,000,000:   16s.  Last read position: hla_a_11_01_44:2,019
INFO    2017-01-20 03:53:01 AddOrReplaceReadGroups  Processed     3,000,000 records.  Elapsed time: 00:00:53s.  Time for last 1,000,000:   17s.  Last read position: hla_a_68_14:1,705




Now looking at file gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut-stdout.log 

 99   1   0   0   0   0|   0     0 |   0     0 |   0     0 | 280   234 | 328M 14.1M 3261M  108M|20-01 03:53:03
 99   1   0   0   0   0|   0     0 |   0     0 |   0     0 | 281   232 | 329M 14.1M 3266M  103M|20-01 03:53:04
 99   1   0   0   0   0|   0     0 |   0     0 |   0     0 | 279   243 | 329M 13.8M 3247M  122M|20-01 03:53:05
 99   1   0   0   0   0|   0     0 |   0     0 |   0     0 | 279   237 | 329M 13.8M 3253M  116M|20-01 03:53:06
 98   2   0   0   0   0|   0    28k|   0     0 |   0     0 | 287   261 | 329M 13.8M 3258M  111M|20-01 03:53:07
 98   2   0   0   0   0|   0     0 |   0     0 |   0     0 | 278   251 | 329M 13.8M 3264M  105M|20-01 03:53:08
 99   1   0   0   0   0|   0     0 |   0     0 |   0     0 | 281   249 | 329M 12.2M 3246M  124M|20-01 03:53:09
100   0   0   0   0   0|   0  8192B|   0     0 |   0     0 | 279   238 | 329M 12.2M 3252M  118M|20-01 03:53:10
 94   6   0   0   0   0| 392k    0 |   0     0 |   0     0 |2728    33k| 333M 12.2M 3255M  111M|20-01 03:53:11
 79  21   0   0   0   0| 292k   48k|   0     0 |   0     0 |6862    81k| 334M 12.2M 3258M  108M|20-01 03:53:12




Now looking at file gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut.log 

2017/01/20 03:33:11 I: Copying /var/log/google-genomics/*.log to gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:33:11 I: Running command: sudo gsutil -h Content-type:text/plain -q -m cp /var/log/google-genomics/*.log gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:38:11 I: Copying /var/log/google-genomics/*.log to gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:38:11 I: Running command: sudo gsutil -h Content-type:text/plain -q -m cp /var/log/google-genomics/*.log gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:43:12 I: Copying /var/log/google-genomics/*.log to gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:43:12 I: Running command: sudo gsutil -h Content-type:text/plain -q -m cp /var/log/google-genomics/*.log gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:48:12 I: Copying /var/log/google-genomics/*.log to gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:48:12 I: Running command: sudo gsutil -h Content-type:text/plain -q -m cp /var/log/google-genomics/*.log gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:53:12 I: Copying /var/log/google-genomics/*.log to gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/
2017/01/20 03:53:12 I: Running command: sudo gsutil -h Content-type:text/plain -q -m cp /var/log/google-genomics/*.log gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/

Even the JES log stops at around 03:53 on Jan 20.

Moreover, gsutil stat also reports a last update time at about the same moment on Jan 20:

wm8b1-75c:~ esalinas$ for FILE in `gsutil ls gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut*` ; do echo -ne "\n\nNow looking at file $FILE \n\n" ; gsutil stat  $FILE  ;  done ;


Now looking at file gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut-stderr.log 

gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut-stderr.log:
    Creation time:          Fri, 20 Jan 2017 03:53:13 GMT
    Update time:            Fri, 20 Jan 2017 03:53:13 GMT
    Storage class:          STANDARD
    Content-Length:         961010
    Content-Type:           text/plain
    Hash (crc32c):          Ey+yog==
    Hash (md5):             f3E5iQSgyfn4Ewnj1NyIWg==
    ETag:                   CMj30rbpz9ECEAE=
    Generation:             1484884393901000
    Metageneration:         1


Now looking at file gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut-stdout.log 

gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut-stdout.log:
    Creation time:          Fri, 20 Jan 2017 03:53:13 GMT
    Update time:            Fri, 20 Jan 2017 03:53:13 GMT
    Storage class:          STANDARD
    Content-Length:         1448694
    Content-Type:           text/plain
    Hash (crc32c):          qEs6PQ==
    Hash (md5):             TCUkbrI5b33dFC2XLlsWgg==
    ETag:                   CODm1bbpz9ECEAE=
    Generation:             1484884393948000
    Metageneration:         1


Now looking at file gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut.log 

gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut.log:
    Creation time:          Fri, 20 Jan 2017 03:53:13 GMT
    Update time:            Fri, 20 Jan 2017 03:53:13 GMT
    Storage class:          STANDARD
    Content-Length:         30623
    Content-Type:           text/plain
    Hash (crc32c):          tg86HA==
    Hash (md5):             Udl41HCeDkhO45vywwHZ0Q==
    ETag:                   CJDszrbpz9ECEAE=
    Generation:             1484884393834000
    Metageneration:         1
wm8b1-75c:~ esalinas$ 
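
To put a number on how stale the logs are, the "Update time" lines in the stat output above can be converted into an age in hours. The following is only a minimal sketch: the function names update_time_from_stat and hours_since are hypothetical, the reference time in the example call is an assumed value chosen for illustration, and the script assumes GNU date (the -d option), so it would need adjusting for the BSD date that ships with macOS.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `gsutil stat` output on stdin and print
# the value of its "Update time:" line.
update_time_from_stat() {
    sed -n 's/^ *Update time: *//p'
}

# Hypothetical helper: whole hours elapsed between an update time ($1)
# and a reference time ($2, defaults to the current time).
# Assumes GNU date, which accepts free-form dates through -d.
hours_since() {
    local updated="$1"
    local reference="${2:-now}"
    local t_updated t_reference
    t_updated=$(date -u -d "$updated" +%s) || return 1
    t_reference=$(date -u -d "$reference" +%s) || return 1
    echo $(( (t_reference - t_updated) / 3600 ))
}

# Example with the update time reported above and an assumed reference
# time exactly three days later.
hours_since "Fri, 20 Jan 2017 03:53:13 GMT" "Mon, 23 Jan 2017 03:53:13 GMT"
```

Against a live object the two helpers could be chained, for example: hours_since "$(gsutil stat "$FILE" | update_time_from_stat)".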

Seemingly inconsistent with this, however, is the output of gcloud alpha genomics operations describe, shown here after the head of the JES log:

wm8b1-75c:~ esalinas$ gsutil cat gs://fc-808c7bf6-8570-4c68-aa90-10265c3007b5/61517c58-7cb4-4e53-a782-9b31b3359136/polysolverWorkflow/83ade842-0777-47bd-ac64-7b37dc252ecb/call-polysolverMut/polysolverMut.log 2>/dev/null |head 2>/dev/null
2017/01/20 00:08:12 I: Switching to status: pulling-image
2017/01/20 00:08:12 I: Calling SetOperationStatus(pulling-image): &{  EP2ty8mbKxi7sfqg_s61xpABIPCeybKCFioPcHJvZHVjdGlvblF1ZXVl [0xc4200511d0 0xc4200513b0] 14256664291378898555 [] []}
2017/01/20 00:08:12 I: SetOperationStatus(pulling-image) succeeded
2017/01/20 00:08:12 I: Writing new Docker configuration file
2017/01/20 00:08:12 I: Pulling image "eddiebroad/polysolver:v4"
2017/01/20 00:09:50 I: Pulled image "eddiebroad/polysolver:v4" successfully.
2017/01/20 00:09:50 I: Switching to status: localizing-files
2017/01/20 00:09:50 I: Calling SetOperationStatus(localizing-files): &{  EP2ty8mbKxi7sfqg_s61xpABIPCeybKCFioPcHJvZHVjdGlvblF1ZXVl [0xc4200511d0 0xc4200513b0 0xc420051770] 14256664291378898555 [] []}
2017/01/20 00:09:51 I: SetOperationStatus(localizing-files) succeeded
2017/01/20 00:09:51 I: Docker file /cromwell_root/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/ACC/DNA/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-OR-A5JI-10A-01D-A29L-10.bam maps to host location /mnt/local-disk/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/ACC/DNA/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-OR-A5JI-10A-01D-A29L-10.bam.
wm8b1-75c:~ esalinas$ gcloud alpha genomics operations describe  EP2ty8mbKxi7sfqg_s61xpABIPCeybKCFioPcHJvZHVjdGlvblF1ZXVl |head
done: false
metadata:
  '@type': type.googleapis.com/google.genomics.v1.OperationMetadata
  clientId: ''
  createTime: '2017-01-20T00:07:13Z'
  events:
  - description: start
    startTime: '2017-01-20T00:08:10.837176413Z'
  - description: pulling-image
    startTime: '2017-01-20T00:08:12.745027568Z'
wm8b1-75c:~ esalinas$ 

The gcloud alpha genomics operations describe command says "done: false", which would suggest that the task is still running, but that contradicts the log timestamps.
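
The contradiction can be written down as a simple check that combines the two observations: the "done" field from the describe output and the age of the logs. This is only a sketch of the reasoning, not a statement about how JES itself decides anything. The function names done_field and classify_operation, the 24-hour default threshold, and the figure of 72 hours (a round number for "about three days") are all assumptions made for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read the describe output on stdin and print the
# value of its top-level "done:" line ("true" or "false").
done_field() {
    sed -n 's/^done: *//p'
}

# Hypothetical helper: classify an operation from two observations.
#   $1  value of the "done" field ("true" or "false")
#   $2  whole hours since the logs in the bucket were last updated
#   $3  staleness threshold in hours (assumed default: 24)
classify_operation() {
    local done_flag="$1" stale_hours="$2" threshold="${3:-24}"
    if [ "$done_flag" = "true" ]; then
        echo "finished"
    elif [ "$stale_hours" -ge "$threshold" ]; then
        echo "possibly-stuck"
    else
        echo "running"
    fi
}

# Values from this post: done is false, and the logs are about three
# days old (72 hours is used here as an assumed round figure).
classify_operation false 72
```

With the values from this post the check lands in the "possibly-stuck" branch, which is exactly the situation I am asking about.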

Is this behavior expected or normal? Based on my understanding, it is not.

Is the task running or not?

Given that gcloud says "done: false" while the JES log stopped days ago, does this point to a problem in Google JES?

Has anyone else seen anything like this, and does anyone know what might be (or not be) going on?

-eddie

