Update pangenome-aware DeepVariant case-studies
PiperOrigin-RevId: 702111739
kishwarshafin authored and pichuan committed Dec 3, 2024
1 parent 379f914 commit 0fc7041
Showing 1 changed file with 40 additions and 53 deletions.
93 changes: 40 additions & 53 deletions docs/metrics-deeptrio.md
@@ -2,34 +2,22 @@

## WGS (Illumina)

## Setup

The runtime and accuracy reported in this page are generated using
`n2-standard-96` GCP instances which has the following configuration:

```bash
GCP instance type: n2-standard-96
CPUs: 96-core (vCPU)
Memory: 384GiB
GPUs: 0
```

### Runtime

Runtime is on HG002/HG003/HG004 (all chromosomes).
Reported runtime is an average of 5 runs.

Stage | Wall time (minutes)
-------------------------------- | -----------------
make_examples | 172m53.87s
call_variants: HG002 | 269m26.55s
call_variants: HG003 | 268m2.29s
call_variants: HG004 | 270m22.72s
postprocess_variants (parallel) | 34m12.36s; 35m4.75s; 35m8.14s
vcf_stats_report(optional):HG002 | 6m36.58s
vcf_stats_report(optional):HG003 | 6m39.92s
vcf_stats_report(optional):HG003 | 6m40.64s
total | 1028m3.08s (17h08m3.08s)
make_examples | 381m27.76s
call_variants: HG002 | 376m44.92s
call_variants: HG003 | 379m55.40s
call_variants: HG004 | 380m27.95s
postprocess_variants (parallel) | 45m24.88s; 47m0.02s; 47m46.29s
vcf_stats_report(optional):HG002 | 9m20.03s
vcf_stats_report(optional):HG003 | 9m29.88s
vcf_stats_report(optional):HG003 | 9m29.88s
total | 1576m56.29s (26h16m56.29s)

### Accuracy

@@ -59,13 +47,13 @@ truth), which was held out while training.
| SNP | 71445 | 214 | 48 | 0.997014 | 0.999329 | 0.99817 |

* See VCF stats report (for all chromosomes)
- [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG002.output.visual_report.html)
- [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG003.output.visual_report.html)
- [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG004.output.visual_report.html)
- [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG002.output.visual_report.html)
- [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG003.output.visual_report.html)
- [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG004.output.visual_report.html)

## PacBio (HiFi)

Read haplotagging in DeepTrio PacBio is on by default. You no longer
In v1.7.0, we introduced read haplotagging in DeepTrio PacBio. You no longer
need to run DeepVariant->WhatsHap->DeepTrio, and can just run DeepTrio once.

### Runtime
@@ -75,20 +63,20 @@ Reported runtime is an average of 5 runs.

Stage | Wall time (minutes)
-------------------------------- | -------------------
make_examples | 16m48.88s+288m15.08s
call_variants: HG002 | 279m5.76s
call_variants: HG003 | 274m47.90s
call_variants: HG004 | 283m37.89s
postprocess_variants (parallel) | 44m12.28s; 51m39.02s; 51m52.66s
vcf_stats_report(optional):HG002 | 6m49.94s
vcf_stats_report(optional):HG003 | 6m53.24s
vcf_stats_report(optional):HG003 | 7m19.57s
total | 1206m35.85s (20h6m35.85s)
make_examples | 50m35.96s+621m56.74s
call_variants: HG002 | 364m39.93s
call_variants: HG003 | 368m0.84s
call_variants: HG004 | 372m44.77s
postprocess_variants (parallel) | 58m52.92s; 66m36.57s; 67m35.91s
vcf_stats_report(optional):HG002 | 9m33.72s
vcf_stats_report(optional):HG003 | 9m48.13s
vcf_stats_report(optional):HG003 | 10m1.22s
total | 1858m53.78s (30h58m53.78s)

* See VCF stats report (for all chromosomes)
- [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG002.output.visual_report.html)
- [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG003.output.visual_report.html)
- [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG004.output.visual_report.html)
- [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG002.output.visual_report.html)
- [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG003.output.visual_report.html)
- [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG004.output.visual_report.html)

### Accuracy

@@ -108,7 +96,6 @@ truth), which was held out while training.
| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
| INDEL | 10577 | 51 | 77 | 0.995201 | 0.993089 | 0.994144 |
| SNP | 70143 | 23 | 35 | 0.999672 | 0.999502 | 0.999587 |

#### HG004:

| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
@@ -125,15 +112,15 @@ Reported runtime is an average of 5 runs.

Stage | Wall time (minutes)
-------------------------------- | --------------
make_examples | 7m11.47s
call_variants: HG002 | 3m49.25s
call_variants: HG003 | 3m53.32s
call_variants: HG004 | 3m52.68s
postprocess_variants (parallel) | 0m40.52s; 0m42.09s; 0m42.30s
vcf_stats_report(optional):HG002 | 0m5.65s
vcf_stats_report(optional):HG003 | 0m5.69s
vcf_stats_report(optional):HG003 | 0m7.15s
total | 20m6.26s
make_examples | 15m6.77s
call_variants: HG002 | 5m16.13s
call_variants: HG003 | 5m18.83s
call_variants: HG004 | 5m19.09s
postprocess_variants (parallel) | 0m51.70s; 0m52.27s; 0m53.73s
vcf_stats_report(optional):HG002 | 0m7.84s
vcf_stats_report(optional):HG003 | 0m8.01s
vcf_stats_report(optional):HG003 | 0m10.00s
total | 32m20.47s

### Accuracy

@@ -163,14 +150,14 @@ truth), which was held out while training.
| SNP | 676 | 3 | 0 | 0.995582 | 1.0 | 0.997786 |

* See VCF stats report (for all chromosomes)
- [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG002.output.visual_report.html)
- [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG003.output.visual_report.html)
- [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG004.output.visual_report.html)
- [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG002.output.visual_report.html)
- [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG003.output.visual_report.html)
- [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG004.output.visual_report.html)

## How to reproduce the metrics on this page

For simplicity and consistency, we report runtime with a
[CPU instance with 96 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform)
[CPU instance with 64 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform)
For bigger datasets (WGS and PACBIO), we used bigger disk size (900G).
This is NOT the fastest or cheapest configuration.

@@ -179,7 +166,7 @@ Use `gcloud compute ssh` to log in to the newly created instance.
Download and run any of the following case study scripts:

```
curl -O https://raw.githubusercontent.com/google/deepvariant/r1.8/scripts/inference_deeptrio.sh
curl -O https://raw.githubusercontent.com/google/deepvariant/r1.7/scripts/inference_deeptrio.sh
# WGS
bash inference_deeptrio.sh --model_preset WGS
@@ -197,4 +184,4 @@ DeepTrio. The runtime numbers reported above are the average of 5 runs each.
The accuracy metrics come from the hap.py summary.csv output file.
The runs are deterministic so all 5 runs produced the same output.

[CPU instance with 96 CPUs]: deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform
[CPU instance with 64 CPUs]: deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform
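
Note on reading the runtime tables: the `total` rows give wall time both as minutes and as hours (for example `1576m56.29s (26h16m56.29s)`). The sketch below is only a reader-side sanity check for that unit conversion. It is not part of DeepVariant or of `inference_deeptrio.sh`, and the helper names `parse_wall_time` and `format_hms` are hypothetical. It does not try to re-derive a total from the per-stage rows, since the page does not state how the stages are combined.

```python
import math
import re

# Matches wall-time strings as written in the tables, e.g. "381m27.76s"
# or "26h16m56.29s". The hours part is optional.
_WALL_TIME = re.compile(r"^(?:(\d+)h)?(\d+)m(\d+(?:\.\d+)?)s$")


def parse_wall_time(text):
    """Convert a wall-time string such as '1576m56.29s' to seconds."""
    match = _WALL_TIME.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized wall time: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def format_hms(seconds):
    """Format seconds as 'NhNmN.NNs', working in whole centiseconds."""
    total_cs = round(seconds * 100)
    hours, rem = divmod(total_cs, 360000)
    minutes, cs = divmod(rem, 6000)
    return f"{hours}h{minutes}m{cs / 100:.2f}s"


if __name__ == "__main__":
    # Pairs of (minutes form, hours form) copied from the total rows above.
    totals = [
        ("1028m3.08s", "17h08m3.08s"),
        ("1576m56.29s", "26h16m56.29s"),
        ("1206m35.85s", "20h6m35.85s"),
        ("1858m53.78s", "30h58m53.78s"),
    ]
    for minutes_form, hours_form in totals:
        same = math.isclose(
            parse_wall_time(minutes_form), parse_wall_time(hours_form)
        )
        print(minutes_form, "->", format_hms(parse_wall_time(minutes_form)), same)
```

Comparing parsed seconds rather than strings tolerates the inconsistent zero padding in the tables (`17h08m` versus `20h6m`).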
