Extras

Script to batch process a CSV File

Example Scripts to batch reduce HLA typings from a CSV File

The pyard-reduce-csv command takes a CSV file with HLA typing data, reduces selected columns as described in a config file, and produces a new CSV or Excel file.

Steps to batch process a CSV file:

1. Install py-ard.
2. Describe how the file should be processed in a JSON (.json) config file.
3. Run pyard-reduce-csv -c <config-file> to produce a processed file based on that configuration.

To help create the configuration file, run pyard-reduce-csv with the -g or --generate-sample option to generate a sample configuration file and a sample CSV file.

Use these files as templates for your own data.

Once the configuration file is ready, pass it with the -c option to run the batch processing.

In the following example, we generate a sample configuration and CSV file.

$ pyard-reduce-csv --generate-sample
Created sample_reduce_conf.json
Created sample.csv

We then pass the config file with -c and add -q to suppress verbose log messages.

$ pyard-reduce-csv -c sample_reduce_conf.json -q
Using config file: sample_reduce_conf.json
Failed reducing 'C*02:85:02' in column r_c_typ2
Failed reducing 'DRB1*14:167:01' in column r_drb1_typ2
...

Summary
-------
16 alleles failed to reduce.
| Column  Name    |      Allele      |      Did you mean ?
| --------------- | ---------------- | -------------------------
| r_c_typ2        | C*02:85:02       | NA
| r_drb1_typ2     | DRB1*14:167:01   | NA
...

Saved result to file:clean_sample.csv.gz

See Example JSON config file.

Configuration Options

The configuration file provides the following options to modify how the reduction happens.

| Configuration Option | Type | Description |
| --- | --- | --- |
| `in_csv_filename` | str | Input CSV filename |
| `out_csv_filename` | str | Output CSV filename |
| `columns_from_csv` | list | CSV columns to read |
| `locus_column_mapping` | dict | CSV columns to reduce |
| `redux_type` | str | Reduction type |
| `redux_cache_size` | int | Cache size |
| `reduce_serology` | bool | Reduce serology? |
| `reduce_v2` | bool | Reduce V2 formatted alleles? |
| `convert_v2_to_v3` | bool | Convert V2 format to V3? |
| `reduce_2field` | bool | Reduce alleles that are 2 field? |
| `reduce_3field` | bool | Reduce alleles that are 3 field? |
| `reduce_P` | bool | Reduce alleles that have a P suffix? |
| `reduce_XX` | bool | Reduce XX alleles? |
| `reduce_MAC` | bool | Reduce MAC alleles? |
| `map_drb345_to_drbx` | bool | Map DRB3, DRB4 and DRB5 to DRBX using WMDA rules? |
| `locus_in_allele_name` | bool | Is the locus name specified for each allele? |
| `keep_locus_in_allele_name` | bool | Output the locus name for each allele? |
| `new_column_for_redux` | bool | Create a new column or replace the original? |
| `reduced_column_prefix` | str | Prefix to use for the reduced column |
| `generate_glstring` | bool | Generate a GL String column for each subject? |
| `output_file_format` | str | Format of the output file |
| `apply_compression` | str | Compression format for the output file |
| `verbose_log` | bool | Output verbose log to the screen? |
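As a rough illustration of how these options fit together, the following Python sketch assembles a configuration and writes it to a JSON file. The keys are the options from the table above; every value, the `write_config` helper and the `my_reduce_conf.json` file name are assumptions chosen for the example, not defaults of pyard-reduce-csv.

```python
import json
import tempfile
from pathlib import Path

# Keys come from the options table; all values here are illustrative assumptions.
config = {
    "in_csv_filename": "sample.csv",
    "out_csv_filename": "clean_sample.csv",
    "columns_from_csv": ["nmdp_id", "r_a_typ1", "r_a_typ2"],
    "locus_column_mapping": {"recipient": {"A": ["r_a_typ1", "r_a_typ2"]}},
    "redux_type": "lgx",
    "redux_cache_size": 1000,
    "reduce_serology": False,
    "reduce_v2": True,
    "convert_v2_to_v3": False,
    "reduce_2field": True,
    "reduce_3field": True,
    "reduce_P": True,
    "reduce_XX": False,
    "reduce_MAC": True,
    "map_drb345_to_drbx": False,
    "locus_in_allele_name": True,
    "keep_locus_in_allele_name": True,
    "new_column_for_redux": True,
    "reduced_column_prefix": "reduced_",
    "generate_glstring": False,
    "output_file_format": "csv",
    "apply_compression": "gzip",
    "verbose_log": False,
}


def write_config(config: dict, path: Path) -> Path:
    """Hypothetical helper: serialize the config dict as indented JSON."""
    path.write_text(json.dumps(config, indent=2))
    return path


config_path = write_config(config, Path(tempfile.gettempdir()) / "my_reduce_conf.json")
print(f"pyard-reduce-csv -c {config_path}")
```

The printed line is the command you would then run yourself; the sketch deliberately does not invoke it.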

Input CSV filename

in_csv_filename: Directory path and file name of the input CSV file.

Output CSV filename

out_csv_filename: Directory path and file name of the reduced output CSV file.

CSV Columns to read

columns_from_csv: The column names to read from the CSV file.

 [
  "nmdp_id",
  "r_a_typ1",
  "r_a_typ2",
  "r_b_typ1",
  "r_b_typ2",
  "r_c_typ1",
  "r_c_typ2",
  "d_a_typ1",
  "d_a_typ2",
  "d_b_typ1",
  "d_b_typ2",
  "d_c_typ1",
  "d_c_typ2"
]

CSV Columns to reduce

locus_column_mapping: Mapping of subject types (e.g. recipient, donor) to their loci and to the columns that hold the typings for those loci. The mapped columns are the ones that get reduced, and each must appear in the columns_from_csv list.

  "locus_column_mapping": {
    "recipient": {
        "A": [
            "r_a_typ1",
            "r_a_typ2"
        ],
        "B": [
            "r_b_typ1",
            "r_b_typ2"
        ],
        "C": [
            "r_c_typ1",
            "r_c_typ2"
        ]
    },
    "donor": {
        "A": [
            "d_a_typ1",
            "d_a_typ2"
        ],
        "B": [
            "d_b_typ1",
            "d_b_typ2"
        ],
        "C": [
            "d_c_typ1",
            "d_c_typ2"
        ]
    }
  }
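Since every mapped column must also be listed in columns_from_csv, it can be worth checking a config before running the reduction. The following is a minimal sketch of such a check; `unmapped_columns` is a hypothetical helper written for this example, not part of py-ard.

```python
def unmapped_columns(columns_from_csv, locus_column_mapping):
    """Return (subject, locus, column) for mapped columns missing from columns_from_csv."""
    available = set(columns_from_csv)
    missing = []
    for subject, loci in locus_column_mapping.items():
        for locus, columns in loci.items():
            for column in columns:
                if column not in available:
                    missing.append((subject, locus, column))
    return missing


# Illustrative data: r_b_typ1 is mapped but was never listed as a column to read.
columns = ["nmdp_id", "r_a_typ1", "r_a_typ2"]
mapping = {"recipient": {"A": ["r_a_typ1", "r_a_typ2"], "B": ["r_b_typ1"]}}

problems = unmapped_columns(columns, mapping)
print(problems)  # [('recipient', 'B', 'r_b_typ1')]
```

An empty list means the mapping is consistent with the columns being read.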

GL String Columns

Instead of providing single locus alleles per column with locus_column_mapping, a GL String describing the whole genotype can be provided per column. Use glstring_columns to provide a list of GL String columns to reduce.

  "glstring_columns": [
    "donor_gl",
    "recip_gl"
  ],

Depending upon the data, only one of locus_column_mapping or glstring_columns needs to be provided.

Redux Options

redux_type Reduction Type

Valid Options are:

| Reduction Type | Description |
| --- | --- |
| `G` | Reduce to G Group Level |
| `P` | Reduce to P Group Level |
| `lg` | Reduce to 2 field ARD level (append `g`) |
| `lgx` | Reduce to 2 field ARD level |
| `W` | Reduce/Expand to 3 field WHO nomenclature level |
| `exon` | Reduce/Expand to exon level |
| `U2` | Reduce to 2 field unambiguous level |
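The reduction types differ in letter case (`G` and `P` are upper case, `lg` and `lgx` are lower case), so a typo is easy to make. The sketch below validates a config's redux_type against the table above before anything is run; `check_redux_type` is a hypothetical helper, and treating the names as case-sensitive is an assumption based on how the table spells them.

```python
# Reduction types and descriptions copied from the table above.
VALID_REDUX_TYPES = {
    "G": "Reduce to G Group Level",
    "P": "Reduce to P Group Level",
    "lg": "Reduce to 2 field ARD level (append g)",
    "lgx": "Reduce to 2 field ARD level",
    "W": "Reduce/Expand to 3 field WHO nomenclature level",
    "exon": "Reduce/Expand to exon level",
    "U2": "Reduce to 2 field unambiguous level",
}


def check_redux_type(config: dict) -> str:
    """Return the configured redux_type, or raise if it is not in the table."""
    redux_type = config.get("redux_type")
    if redux_type not in VALID_REDUX_TYPES:
        valid = ", ".join(VALID_REDUX_TYPES)
        raise ValueError(f"Unknown redux_type {redux_type!r}; expected one of: {valid}")
    return redux_type


print(check_redux_type({"redux_type": "lgx"}))  # lgx
```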

Cache size

When processing a large file, it helps to cache the results of previous reductions. By default only 1,000 results are cached; this can be increased with the redux_cache_size option.
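To see why a cache helps when the same typing appears many times in a file, here is a minimal Python sketch using `functools.lru_cache`. It illustrates the general memoization idea only: `reduce_typing` is a stand-in with placeholder logic, and nothing here is taken from py-ard's actual implementation.

```python
from functools import lru_cache

calls = 0  # counts how often the "expensive" work actually runs


@lru_cache(maxsize=5000)  # the role played by a cache size of 5000
def reduce_typing(typing: str) -> str:
    """Stand-in for an expensive reduction; NOT py-ard's real logic."""
    global calls
    calls += 1
    # Placeholder behaviour: keep only the first two fields.
    return ":".join(typing.split(":")[:2])


typings = ["A*01:01:01", "A*02:01:01", "A*01:01:01", "A*01:01:01"]
reduced = [reduce_typing(t) for t in typings]

info = reduce_typing.cache_info()
print(reduced)                  # ['A*01:01', 'A*02:01', 'A*01:01', 'A*01:01']
print(info.hits, info.misses)   # 2 2
```

Four lookups cost only two real computations; the larger the cache, the more distinct typings can be remembered before older entries are evicted. In the config file itself, the cache size is set with a single entry: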

  "redux_cache_size": 5000,

Kinds of typings to reduce

Choose which kinds of typings to reduce.

    "reduce_serology": false,
    "reduce_v2": true,
    "convert_v2_to_v3": false,
    "reduce_3field": true,
    "reduce_P": true,
    "reduce_XX": false,
    "reduce_MAC": true,

Valid options: true or false

Map to DRBX

map_drb345_to_drbx: Map DRB3, DRB4 and DRB5 typings to DRBX typings using the WMDA method.

Valid options: true or false

Locus Name in Allele

locus_in_allele_name: Is the locus name present in each allele in the input? E.g. A*01:01 (with locus) vs 01:01 (without).

Valid options: true or false

Keep Locus Name in Allele

keep_locus_in_allele_name: Should the reduced allele keep the locus name? E.g. A*01:01 vs 01:01.

Valid options: true or false
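The two locus-name options describe the input and output forms of an allele name. The following Python sketch shows the string transformation involved, assuming the locus is everything before the `*` separator; `add_locus` and `strip_locus` are hypothetical helpers, not py-ard functions.

```python
def add_locus(locus: str, allele: str) -> str:
    """'A', '01:01' -> 'A*01:01'; names that already carry a locus are left alone."""
    return allele if "*" in allele else f"{locus}*{allele}"


def strip_locus(allele: str) -> str:
    """'A*01:01' -> '01:01'; names without a locus are left alone."""
    return allele.split("*", 1)[1] if "*" in allele else allele


# locus_in_allele_name false: the input lacks the locus, so it is added from the column's locus.
full_name = add_locus("A", "01:01")
print(full_name)               # A*01:01

# keep_locus_in_allele_name false: the locus is dropped again on output.
print(strip_locus(full_name))  # 01:01
```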

Create New Column

new_column_for_redux: Whether to add a separate column for the reduced values or replace the original column. When true, a reduced_ version of the column is created alongside the original. When false, the original column is overwritten with the reduced version.

Valid options: true or false
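The difference between the two settings can be sketched on a single row, represented here as a Python dict. `store_reduction` is a hypothetical helper, the reduced value is illustrative, and forming the new column name as prefix plus original name is an assumption based on the description above.

```python
def store_reduction(row, column, reduced_value,
                    new_column_for_redux=True, reduced_column_prefix="reduced_"):
    """Return a copy of row with the reduced value added or substituted."""
    result = dict(row)
    if new_column_for_redux:
        # Keep the original and add a prefixed column next to it.
        result[reduced_column_prefix + column] = reduced_value
    else:
        # Overwrite the original column in place.
        result[column] = reduced_value
    return result


row = {"nmdp_id": "1", "r_a_typ1": "A*01:01:01"}

added = store_reduction(row, "r_a_typ1", "A*01:01", new_column_for_redux=True)
replaced = store_reduction(row, "r_a_typ1", "A*01:01", new_column_for_redux=False)

print(added)     # {'nmdp_id': '1', 'r_a_typ1': 'A*01:01:01', 'reduced_r_a_typ1': 'A*01:01'}
print(replaced)  # {'nmdp_id': '1', 'r_a_typ1': 'A*01:01'}
```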

Specify the prefix for the new column with reduced_column_prefix.

"reduced_column_prefix": "reduced_",

GL String

generate_glstring: Generate a GL String column containing the reduced typings for each subject.

  "generate_glstring": true,

Valid options: true or false
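For readers unfamiliar with the format, a GL String joins the two typings at a locus with `+` and separates loci with `^`. The Python sketch below assembles one from per-locus typings; `make_glstring` is a hypothetical helper, and the alphabetical locus order and the skipping of empty typings are assumptions, not a description of what pyard-reduce-csv emits.

```python
def make_glstring(typings_by_locus: dict) -> str:
    """Join the typings of each locus with '+', then join the loci with '^'."""
    parts = []
    for locus in sorted(typings_by_locus):
        typings = [t for t in typings_by_locus[locus] if t]  # skip empty cells
        if typings:
            parts.append("+".join(typings))
    return "^".join(parts)


recipient = {
    "B": ["B*07:02", "B*08:01"],
    "A": ["A*01:01", "A*02:01"],
}
glstring = make_glstring(recipient)
print(glstring)  # A*01:01+A*02:01^B*07:02+B*08:01
```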

Output Format

output_file_format: Format of the output file.

Valid options: csv or xlsx

Excel output requires the openpyxl library. Install it with:

 pip install openpyxl

Compression Options

apply_compression: Compression to use for the output file. Applies only to CSV output.

Valid options: 'gzip', 'zip' or null
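A gzip-compressed CSV can be read back without unpacking it first. The Python sketch below uses only the standard library; it first writes a tiny stand-in file so that it runs on its own, and the column names and values in that file are made up for illustration.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_gzip_csv(path):
    """Read a gzip-compressed CSV into a list of dicts, one per row."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle))


# Stand-in for a real output file such as clean_sample.csv.gz.
sample_path = Path(tempfile.mkdtemp()) / "clean_sample.csv.gz"
with gzip.open(sample_path, mode="wt", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["nmdp_id", "reduced_r_a_typ1"])
    writer.writerow(["1", "A*01:01"])

rows = read_gzip_csv(sample_path)
print(rows)  # [{'nmdp_id': '1', 'reduced_r_a_typ1': 'A*01:01'}]
```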

Verbose log Options

verbose_log: Show verbose log messages?

Valid options: true or false
