Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: How to generate FAI files and TBI files #1665

Open
Lys9527 opened this issue Apr 6, 2023 · 5 comments
Open

Question: How to generate FAI files and TBI files #1665

Lys9527 opened this issue Apr 6, 2023 · 5 comments

Comments

@Lys9527
Copy link

Lys9527 commented Apr 6, 2023

I recently used htsjdk 3.0.5 to complete my own project, I would like to ask how htsjdk needs to be used to generate fai files and tbi files, thank you

@lindenb
Copy link
Contributor

lindenb commented Apr 6, 2023

FAI:


https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/reference/FastaSequenceIndexCreator.java

TBI:

private static int copyTo(final Path invcf,final Path outvcfpath) throws IOException {
	try(VCFReader vcfReader1  = new VCFFileReader(invcf,false)) {
		final VCFHeader header1 = vcfReader1.getHeader();
		final VariantContextWriterBuilder vcb = new VariantContextWriterBuilder();
			final SAMSequenceDictionary dict1 = header1.getSequenceDictionary();
			vcb.setReferenceDictionary(header1.getSequenceDictionary());
        	vcb.setIndexCreator(new TabixIndexCreator(dict1, TabixFormat.VCF));
        	vcb.setOption(Options.INDEX_ON_THE_FLY);
        	vcb.setOutputPath(outvcfpath);
        	vcb.setCreateMD5(false);
        	
        	try(final VariantContextWriter w = vcb. build()) {
	        	w.writeHeader(header1);
	        	try(final CloseableIterator<VariantContext> iter1 = vcfReader1.iterator()) {
	        		while(iter1.hasNext()) {
	        			w.add(iter1.next());
	        			}
	        		}
        		}
		}

	return count;
	}
@Lys9527
Copy link
Author

Lys9527 commented Apr 6, 2023

FAI:


https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/reference/FastaSequenceIndexCreator.java

TBI:

private static int copyTo(final Path invcf,final Path outvcfpath) throws IOException {
	try(VCFReader vcfReader1  = new VCFFileReader(invcf,false)) {
		final VCFHeader header1 = vcfReader1.getHeader();
		final VariantContextWriterBuilder vcb = new VariantContextWriterBuilder();
			final SAMSequenceDictionary dict1 = header1.getSequenceDictionary();
			vcb.setReferenceDictionary(header1.getSequenceDictionary());
        	vcb.setIndexCreator(new TabixIndexCreator(dict1, TabixFormat.VCF));
        	vcb.setOption(Options.INDEX_ON_THE_FLY);
        	vcb.setOutputPath(outvcfpath);
        	vcb.setCreateMD5(false);
        	
        	try(final VariantContextWriter w = vcb. build()) {
	        	w.writeHeader(header1);
	        	try(final CloseableIterator<VariantContext> iter1 = vcfReader1.iterator()) {
	        		while(iter1.hasNext()) {
	        			w.add(iter1.next());
	        			}
	        		}
        		}
		}

	return count;
	}

Thank you very much for your answer, I have understood the generation of fai file, for tai file, do you need to limit the format of the input file here? For example, gff file, sorry I am not learning this major, but I need to use this function in my project, so the problem is a little rude, please forgive me

@lindenb
Copy link
Contributor

lindenb commented Apr 6, 2023

for TBI +BED, you can use

			TabixIndexCreator indexCreator=new TabixIndexCreator(TabixFormat.BED);
			writer = new BlockCompressedOutputStream(this.outputFile);

				long filePosition= writer.getFilePointer();
				while(in.hasNext())
					{
					String line = in.next();
					BedLine bed = this.bedCodec.decode(line);
					if(bed==null) continue;
					writer.write(line.getBytes());
					writer.write('\n');
					indexCreator.addFeature(bed, filePosition);
					filePosition = writer.getFilePointer();
					}
				}
			writer.flush();
			Index index = indexCreator.finalizeIndex(writer.getFilePointer());

it's the same idea for GFF but used a gff3 codec: https://www.javadoc.io/static/com.github.samtools/htsjdk/2.24.1/htsjdk/tribble/gff/Gff3Codec.html

@Lys9527
Copy link
Author

Lys9527 commented Apr 6, 2023

for TBI +BED, you can use

			TabixIndexCreator indexCreator=new TabixIndexCreator(TabixFormat.BED);
			writer = new BlockCompressedOutputStream(this.outputFile);

				long filePosition= writer.getFilePointer();
				while(in.hasNext())
					{
					String line = in.next();
					BedLine bed = this.bedCodec.decode(line);
					if(bed==null) continue;
					writer.write(line.getBytes());
					writer.write('\n');
					indexCreator.addFeature(bed, filePosition);
					filePosition = writer.getFilePointer();
					}
				}
			writer.flush();
			Index index = indexCreator.finalizeIndex(writer.getFilePointer());

it's the same idea for GFF but used a gff3 codec: https://www.javadoc.io/static/com.github.samtools/htsjdk/2.24.1/htsjdk/tribble/gff/Gff3Codec.html

Thank you very much for your detailed answer. If you can give me an example, for example, my file is /path/to/SCHZ.rRNA.gff.gz, how do I need to generate the “/path/to/SCHZ.rRNA.gff.gz.tbi” file?Sorry I'm new to the java building website project, very unproficient, bothering you!

@gokalpcelik
Copy link

Using codecs and IndexFactory should do fine I guess.
Reiterating over the actual file may not be necessary

Here is my code from my repository

final VCFCodec codec = new VCFCodec();

		if (this.VCFFile.getAbsolutePath().endsWith(FileExtensions.VCF)) {
			idxpath = this.VCFFile.getAbsolutePath() + FileExtensions.TRIBBLE_INDEX;

			try {
				final Index idx = IndexFactory.createDynamicIndex(this.VCFFile, codec,
						IndexBalanceApproach.FOR_SEEK_TIME);
				idx.write(new File(idxpath));
			} catch (final Exception e) {
				if (e instanceof UnsortedFileException) {
					OverSeer.log(this.getClass().getSimpleName(),
							"VCF File is not sorted. Sorting function is currently not implemented. Please sort your vcf using a proper tool.",
							OverSeer.ERROR);
				} else {
					e.printStackTrace();
				}

			}

		} else if (this.VCFFile.getAbsolutePath().endsWith(FileExtensions.COMPRESSED_VCF)) {
			idxpath = this.VCFFile.getAbsolutePath() + FileExtensions.TABIX_INDEX;

			try {
				final Index idx = IndexFactory.createIndex(this.VCFFile, codec, IndexType.TABIX);
				idx.write(new File(idxpath));
			} catch (final Exception e) {
				e.printStackTrace();
			}

		}

I bet it will work for GFF with suitable codec within htsjdk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants