VcfInput {Rsamtools}    R Documentation

Operations on ‘VCF’ or ‘BCF’ (variant call) files.

Description

Import, coerce, or index variant call files in text or binary format.

Usage


scanBcfHeader(file, ...)
## S4 method for signature 'character'
scanBcfHeader(file, ...)

scanBcf(file, ...)
## S4 method for signature 'character'
scanBcf(file, index = file, ..., param=ScanBcfParam())

asBcf(file, dictionary, destination, ...,
      overwrite=FALSE, indexDestination=TRUE)
## S4 method for signature 'character'
asBcf(file, dictionary, destination, ...,
      overwrite=FALSE, indexDestination=TRUE)

indexBcf(file, ...)
## S4 method for signature 'character'
indexBcf(file, ...)

scanVcfHeader(file, ...)
## S4 method for signature 'character'
scanVcfHeader(file, ...)

scanVcf(file, ..., param)
## S4 method for signature 'character,ANY'
scanVcf(file, ..., param)
## S4 method for signature 'character,missing'
scanVcf(file, ..., param)
## S4 method for signature 'connection,missing'
scanVcf(file, ..., param)

unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)
## S4 method for signature 'list,missing'
unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)
## S4 method for signature 'list,character'
unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)
## S4 method for signature 'list,TabixFile'
unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)

Arguments

file

For scanBcf and scanBcfHeader, the character() file name of the ‘VCF’ or ‘BCF’ file to be processed, or an instance of class BcfFile. For scanVcf and scanVcfHeader, the character() file name, TabixFile, or class connection (file() or bgzip()) of the ‘VCF’ file to be processed.

index

The character() file name(s) of the ‘BCF’ index to be processed.

dictionary

A character vector of the unique “CHROM” names in the VCF file.

destination

The character(1) file name of the location where the BCF output file will be created. For asBcf this is without the “.bcf” file suffix.

param

An instance of ScanBcfParam or ScanVcfParam influencing which records are parsed and the ‘INFO’ and ‘GENO’ information returned.

...

Additional arguments, e.g., for scanBcfHeader,character-method, mode of BcfFile.

overwrite

A logical(1) indicating whether the destination can be over-written if it already exists.

indexDestination

A logical(1) indicating whether the created destination file should also be indexed.

x

A list() resulting from scanVcf.

hdr

A character(1) or TabixFile instance from which scanVcfHeader can extract information on the structure of INFO and FORMAT specifications.

info, geno

For non-“missing” methods of unpackVcf, a logical(1) indicating whether the ‘INFO’ or ‘GENO’ fields of x should be expanded. If TRUE, then scanVcfHeader(hdr) is consulted for the description of INFO and / or FORMAT fields.

For the “missing” method of unpackVcf, a logical(1) (in which case the corresponding field is not unpacked, regardless of value) or DataFrame or data.frame with row names corresponding to field elements, and with columns Number and Type as defined in the VCF specification at the URL below. Usually, these are obtained from scanVcfHeader on the same file as used to parse the data passed as argument x.
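As a minimal sketch of the data.frame form described above, the following builds a field description by hand using base R only. The field names (GT, DP, PL) and their Number and Type values are illustrative assumptions, not taken from any particular file; in practice these would usually come from scanVcfHeader. Storing Number as character is also an assumption, made so that values such as "G" can sit alongside integers.

```r
## Hand-built description of GENO fields (illustrative values only).
## Row names are the field names; columns are Number and Type.
geno_spec <- data.frame(
    Number = c("1", "1", "G"),              # "G": one value per genotype
    Type   = c("String", "Integer", "Integer"),
    row.names = c("GT", "DP", "PL"),
    stringsAsFactors = FALSE
)
geno_spec

## Hypothetical use, assuming 'x' is the result of scanVcf on the same file:
## unpackVcf(x, info = FALSE, geno = geno_spec)
```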

Details

Most users will use the vcf* functions; the bcf* functions are restricted to the GENO fields supported by ‘bcftools’ (see documentation at the URL below). The argument param allows portions of the file to be input, but requires that the file be BCF or bgzip'd and indexed as a TabixFile.

scanVcf with param="missing" and file="character" or file="connection" scans the entire file. With file="connection", an argument n indicates the number of lines of the VCF file to input; a connection that is open at the beginning of the call remains open and is advanced by n lines at the end of the call, providing a convenient way to stream through large VCF files.
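The connection behaviour described above follows the usual base R pattern for reading from an open connection in chunks. The sketch below uses readLines as a stand-in for scanVcf so that it runs without any package; the temporary file and its contents are made up for illustration, and how scanVcf signals the end of input is not shown here.

```r
## Write a tiny, made-up VCF body to a temporary file (illustration only).
tmp <- tempfile(fileext = ".vcf")
writeLines(sprintf("chr1\t%d\t.\tA\tT\t50\tPASS\t.", 1:5), tmp)

con <- file(tmp, open = "r")   # open the connection before the first read
n <- 2L                        # number of lines to input per call
chunk_sizes <- integer(0)
repeat {
    lines <- readLines(con, n = n)   # connection advances by up to n lines
    if (length(lines) == 0L) break   # nothing left to read
    chunk_sizes <- c(chunk_sizes, length(lines))
}
close(con)
unlink(tmp)
chunk_sizes
```

Because the connection stays open between calls, each read resumes where the previous one stopped, so only n lines need to be held in memory at a time.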

The INFO field of the scanned VCF file is returned as a single ‘packed’ vector, as in the VCF file. The GENO field is returned as a list of matrices; each matrix corresponds to a field as defined in the FORMAT field of the VCF header. Each matrix has as many rows as records scanned in the VCF file, and as many columns as there are samples. As with the INFO field, the elements of the matrix are ‘packed’. INFO and GENO are returned packed to facilitate manipulation, e.g., selecting particular rows or samples in a consistent manner across elements.

unpackVcf processes the INFO and / or GENO fields, typically using the information encoded in the header and extracted by consulting scanVcfHeader. The INFO or FORMAT specification includes a field Number. When this is an integer value, the corresponding INFO or GENO is unpacked as a matrix or array. For fields with variable numbers of elements (‘A’, ‘G’, ‘.’), the unpacked data is a list of vectors (for INFO) or list of list of vectors (for GENO), with the outer list corresponding to rows in the scanned VCF, the inner list of GENO corresponding to samples, and the inner vector corresponding to sub-elements of the element.
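To make the packed and unpacked forms concrete, here is a minimal base R sketch. It is not the implementation used by unpackVcf; the helper names unpack_fixed and unpack_variable, and all data values, are hypothetical and chosen only to mirror the shapes described above.

```r
## Hypothetical packed GENO field: 3 records x 2 samples, each entry a
## comma-delimited string (values are made up).
packed <- matrix(c("0,10,20", "5,0,15", "30,8,0",
                   "0,12,40", "9,0,11", "22,3,0"),
                 nrow = 3, ncol = 2,
                 dimnames = list(NULL, c("sampleA", "sampleB")))

## While packed, selecting rows or samples is ordinary matrix subsetting.
sub <- packed[1:2, "sampleB", drop = FALSE]

## Fixed Number: unpack into a records x samples x sub-element array.
unpack_fixed <- function(m, number) {
    parts <- strsplit(as.vector(m), ",", fixed = TRUE)
    vals <- vapply(parts, as.integer, integer(number))  # number x length(m)
    aperm(array(vals, dim = c(number, dim(m))), c(2, 3, 1))
}
arr <- unpack_fixed(packed, 3L)
dim(arr)

## Variable Number: unpack a packed INFO-like vector into a list of vectors,
## one list element per record.
unpack_variable <- function(v) {
    lapply(strsplit(v, ",", fixed = TRUE), as.numeric)
}
info_packed <- c("0.5", "0.1,0.3", "0.2,0.2,0.6")
info_list <- unpack_variable(info_packed)
lengths(info_list)
```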

Value

scanVcfHeader / scanBcfHeader returns a list, with one element for each file named in file. Each element of the list is itself a list containing three elements. The reference element is a character() vector with names of reference sequences. The sample element is a character() vector of names of samples. The header element is a character() vector of the header lines (preceded by “##”) present in the VCF file.

scanVcf / scanBcf returns a list, with one element per file. Each list has elements corresponding to the columns of the VCF specification: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, GENO.

The GENO element is itself a list, with elements corresponding to those defined in the VCF file header. For scanVcf, elements of GENO are returned as a matrix of records x samples; if the description of the element in the file header indicated multiplicity other than 1 (e.g., variable number for “A”, “G”, or “.”), then each entry in the matrix is a character string with sub-entries comma-delimited.

asBcf creates a binary BCF file from a text VCF file.

indexBcf creates an index into the BCF file.

unpackVcf returns a list of the same form as scanVcf, but with INFO and / or GENO elements unpacked to matrix or list elements as appropriate.

Author(s)

Martin Morgan <mtmorgan@fhcrc.org>.

References

http://vcftools.sourceforge.net/specs.html outlines the VCF specification.

http://samtools.sourceforge.net/mpileup.shtml contains information on the portion of the specification implemented by bcftools.

http://samtools.sourceforge.net/ provides information on samtools.

See Also

BcfFile, TabixFile

Examples

fl <- system.file("extdata", "ex1.bcf", package="Rsamtools")
scanBcfHeader(fl)
bcf <- scanBcf(fl)
## value: list-of-lists
str(bcf[1:8])
names(bcf[["GENO"]])
str(head(bcf[["GENO"]][["PL"]]))
example(BcfFile)

[Package Rsamtools version 1.6.3 Index]