VcfInput {Rsamtools} | R Documentation |
Import, coerce, or index variant call files in text or binary format.
scanBcfHeader(file, ...)
## S4 method for signature 'character'
scanBcfHeader(file, ...)

scanBcf(file, ...)
## S4 method for signature 'character'
scanBcf(file, index = file, ..., param=ScanBcfParam())

asBcf(file, dictionary, destination, ..., overwrite=FALSE, indexDestination=TRUE)
## S4 method for signature 'character'
asBcf(file, dictionary, destination, ..., overwrite=FALSE, indexDestination=TRUE)

indexBcf(file, ...)
## S4 method for signature 'character'
indexBcf(file, ...)

scanVcfHeader(file, ...)
## S4 method for signature 'character'
scanVcfHeader(file, ...)

scanVcf(file, ..., param)
## S4 method for signature 'character,ANY'
scanVcf(file, ..., param)
## S4 method for signature 'character,missing'
scanVcf(file, ..., param)
## S4 method for signature 'connection,missing'
scanVcf(file, ..., param)

unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)
## S4 method for signature 'list,missing'
unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)
## S4 method for signature 'list,character'
unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)
## S4 method for signature 'list,TabixFile'
unpackVcf(x, hdr, ..., info=TRUE, geno=TRUE)
file |
For |
index |
The character() file name(s) of the ‘BCF’ index to be processed. |
dictionary |
a character vector of the unique “CHROM” names in the VCF file. |
destination |
The character(1) file name of the location where
the BCF output file will be created. For |
param |
An instance of |
... |
Additional arguments, e.g., for
|
overwrite |
A logical(1) indicating whether the destination can be over-written if it already exists. |
indexDestination |
A logical(1) indicating whether the created destination file should also be indexed. |
x |
A list() resulting from |
hdr |
A character(1) or |
info, geno |
For non-“missing” methods of
For the “missing” method of |
Most users will use the vcf* functions; the bcf* functions are restricted to the GENO fields supported by ‘bcftools’ (see documentation at the URL below). The argument param allows portions of the file to be input, but requires that the file be BCF or bgzip'd and indexed as a TabixFile.
scanVcf with param="missing" and file="character" or file="connection" scans the entire file. With file="connection", an argument n indicates the number of lines of the VCF file to input; a connection that is open at the beginning of the call remains open and is advanced by n lines at the end of the call, providing a convenient way to stream through large VCF files.
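The streaming behaviour described above relies on an open connection keeping its position between reads. The following is a minimal base-R sketch of that idea only; it uses readLines() in place of scanVcf, and the file contents and chunk size are made up for illustration.

```r
# Base-R sketch of chunked reading from an open connection.
# This is NOT the Rsamtools implementation; readLines() stands in for scanVcf().

# Write a tiny, made-up VCF to a temporary file so the example is self-contained.
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.0",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10",
  "chr1\t200\t.\tC\tT\t60\tPASS\tDP=12",
  "chr1\t300\t.\tG\tA\t70\tPASS\tDP=14"
), vcf_path)

con <- file(vcf_path, open = "r")   # opened once; the read position persists
chunk_sizes <- integer()
repeat {
  chunk <- readLines(con, n = 2)    # plays the role of the 'n' argument
  if (length(chunk) == 0L) break    # nothing left to read
  chunk_sizes <- c(chunk_sizes, length(chunk))
}
close(con)
chunk_sizes                          # number of lines obtained by each read
```

Because the connection is opened before the loop, each read resumes where the previous one stopped, so the whole file is never held in memory at once.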
The INFO field of the scanned VCF file is returned as a single ‘packed’ vector, as in the VCF file. The GENO field is returned as a list of matrices; each matrix corresponds to a field as defined in the FORMAT field of the VCF header. Each matrix has as many rows as records scanned in the VCF file, and as many columns as there are samples. As with the INFO field, the elements of the matrix are ‘packed’. INFO and GENO are returned packed to facilitate manipulation, e.g., selecting particular rows or samples in a consistent manner across elements.
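To illustrate why a packed representation is convenient, here is a small base-R sketch. The INFO strings and the helper parse_info() are hypothetical and are not part of Rsamtools; the point is only that records can be selected with ordinary vector indexing before any per-field parsing takes place.

```r
# Hypothetical packed INFO values, one string per record, as they appear in a VCF.
info <- c("DP=14;AF=0.5;DB", "DP=11;AF=0.017", "DP=10;AF=0.333;DB")

# Select records first; packed strings subset like any character vector.
keep <- c(TRUE, FALSE, TRUE)
info_kept <- info[keep]

# Illustrative helper (an assumption, not an Rsamtools function):
# split one packed INFO string into a named character vector.
parse_info <- function(x) {
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]
  key <- sub("=.*$", "", fields)
  has_value <- grepl("=", fields, fixed = TRUE)
  # Flag-style keys such as "DB" carry no value and are recorded as NA.
  value <- ifelse(has_value, sub("^[^=]*=", "", fields), NA_character_)
  setNames(value, key)
}

parsed <- lapply(info_kept, parse_info)
parsed[[1]][["DP"]]
```

Parsing only the retained records avoids unpacking rows that will be discarded.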
unpackVcf processes the INFO and / or GENO fields, typically using the information encoded in the header and extracted by consulting scanVcfHeader. The INFO or FORMAT specification includes a field Number. When this is an integer value, the corresponding INFO or GENO is unpacked as a matrix or array. For fields with variable numbers of elements (‘A’, ‘G’, ‘.’), the unpacked data is a list of vectors (for INFO) or a list of lists of vectors (for GENO), with the outer list corresponding to rows in the scanned VCF, the inner list of GENO corresponding to samples, and the inner vector corresponding to sub-elements of the element.
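The distinction between fixed and variable Number can be sketched in base R as follows. The function unpack_field() is a hypothetical stand-in written for this illustration; it is not how unpackVcf is implemented, and it assumes numeric, comma-delimited values with no missing entries.

```r
# Illustrative helper (an assumption, not part of Rsamtools): unpack a vector of
# comma-delimited strings according to the header's Number value.
unpack_field <- function(x, number) {
  parts <- lapply(strsplit(x, ",", fixed = TRUE), as.numeric)
  if (is.numeric(number)) {
    # Fixed count: one row per record, one column per sub-element.
    do.call(rbind, parts)
  } else {
    # Variable count ("A", "G", "."): keep one vector per record.
    parts
  }
}

fixed    <- unpack_field(c("0.1,0.9", "0.4,0.6"), number = 2)
variable <- unpack_field(c("0.5", "0.2,0.3"), number = "A")

dim(fixed)                               # records x sub-elements
vapply(variable, length, integer(1))     # sub-element count varies by record
```

A matrix is only possible when every record has the same number of sub-elements, which is why variable-length fields fall back to a list.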
scanVcfHeader / scanBcfHeader returns a list, with one element for each file named in file. Each element of the list is itself a list containing three elements. The reference element is a character() vector with names of reference sequences. The sample element is a character() vector of names of samples. The header element is a character() vector of the header lines (preceded by “##”) present in the VCF file.
scanVcf / scanBcf returns a list, with one element per file. Each list has 9 elements, corresponding to the columns of the VCF specification: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, GENO.
The GENO element is itself a list, with elements corresponding to those defined in the VCF file header. For scanVcf, elements of GENO are returned as a matrix of records x samples; if the description of the element in the file header indicated multiplicity other than 1 (e.g., variable number for “A”, “G”, or “.”), then each entry in the matrix is a character string with sub-entries comma-delimited.
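As a rough illustration of this layout, the base-R sketch below builds a made-up records x samples matrix of packed entries and converts it to a list (records) of lists (samples) of vectors. The field name "PL" is taken from the example on this page; the values, the sample names, and the conversion code are assumptions for illustration and do not reproduce Rsamtools internals.

```r
# Made-up packed GENO field: 2 records (rows) x 2 samples (columns),
# each entry a comma-delimited character string.
pl <- matrix(
  c("0,30,255", "10,0,120",    # sampleA, records 1 and 2
    "0,15,200", "255,30,0"),   # sampleB, records 1 and 2
  nrow = 2,
  dimnames = list(NULL, c("sampleA", "sampleB"))
)

# Outer list: one element per record; inner list: one element per sample;
# innermost: integer vector of the comma-delimited sub-entries.
unpacked <- lapply(seq_len(nrow(pl)), function(i) {
  lapply(pl[i, ], function(entry) {
    as.integer(strsplit(entry, ",", fixed = TRUE)[[1]])
  })
})

unpacked[[1]][["sampleB"]]   # sub-entries for record 1, sampleB
```

Rows and columns of the packed matrix can still be subset with ordinary matrix indexing before any splitting is done.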
asBcf creates a binary BCF file from a text VCF file.
indexBcf creates an index into the BCF file.
unpackVcf returns a list of the same form as scanVcf, but with INFO and / or GENO elements unpacked to matrix or list elements as appropriate.
Martin Morgan <mtmorgan@fhcrc.org>.
http://vcftools.sourceforge.net/specs.html outlines the VCF specification.
http://samtools.sourceforge.net/mpileup.shtml contains information on the portion of the specification implemented by bcftools.
http://samtools.sourceforge.net/ provides information on samtools.
fl <- system.file("extdata", "ex1.bcf", package="Rsamtools")
scanBcfHeader(fl)
bcf <- scanBcf(fl)
## value: list-of-lists
str(bcf[1:8])
names(bcf[["GENO"]])
str(head(bcf[["GENO"]][["PL"]]))

example(BcfFile)