xmlEventParse {XML} | R Documentation |
Reads and processes the contents of an XML file or string by invoking user-level functions associated with the different components of the XML tree. These include the start and end of XML elements, comments, CDATA (escaped character data), entities, processing instructions, etc. This lets the caller build whatever data structure suits the application directly from the document's contents, rather than the default tree (see xmlTreeParse). Functions named after specific tags/elements can be supplied in addition to the standard callback functions (see useTagName).
xmlEventParse(file, handlers = xmlHandler(), ignoreBlanks, addContext = TRUE, useTagName = FALSE, asText = FALSE, trim = TRUE, useExpat = FALSE, isURL = FALSE)
file |
a string giving the name of the file to parse. The name is interpreted
using the internal expansion mechanism, so it can contain ~
and other environment variables.
As with xmlTreeParse, if useExpat
is FALSE, this can be a URL (HTTP or FTP) or a compressed file
as well as a regular local file.
|
handlers |
a closure object (or simply a named list of functions, as in the examples) whose functions are invoked
as the parser encounters the corresponding XML components in the document.
The standard functions are
startElement(), endElement(),
comment(), externalEntity(),
entityDeclaration(), processingInstruction(),
text().
|
ignoreBlanks |
logical value indicating whether text elements made up entirely of white space should be ignored rather than reported to the handlers. |
addContext |
logical value indicating whether the callback functions in `handlers' should be invoked with contextual information about the parser and the position in the tree, such as node depth, path indices for the node relative to the root, etc. If this is TRUE, each callback function should support .... |
useTagName |
logical value indicating whether
the callback mechanism should look for a function
whose name matches the tag name in the startElement and
endElement events before calling the default handler
functions. This allows the caller to handle each
element type of a particular DTD directly with its own function, rather
than performing a second dispatch within startElement().
|
asText |
logical value indicating that the first argument, `file', should be treated as the XML text to parse, not the name of a file. This allows documents retrieved from other sources (e.g. HTTP servers or XML-RPC) to be parsed with the same function. |
trim |
logical value indicating whether to strip white space from the beginning and end of text strings. |
useExpat |
a logical value indicating whether to use the Expat SAX parser or the default libxml parser. If this is TRUE, the package must have been compiled with support for Expat; see supportsExpat(). |
isURL |
indicates whether the file argument refers to a URL
(accessible via ftp or http) or a regular file on the system.
If asText is TRUE, this should not be specified.
|
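As a rough illustration of the standard callbacks in `handlers', the sketch below logs every event for a small document held in a string. It assumes the callback signatures startElement(name, attrs), endElement(name), text(content) and comment(content) used by the XML package; makeLogger() and getEvents() are hypothetical names, and addContext = FALSE is passed so that the callbacks receive only these arguments.

```r
library(XML)

# A small document supplied as a string (hence asText = TRUE below).
doc <- "<root><!-- a comment --><item>first</item><item>second</item></root>"

# Hypothetical closure: each standard callback appends a label to 'events',
# which lives in makeLogger()'s environment.
makeLogger <- function() {
  events <- character(0)
  add <- function(x) events <<- c(events, x)
  list(
    startElement = function(name, attrs, ...) add(paste0("start:", name)),
    endElement   = function(name, ...) add(paste0("end:", name)),
    text         = function(content, ...) {
      # Skip any whitespace-only text the parser may report.
      if (nzchar(trimws(content))) add(paste0("text:", content))
    },
    comment      = function(content, ...) add(paste0("comment:", trimws(content))),
    getEvents    = function() events
  )
}

h <- xmlEventParse(doc, handlers = makeLogger(), asText = TRUE,
                   ignoreBlanks = TRUE, useTagName = FALSE, addContext = FALSE)
print(h$getEvents())
```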
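A minimal sketch of tag-name dispatch with useTagName = TRUE, assuming that a handler named car is called for each <car> start tag with the same (name, attrs) arguments as startElement(), and that attrs is a named character vector. collectCars() and getCarNames() are hypothetical; the check on names(attrs) guards against the tag function also being consulted at end tags, where no attributes are available.

```r
library(XML)

doc <- paste0('<cars><car name="Mazda RX4" mpg="21"/>',
              '<car name="Datsun 710" mpg="22.8"/><note>other</note></cars>')

# Hypothetical closure: the 'car' function handles <car> elements directly,
# so no second dispatch inside startElement() is needed.
collectCars <- function() {
  carNames <- character(0)
  list(
    car = function(name, attrs = NULL, ...) {
      # Record only calls that carry a 'name' attribute (i.e. start tags).
      if ("name" %in% names(attrs)) carNames <<- c(carNames, attrs[["name"]])
    },
    # Fallback for every other start tag; does nothing here.
    startElement = function(name, attrs, ...) invisible(NULL),
    getCarNames = function() carNames
  )
}

h <- xmlEventParse(doc, handlers = collectCars(), asText = TRUE,
                   useTagName = TRUE, addContext = FALSE)
print(h$getCarNames())
```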
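To show what asText changes, this sketch parses the same small document twice: once by file name and once from its contents collapsed into a single string. The temporary file and the countTag() closure are hypothetical illustrations, and the startElement(name, attrs) signature is assumed as above.

```r
library(XML)

# Write a tiny document to a temporary file for illustration.
path <- tempfile(fileext = ".xml")
writeLines(c("<root>", "  <x>1</x>", "  <x>2</x>", "</root>"), path)

# Hypothetical closure counting start tags with a given name.
countTag <- function(tag) {
  n <- 0L
  list(startElement = function(name, attrs, ...) if (name == tag) n <<- n + 1L,
       count = function() n)
}

# Parse once by file name and once from the same contents held in a string.
byFile <- xmlEventParse(path, handlers = countTag("x"),
                        useTagName = FALSE, addContext = FALSE)
xmlText <- paste(readLines(path), collapse = "\n")
byText <- xmlEventParse(xmlText, handlers = countTag("x"), asText = TRUE,
                        useTagName = FALSE, addContext = FALSE)
print(c(byFile$count(), byText$count()))
```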
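The effect of trim can be seen by collecting what the text() callback receives with and without it. This is a sketch assuming text() is called with the character content of each text node; collectText() and getText() are hypothetical names.

```r
library(XML)

doc <- "<a>  hello  </a>"

# Hypothetical closure collecting the strings passed to the text() callback.
collectText <- function() {
  chunks <- character(0)
  list(text = function(content, ...) chunks <<- c(chunks, content),
       getText = function() chunks)
}

trimmed   <- xmlEventParse(doc, handlers = collectText(), asText = TRUE,
                           trim = TRUE, useTagName = FALSE, addContext = FALSE)
untrimmed <- xmlEventParse(doc, handlers = collectText(), asText = TRUE,
                           trim = FALSE, useTagName = FALSE, addContext = FALSE)
print(trimmed$getText())
print(untrimmed$getText())
```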
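Because useExpat = TRUE only works when the package was built with Expat, one cautious pattern is to ask supportsExpat() first and fall back to libxml otherwise. The sketch assumes supportsExpat() returns a single logical value; collectNames() and getNames() are hypothetical.

```r
library(XML)

# Use Expat only if this build of the package supports it; otherwise libxml.
expatAvailable <- supportsExpat()

# Hypothetical closure recording element names in document order.
collectNames <- function() {
  seen <- character(0)
  list(startElement = function(name, attrs, ...) seen <<- c(seen, name),
       getNames = function() seen)
}

h <- xmlEventParse("<a><b/><c/></a>", handlers = collectNames(), asText = TRUE,
                   useExpat = expatAvailable, useTagName = FALSE, addContext = FALSE)
print(h$getNames())
```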
When useExpat is TRUE, parsing is implemented via the Expat XML parser by James Clark (http://www.jclark.com); otherwise the libxml SAX parser is used.
The return value is the `handlers' argument. It is assumed that this is a closure whose callback functions have updated variables local to it, and that the caller knows how to extract those results.
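The closure pattern can be sketched as a per-tag counter: the hypothetical tagCounter() keeps a counts vector in its own environment, startElement() updates it with <<-, and because the handlers are returned, the counts can be read back from the result. The startElement(name, attrs) signature is assumed, and addContext = FALSE is passed to keep it that way.

```r
library(XML)

doc <- "<mtcars><record/><record/><record/><note/></mtcars>"

# Hypothetical closure: 'counts' lives in tagCounter()'s environment and is
# updated by startElement() via <<-.
tagCounter <- function() {
  counts <- integer(0)
  list(
    startElement = function(name, attrs, ...) {
      counts[name] <<- if (is.na(counts[name])) 1L else counts[name] + 1L
    },
    getCounts = function() counts
  )
}

# The return value is the handlers list itself, so the accessor is reachable.
h <- xmlEventParse(doc, handlers = tagCounter(), asText = TRUE,
                   useTagName = FALSE, addContext = FALSE)
print(h$getCounts())
```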
Using Expat (useExpat = TRUE) requires the Expat XML parser to be installed.
Duncan Temple Lang
http://www.w3.org/XML, http://www.jclark.com/xml
library(XML)
fileName <- system.file("data", "mtcars.xml", package = "XML")

# Print the name of each XML tag encountered at the beginning of each
# tag.
# Uses the libxml SAX parser.
xmlEventParse(fileName,
              list(startElement = function(name, attrs) {
                cat(name, "\n")
              }),
              useTagName = FALSE, addContext = FALSE)

# Parse the text rather than a file or URL by reading the URL's contents
# and making it a single string. Then call xmlEventParse.
xmlURL <- "http://www.omegahat.org/Scripts/Data/mtcars.xml"
xmlText <- paste(readLines(xmlURL), collapse = "\n")
xmlEventParse(xmlText, asText = TRUE)