Class StreamParser
- java.lang.Object
-
- org.jsoup.parser.StreamParser
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public class StreamParser extends java.lang.Object implements java.io.Closeable
A StreamParser provides a progressive parse of its input. As each Element is completed, it is emitted via a Stream or Iterator interface. Elements returned will be complete with all their children, and an (empty) next sibling, if applicable.
Elements (or their children) may be removed from the DOM during the parse, e.g. to conserve memory. This provides a mechanism to parse an input document that would otherwise be too large to fit into memory, while still providing a DOM interface to the document and its elements.
Additionally, the parser provides selectFirst(String query) and selectNext(String query) methods, which run the parser until a hit is found, at which point the parse is suspended. It can be resumed via another select() call, or via the stream() or iterator() methods.
Once the input has been fully read, the input Reader will be closed. If the whole document does not need to be read, call stop() and close() instead.
The document() method will return the Document being parsed into, which will be only partially complete until the input is fully consumed.
A StreamParser can be reused via a new parse(Reader, String), but is not thread-safe for concurrent inputs. New parsers should be used in each thread.
If created via Connection.Response.streamParser(), or another Reader that is I/O backed, the iterator and stream consumers will throw an UncheckedIOException if the underlying Reader errors during read.
The StreamParser interface is currently in beta and may change in subsequent releases. Feedback on the feature and how you're using it is very welcome via the jsoup discussions.
- Since:
- 1.18.1
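The streaming and removal behavior described above can be sketched as follows. This is a minimal illustration, not an official usage pattern: the class name StreamAndRemoveSketch, the method countParagraphs, and the sample HTML are assumptions made for this example, and it presumes jsoup 1.18.1 or later is on the classpath.

```java
import java.io.Reader;
import java.io.StringReader;

import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper class, named for this sketch only.
class StreamAndRemoveSketch {

    // Counts <p> elements, removing each one from the DOM once it has been seen.
    static int countParagraphs(Reader input) {
        int[] count = {0}; // single-element array so the lambda can update it
        // try-with-resources calls close() when the block exits
        try (StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(input, "")) {
            streamer.stream().forEachOrdered(el -> {
                if (el.normalName().equals("p")) {
                    count[0]++;
                    el.remove(); // detach the processed element so it need not be retained
                }
            });
        }
        return count[0];
    }

    public static void main(String[] args) {
        Reader input = new StringReader("<p>One</p><p>Two</p><div><p>Three</p></div>");
        System.out.println(countParagraphs(input));
    }
}
```

Because each element is detached as soon as it has been handled, the partial Document should stay small even for long inputs. How much memory this saves in practice depends on the input and is not measured here.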
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
(package private) class StreamParser.ElementIterator
-
Field Summary
Fields (Modifier and Type, Field, Description):
private Document document
private StreamParser.ElementIterator it
private Parser parser
private boolean stopped
private TreeBuilder treeBuilder
-
Constructor Summary
Constructors (Constructor, Description):
StreamParser(Parser parser) - Construct a new StreamParser, using the supplied base Parser.
-
Method Summary
Methods (Modifier and Type, Method, Description):
void close() - Closes the input and releases resources including the underlying parser and reader.
Document complete() - Runs the parser until the input is fully read, and returns the completed Document.
java.util.List<Node> completeFragment() - When initialized as a fragment parse, runs the parser until the input is fully read, and returns the completed fragment child nodes.
Document document() - Get the current Document as it is being parsed.
Element expectFirst(java.lang.String query) - Just like selectFirst(String), but if there is no match, throws an IllegalArgumentException.
Element expectNext(java.lang.String query) - Just like selectNext(String), but if there is no match, throws an IllegalArgumentException.
java.util.Iterator<Element> iterator() - Returns an Iterator of Elements, with the input being parsed as each element is consumed.
StreamParser parse(java.io.Reader input, java.lang.String baseUri) - Provide the input for a Document parse.
StreamParser parse(java.lang.String input, java.lang.String baseUri) - Provide the input for a Document parse.
StreamParser parseFragment(java.io.Reader input, Element context, java.lang.String baseUri) - Provide the input for a fragment parse.
StreamParser parseFragment(java.lang.String input, Element context, java.lang.String baseUri) - Provide the input for a fragment parse.
Element selectFirst(java.lang.String query) - Finds the first Element that matches the provided query.
Element selectFirst(Evaluator eval) - Finds the first Element that matches the provided query.
Element selectNext(java.lang.String query) - Finds the next Element that matches the provided query.
Element selectNext(Evaluator eval) - Finds the next Element that matches the provided query.
StreamParser stop() - Flags that the parse should be stopped; the backing iterator will not return any more Elements.
java.util.stream.Stream<Element> stream() - Creates a Stream of Elements, with the input being parsed as each element is consumed.
-
-
-
Field Detail
-
parser
private final Parser parser
-
treeBuilder
private final TreeBuilder treeBuilder
-
it
private final StreamParser.ElementIterator it
-
document
private Document document
-
stopped
private boolean stopped
-
-
Constructor Detail
-
StreamParser
public StreamParser(Parser parser)
Construct a new StreamParser, using the supplied base Parser.
- Parameters:
parser
- the configured base parser
-
-
Method Detail
-
parse
public StreamParser parse(java.io.Reader input, java.lang.String baseUri)
Provide the input for a Document parse. The input is not read until a consuming operation is called.
- Parameters:
input
- the input to be read
baseUri
- the URL of this input, for absolute link resolution
- Returns:
- this parser, for chaining
-
parse
public StreamParser parse(java.lang.String input, java.lang.String baseUri)
Provide the input for a Document parse. The input is not read until a consuming operation is called.
- Parameters:
input
- the input to be read
baseUri
- the URL of this input, for absolute link resolution
- Returns:
- this parser
-
parseFragment
public StreamParser parseFragment(java.io.Reader input, Element context, java.lang.String baseUri)
Provide the input for a fragment parse. The input is not read until a consuming operation is called.
- Parameters:
input
- the input to be read
context
- the optional fragment context element
baseUri
- the URL of this input, for absolute link resolution
- Returns:
- this parser
- See Also:
completeFragment()
-
parseFragment
public StreamParser parseFragment(java.lang.String input, Element context, java.lang.String baseUri)
Provide the input for a fragment parse. The input is not read until a consuming operation is called.
- Parameters:
input
- the input to be read
context
- the optional fragment context element
baseUri
- the URL of this input, for absolute link resolution
- Returns:
- this parser
- See Also:
completeFragment()
-
stream
public java.util.stream.Stream<Element> stream()
Creates a Stream of Elements, with the input being parsed as each element is consumed. Each Element returned will be complete (that is, all of its children will be included, and if it has a next sibling, that (empty) sibling will exist at Element.nextElementSibling()). The stream will be emitted in document order as each element is closed. That means that child elements will be returned prior to their parents.
The stream will start from the current position of the backing iterator and the parse.
When consuming the stream, if the Reader that the Parser is reading throws an I/O exception (for example a SocketTimeoutException), that will be emitted as an UncheckedIOException.
- Returns:
- a stream of Element objects
- Throws:
java.io.UncheckedIOException
- if the underlying Reader throws during a read (in stream consuming methods)
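To make the emission order concrete, the sketch below collects the tag name of each element as stream() emits it. The class and method names are hypothetical and exist only for this illustration; it assumes jsoup 1.18.1 or later on the classpath.

```java
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper class, named for this sketch only.
class EmissionOrderSketch {

    // Returns the normalized tag name of every element, in the order the stream emits them.
    static List<String> tagNamesInEmissionOrder(String html) {
        try (StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            return streamer.stream()
                .map(Element::normalName)
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(tagNamesInEmissionOrder("<div><p>One</p><p>Two</p></div>"));
    }
}
```

For this input, each p should appear in the list before its parent div, and the div before the body, since an element is only emitted once it has been closed.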
-
iterator
public java.util.Iterator<Element> iterator()
Returns an Iterator of Elements, with the input being parsed as each element is consumed. Each Element returned will be complete (that is, all of its children will be included, and if it has a next sibling, that (empty) sibling will exist at Element.nextElementSibling()). The elements will be emitted in document order as each element is closed. That means that child elements will be returned prior to their parents.
The iterator will start from the current position of the parse.
The iterator is backed by this StreamParser, and the resources it holds.
- Returns:
- an iterator of Element objects
-
stop
public StreamParser stop()
Flags that the parse should be stopped; the backing iterator will not return any more Elements.
- Returns:
- this parser
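One way to use stop() together with iterator() is to abandon the parse as soon as the element of interest has been seen. The following is a sketch under that assumption; the class name, method name, and the choice of h1 as the target are illustrative only, and jsoup 1.18.1 or later is assumed.

```java
import java.util.Iterator;

import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper class, named for this sketch only.
class StopEarlySketch {

    // Returns the text of the first <h1> emitted, or null if the input has none.
    static String firstHeading(String html) {
        try (StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            Iterator<Element> it = streamer.iterator();
            String heading = null;
            while (it.hasNext()) {
                Element el = it.next();
                if (el.normalName().equals("h1")) {
                    heading = el.text();
                    streamer.stop(); // the iterator returns no more elements, ending the loop
                }
            }
            return heading;
        } // close() runs here and releases the underlying reader
    }

    public static void main(String[] args) {
        System.out.println(firstHeading("<h1>Title</h1><p>Body</p><h1>Second</h1>"));
    }
}
```

The remaining input after the first h1 is not handed to the loop, so a later h1 does not overwrite the result.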
-
close
public void close()
Closes the input and releases resources including the underlying parser and reader.
The parser will also be closed when the input is fully read.
The parser can be reused with another call to parse(Reader, String).
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
-
document
public Document document()
Get the current Document as it is being parsed. It will be only partially complete until the input is fully read. Structural changes (e.g. insert, remove) may be made to the Document contents.
- Returns:
- the (partial) Document
-
complete
public Document complete() throws java.io.IOException
Runs the parser until the input is fully read, and returns the completed Document.
- Returns:
- the completed Document
- Throws:
java.io.IOException
- if an I/O error occurs
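As an illustration of resuming a suspended parse, the sketch below reads the title with selectFirst(String) and then calls complete() to read the rest of the input. The class name, method name, and output format are assumptions for this example; the checked IOException is rethrown as UncheckedIOException purely to keep the sketch short. It assumes jsoup 1.18.1 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper class, named for this sketch only.
class CompleteSketch {

    // Reads the title early, then finishes the parse and counts the paragraphs.
    static String summarize(String html) {
        try (StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            Element title = streamer.selectFirst("title"); // parse suspends at the first hit
            Document doc = streamer.complete();            // resume and read the remaining input
            String titleText = title != null ? title.text() : "";
            return titleText + ": " + doc.select("p").size() + " paragraph(s)";
        } catch (IOException e) {
            throw new UncheckedIOException(e); // simplification for this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(summarize("<title>Demo</title><p>One</p><p>Two</p>"));
    }
}
```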
-
completeFragment
public java.util.List<Node> completeFragment() throws java.io.IOException
When initialized as a fragment parse, runs the parser until the input is fully read, and returns the completed fragment child nodes.
- Returns:
- the completed child nodes
- Throws:
java.io.IOException
- if an I/O error occurs
- See Also:
parseFragment(Reader, Element, String)
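A fragment parse might be driven as in the sketch below, which pairs parseFragment(String, Element, String) with completeFragment(). The div context element, the class name, and the method name are choices made for this example rather than requirements of the API. It assumes jsoup 1.18.1 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper class, named for this sketch only.
class FragmentSketch {

    // Parses the fragment inside a <div> context and returns the names of the resulting nodes.
    static List<String> fragmentNodeNames(String fragmentHtml) {
        Element context = new Element("div"); // context element chosen for this sketch
        try (StreamParser streamer =
                 new StreamParser(Parser.htmlParser()).parseFragment(fragmentHtml, context, "")) {
            List<Node> nodes = streamer.completeFragment(); // runs the parse to the end of the input
            List<String> names = new ArrayList<>();
            for (Node node : nodes) {
                names.add(node.nodeName());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // simplification for this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(fragmentNodeNames("<p>One</p><p>Two</p>"));
    }
}
```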
-
selectFirst
public Element selectFirst(java.lang.String query) throws java.io.IOException
Finds the first Element that matches the provided query. If the parsed Document does not already have a match, the input will be parsed until the first match is found, or the input is completely read.
-
expectFirst
public Element expectFirst(java.lang.String query) throws java.io.IOException
Just like selectFirst(String), but if there is no match, throws an IllegalArgumentException. This is useful if you want to simply abort processing on a failed match.
- Parameters:
query
- the Selector query.
- Returns:
- the first matching element
- Throws:
java.lang.IllegalArgumentException
- if no match is found
java.io.IOException
- if an I/O error occurs
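The abort-on-miss behavior can be sketched as below: expectFirst(String) either returns the match or throws, so the caller handles the failure in one place. The class name, method name, and fallback parameter are hypothetical. It assumes jsoup 1.18.1 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper class, named for this sketch only.
class ExpectFirstSketch {

    // Returns the document title, or the supplied fallback if the input has no <title>.
    static String titleOrFallback(String html, String fallback) {
        try (StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            return streamer.expectFirst("title").text();
        } catch (IllegalArgumentException e) {
            return fallback; // no match anywhere in the input
        } catch (IOException e) {
            throw new UncheckedIOException(e); // simplification for this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOrFallback("<title>Hi</title><p>x</p>", "(untitled)"));
        System.out.println(titleOrFallback("<p>x</p>", "(untitled)"));
    }
}
```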
-
selectFirst
public Element selectFirst(Evaluator eval) throws java.io.IOException
Finds the first Element that matches the provided query. If the parsed Document does not already have a match, the input will be parsed until the first match is found, or the input is completely read.
-
selectNext
public Element selectNext(java.lang.String query) throws java.io.IOException
Finds the next Element that matches the provided query. The input will be parsed until the next match is found, or the input is completely read.
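Repeated calls to selectNext(String) can walk through every match in turn, with the parse suspended between calls. The sketch below assumes that a null return signals the end of the input; the class name, method name, and the li query are illustrative only. It assumes jsoup 1.18.1 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper class, named for this sketch only.
class SelectNextSketch {

    // Collects the text of each <li>, advancing the parse one match at a time.
    static List<String> listItemTexts(String html) {
        List<String> texts = new ArrayList<>();
        try (StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            Element item;
            while ((item = streamer.selectNext("li")) != null) { // null once the input is exhausted
                texts.add(item.text());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // simplification for this sketch
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(listItemTexts("<ul><li>a</li><li>b</li><li>c</li></ul>"));
    }
}
```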
-
expectNext
public Element expectNext(java.lang.String query) throws java.io.IOException
Just like selectNext(String), but if there is no match, throws an IllegalArgumentException. This is useful if you want to simply abort processing on a failed match.
- Parameters:
query
- the Selector query.
- Returns:
- the next matching element
- Throws:
java.lang.IllegalArgumentException
- if no match is found
java.io.IOException
- if an I/O error occurs
-
-