Package org.jsoup.helper
Class DataUtil
- java.lang.Object
-
- org.jsoup.helper.DataUtil
-
public final class DataUtil extends java.lang.Object
Internal static utilities for handling data.
-
-
Nested Class Summary
Modifier and Type Class Description
private static class
DataUtil.BomCharset
(package private) static class
DataUtil.CharsetDoc
A struct to return a detected charset, and a document (if fully read).
-
Field Summary
Modifier and Type Field Description
(package private) static int
boundaryLength
private static java.util.regex.Pattern
charsetPattern
(package private) static java.lang.String
defaultCharsetName
private static int
firstReadBufferSize
private static char[]
mimeBoundaryChars
static java.nio.charset.Charset
UTF_8
-
Constructor Summary
Modifier Constructor Description
private
DataUtil()
-
Method Summary
Modifier and Type Method Description
(package private) static void
crossStreams(java.io.InputStream in, java.io.OutputStream out)
Writes the input stream to the output stream.
(package private) static DataUtil.CharsetDoc
detectCharset(java.io.InputStream input, java.lang.String charsetName, java.lang.String baseUri, Parser parser)
private static DataUtil.BomCharset
detectCharsetFromBom(java.nio.ByteBuffer byteData)
(package private) static java.nio.ByteBuffer
emptyByteBuffer()
(package private) static java.lang.String
getCharsetFromContentType(java.lang.String contentType)
Parse out a charset from a content type header.
static Document
load(java.io.File file, java.lang.String charsetName, java.lang.String baseUri)
Loads and parses a file to a Document, with the HtmlParser.
static Document
load(java.io.File file, java.lang.String charsetName, java.lang.String baseUri, Parser parser)
Loads and parses a file to a Document.
static Document
load(java.io.InputStream in, java.lang.String charsetName, java.lang.String baseUri)
Parses a Document from an input stream.
static Document
load(java.io.InputStream in, java.lang.String charsetName, java.lang.String baseUri, Parser parser)
Parses a Document from an input stream, using the provided Parser.
static Document
load(java.nio.file.Path path, java.lang.String charsetName, java.lang.String baseUri)
Loads and parses a file to a Document, with the HtmlParser.
static Document
load(java.nio.file.Path path, java.lang.String charsetName, java.lang.String baseUri, Parser parser)
Loads and parses a file to a Document.
(package private) static void
maybeSkipBom(java.io.Reader reader, DataUtil.CharsetDoc charsetDoc)
(package private) static java.lang.String
mimeBoundary()
Creates a random string, suitable for use as a mime boundary.
private static java.io.InputStream
openStream(java.nio.file.Path path)
Open an input stream from a file; if it's a gzip file, returns a GZIPInputStream to unzip it.
(package private) static Document
parseInputStream(java.io.InputStream input, java.lang.String charsetName, java.lang.String baseUri, Parser parser)
(package private) static Document
parseInputStream(DataUtil.CharsetDoc charsetDoc, java.lang.String baseUri, Parser parser)
static java.nio.ByteBuffer
readToByteBuffer(java.io.InputStream inStream, int maxSize)
Read the input stream into a byte buffer.
static StreamParser
streamParser(java.nio.file.Path path, java.nio.charset.Charset charset, java.lang.String baseUri, Parser parser)
Returns a StreamParser that will parse the supplied file progressively.
private static java.lang.String
validateCharset(java.lang.String cs)
-
-
-
Field Detail
-
charsetPattern
private static final java.util.regex.Pattern charsetPattern
-
UTF_8
public static final java.nio.charset.Charset UTF_8
-
defaultCharsetName
static final java.lang.String defaultCharsetName
-
firstReadBufferSize
private static final int firstReadBufferSize
- See Also:
- Constant Field Values
-
mimeBoundaryChars
private static final char[] mimeBoundaryChars
-
boundaryLength
static final int boundaryLength
- See Also:
- Constant Field Values
-
-
Method Detail
-
load
public static Document load(java.io.File file, java.lang.String charsetName, java.lang.String baseUri) throws java.io.IOException
Loads and parses a file to a Document, with the HtmlParser. Files that are compressed with gzip (and end in .gz or .z) are supported in addition to uncompressed files.
- Parameters:
file - file to load
charsetName - (optional) character set of input; specify null to attempt to autodetect. A BOM in the file will always override this setting.
baseUri - base URI of document, to resolve relative links against
- Returns:
- Document
- Throws:
java.io.IOException - on IO error
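The rule that a BOM overrides charsetName can be illustrated with a minimal sketch. This is not the code behind detectCharsetFromBom; the class BomSniffer and its methods are hypothetical names, and the UTF-8 fallback when no charset is requested is an assumption (the actual autodetection also inspects document metadata, which this sketch skips).

```java
final class BomSniffer {
    /** Returns the charset name implied by a leading BOM, or null if no BOM is present. */
    static String charsetFromBom(byte[] head) {
        // UTF-32 must be tested first: its little-endian BOM starts with the UTF-16LE BOM.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF) || startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) {
            return "UTF-32"; // Java's UTF-32 decoder reads the BOM to pick the byte order
        }
        if (startsWith(head, 0xFE, 0xFF) || startsWith(head, 0xFF, 0xFE)) {
            return "UTF-16"; // likewise, the UTF-16 decoder resolves endianness from the BOM
        }
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** A BOM wins over the requested charset; UTF-8 is an assumed fallback for this sketch. */
    static String resolveCharset(byte[] head, String requested) {
        String fromBom = charsetFromBom(head);
        if (fromBom != null) return fromBom;
        return requested != null ? requested : "UTF-8";
    }

    public static void main(String[] args) {
        byte[] head = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x3C};
        System.out.println(resolveCharset(head, "ISO-8859-1"));
    }
}
```

Checking the four-byte UTF-32 marks before the two-byte UTF-16 marks matters because FF FE 00 00 would otherwise be misread as UTF-16.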
-
load
public static Document load(java.io.File file, java.lang.String charsetName, java.lang.String baseUri, Parser parser) throws java.io.IOException
Loads and parses a file to a Document. Files that are compressed with gzip (and end in .gz or .z) are supported in addition to uncompressed files.
- Parameters:
file - file to load
charsetName - (optional) character set of input; specify null to attempt to autodetect. A BOM in the file will always override this setting.
baseUri - base URI of document, to resolve relative links against
parser - alternate parser to use.
- Returns:
- Document
- Throws:
java.io.IOException - on IO error
- Since:
- 1.14.2
-
load
public static Document load(java.nio.file.Path path, java.lang.String charsetName, java.lang.String baseUri) throws java.io.IOException
Loads and parses a file to a Document, with the HtmlParser. Files that are compressed with gzip (and end in .gz or .z) are supported in addition to uncompressed files.
- Parameters:
path - file to load
charsetName - (optional) character set of input; specify null to attempt to autodetect. A BOM in the file will always override this setting.
baseUri - base URI of document, to resolve relative links against
- Returns:
- Document
- Throws:
java.io.IOException - on IO error
-
load
public static Document load(java.nio.file.Path path, java.lang.String charsetName, java.lang.String baseUri, Parser parser) throws java.io.IOException
Loads and parses a file to a Document. Files that are compressed with gzip (and end in .gz or .z) are supported in addition to uncompressed files.
- Parameters:
path - file to load
charsetName - (optional) character set of input; specify null to attempt to autodetect. A BOM in the file will always override this setting.
baseUri - base URI of document, to resolve relative links against
parser - alternate parser to use.
- Returns:
- Document
- Throws:
java.io.IOException - on IO error
- Since:
- 1.17.2
-
streamParser
public static StreamParser streamParser(java.nio.file.Path path, java.nio.charset.Charset charset, java.lang.String baseUri, Parser parser) throws java.io.IOException
Returns a StreamParser that will parse the supplied file progressively. Files that are compressed with gzip (and end in .gz or .z) are supported in addition to uncompressed files.
- Parameters:
path - file to load
charset - (optional) character set of input; specify null to attempt to autodetect from metadata. A BOM in the file will always override this setting.
baseUri - base URI of document, to resolve relative links against
parser - alternate parser to use.
- Returns:
- StreamParser
- Throws:
java.io.IOException - on IO error
- Since:
- 1.18.2
- See Also:
Connection.Response.streamParser()
-
openStream
private static java.io.InputStream openStream(java.nio.file.Path path) throws java.io.IOException
Open an input stream from a file; if it's a gzip file, returns a GZIPInputStream to unzip it.
- Throws:
java.io.IOException
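As an illustration of the gzip handling described for openStream, the following is a minimal sketch using only the JDK. It is not the implementation of this method: the class GzipAwareOpener is a hypothetical name, and both the case-insensitive extension match and the check of the two gzip magic bytes (0x1f 0x8b) are assumptions about how such detection could work.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

final class GzipAwareOpener {
    /** True if the file name ends in .gz or .z (case-insensitive, an assumption of this sketch). */
    static boolean hasGzipExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".gz") || lower.endsWith(".z");
    }

    /** Opens the file, wrapping it in a GZIPInputStream only when it looks like gzip data. */
    static InputStream open(Path path) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(path));
        if (!hasGzipExtension(path.getFileName().toString())) return raw;
        // Peek at the first two bytes, then rewind so the caller sees the whole stream.
        raw.mark(2);
        int b1 = raw.read();
        int b2 = raw.read();
        raw.reset();
        boolean gzipMagic = b1 == 0x1f && b2 == 0x8b;
        return gzipMagic ? new GZIPInputStream(raw) : raw;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: GzipAwareOpener <file>");
            return;
        }
        long count = 0;
        try (InputStream in = open(Paths.get(args[0]))) {
            while (in.read() != -1) count++;
        }
        System.out.println(count + " bytes after optional decompression");
    }
}
```

Checking the magic bytes as well as the extension lets a plain file that merely carries a .gz name pass through unchanged instead of failing in the decompressor.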
-
load
public static Document load(java.io.InputStream in, java.lang.String charsetName, java.lang.String baseUri) throws java.io.IOException
Parses a Document from an input stream.
- Parameters:
in - input stream to parse. The stream will be closed after reading.
charsetName - character set of input (optional)
baseUri - base URI of document, to resolve relative links against
- Returns:
- Document
- Throws:
java.io.IOException - on IO error
-
load
public static Document load(java.io.InputStream in, java.lang.String charsetName, java.lang.String baseUri, Parser parser) throws java.io.IOException
Parses a Document from an input stream, using the provided Parser.
- Parameters:
in - input stream to parse. The stream will be closed after reading.
charsetName - character set of input (optional)
baseUri - base URI of document, to resolve relative links against
parser - alternate parser to use.
- Returns:
- Document
- Throws:
java.io.IOException - on IO error
-
crossStreams
static void crossStreams(java.io.InputStream in, java.io.OutputStream out) throws java.io.IOException
Writes the input stream to the output stream. Doesn't close them.
- Parameters:
in - input stream to read from
out - output stream to write to
- Throws:
java.io.IOException - on IO error
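A stream-to-stream copy that leaves both ends open, as crossStreams is described, might look like the sketch below. The class StreamCopy, the 8192-byte buffer, and the returned byte count are all illustrative assumptions; the documented method returns void.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. Closes neither stream. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice for this sketch
        long total = 0;
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(new byte[]{1, 2, 3}), out);
        System.out.println(copied + " bytes copied");
    }
}
```

Because neither stream is closed, the caller stays responsible for closing them, typically with try-with-resources around the call.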
-
parseInputStream
static Document parseInputStream(java.io.InputStream input, java.lang.String charsetName, java.lang.String baseUri, Parser parser) throws java.io.IOException
- Throws:
java.io.IOException
-
detectCharset
static DataUtil.CharsetDoc detectCharset(java.io.InputStream input, java.lang.String charsetName, java.lang.String baseUri, Parser parser) throws java.io.IOException
- Throws:
java.io.IOException
-
parseInputStream
static Document parseInputStream(DataUtil.CharsetDoc charsetDoc, java.lang.String baseUri, Parser parser) throws java.io.IOException
- Throws:
java.io.IOException
-
maybeSkipBom
static void maybeSkipBom(java.io.Reader reader, DataUtil.CharsetDoc charsetDoc) throws java.io.IOException
- Throws:
java.io.IOException
-
readToByteBuffer
public static java.nio.ByteBuffer readToByteBuffer(java.io.InputStream inStream, int maxSize) throws java.io.IOException
Read the input stream into a byte buffer. To deal with slow input streams, you may interrupt the thread this method is executing on. The data read until being interrupted will be available.
- Parameters:
inStream - the input stream to read from
maxSize - the maximum size in bytes to read from the stream. Set to 0 to be unlimited.
- Returns:
- the filled byte buffer
- Throws:
java.io.IOException - if an exception occurs whilst reading from the input stream.
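The maxSize contract (0 means unlimited, otherwise a hard cap) can be sketched as follows. This is an illustration rather than the library's code: the class BoundedReader is a hypothetical name, and polling the thread's interrupt flag between reads is only one assumed way to return the data gathered so far when the thread is interrupted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class BoundedReader {
    /** Reads up to maxSize bytes (0 = no limit) and returns them in a buffer ready for reading. */
    static ByteBuffer readToByteBuffer(InputStream in, int maxSize) throws IOException {
        if (maxSize < 0) throw new IllegalArgumentException("maxSize must be 0 (unlimited) or greater");
        boolean capped = maxSize > 0;
        int remaining = maxSize;
        byte[] chunk = new byte[8192];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        // Stop early if the thread was interrupted; whatever was read so far is still returned.
        while (!Thread.currentThread().isInterrupted()) {
            int want = capped ? Math.min(chunk.length, remaining) : chunk.length;
            if (want == 0) break;            // cap reached
            int read = in.read(chunk, 0, want);
            if (read == -1) break;           // end of stream
            collected.write(chunk, 0, read);
            if (capped) remaining -= read;
        }
        return ByteBuffer.wrap(collected.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = readToByteBuffer(new ByteArrayInputStream(new byte[10]), 4);
        System.out.println(buf.remaining() + " bytes buffered");
    }
}
```

The returned buffer has its position at 0 and its limit at the number of bytes read, so remaining() reports how much data was captured.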
-
emptyByteBuffer
static java.nio.ByteBuffer emptyByteBuffer()
-
getCharsetFromContentType
static java.lang.String getCharsetFromContentType(java.lang.String contentType)
Parse out a charset from a content type header. If the charset is not supported, returns null (so the default will kick in).
- Parameters:
contentType - e.g. "text/html; charset=EUC-JP"
- Returns:
- "EUC-JP", or null if not found. Charset is trimmed and uppercased.
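The documented behaviour (extract the charset parameter, trim and uppercase it, return null when absent or unsupported) can be approximated with the sketch below. The regular expression here is an assumption and is not the value of charsetPattern; ContentTypeCharset and fromContentType are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentTypeCharset {
    // Matches charset=value, tolerating optional quotes around the value (assumed pattern).
    private static final Pattern CHARSET =
        Pattern.compile("(?i)\\bcharset=\\s*(?:[\"'])?([^\\s,;\"']*)");

    /** Returns the uppercased charset named in the header, or null if missing or unsupported. */
    static String fromContentType(String contentType) {
        if (contentType == null) return null;
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) return null;
        String name = m.group(1).trim();
        if (name.isEmpty()) return null;
        try {
            if (Charset.isSupported(name)) return name.toUpperCase(Locale.ENGLISH);
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid name: treat it the same as an unsupported charset.
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=utf-8"));
        System.out.println(fromContentType("text/html"));
    }
}
```

Returning null instead of throwing lets the caller fall back to a default charset, which matches the "default will kick in" note above.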
-
validateCharset
private static java.lang.String validateCharset(java.lang.String cs)
-
mimeBoundary
static java.lang.String mimeBoundary()
Creates a random string, suitable for use as a mime boundary.
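A random boundary generator in the spirit of mimeBoundary is sketched below. The alphabet and the length of 32 are assumptions standing in for the undocumented values of mimeBoundaryChars and boundaryLength, and MimeBoundary is a hypothetical class name. The chosen characters stay within what RFC 2046 permits in a multipart boundary.

```java
import java.security.SecureRandom;

final class MimeBoundary {
    // Assumed alphabet: letters, digits, '-' and '_', all legal boundary characters per RFC 2046.
    private static final char[] BOUNDARY_CHARS =
        "-_1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
    private static final int BOUNDARY_LENGTH = 32; // assumed length, well under the 70-char limit
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a fresh random boundary string of BOUNDARY_LENGTH characters. */
    static String next() {
        StringBuilder sb = new StringBuilder(BOUNDARY_LENGTH);
        for (int i = 0; i < BOUNDARY_LENGTH; i++) {
            sb.append(BOUNDARY_CHARS[RANDOM.nextInt(BOUNDARY_CHARS.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Content-Type: multipart/form-data; boundary=" + next());
    }
}
```

A long random boundary makes it very unlikely that the delimiter also occurs inside one of the body parts it separates.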
-
detectCharsetFromBom
private static DataUtil.BomCharset detectCharsetFromBom(java.nio.ByteBuffer byteData)
-
-