Class Document
- java.lang.Object
-
- org.jsoup.nodes.Node
-
- org.jsoup.nodes.Element
-
- org.jsoup.nodes.Document
-
- All Implemented Interfaces:
java.lang.Cloneable
public class Document extends Element
An HTML Document.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
static class Document.OutputSettings: A Document's output settings control the form of the text() and html() methods.
static class Document.QuirksMode
-
Field Summary
Fields (Modifier and Type, Field)
private Connection connection
private java.lang.String location
private Document.OutputSettings outputSettings
private Parser parser
private Document.QuirksMode quirksMode
private static Evaluator titleEval
private boolean updateMetaCharset
-
Fields inherited from class org.jsoup.nodes.Element
attributes, childNodes
-
Fields inherited from class org.jsoup.nodes.Node
EmptyNodes, EmptyString, parentNode, siblingIndex
-
-
Method Summary
Methods (Modifier and Type, Method: Description)
Element body(): Get this document's <body> or <frameset> element.
java.nio.charset.Charset charset(): Returns the charset used in this document.
void charset(java.nio.charset.Charset charset): Sets the charset used in this document.
Document clone(): Create a stand-alone, deep copy of this node, and all of its children.
Connection connection(): Returns the Connection (Request/Response) object that was used to fetch this document, if any; otherwise, a new default Connection object.
Document connection(Connection connection): Set the Connection used to fetch this document.
Element createElement(java.lang.String tagName): Create a new Element, with this document's base uri.
static Document createShell(java.lang.String baseUri): Create a valid, empty shell of a document, suitable for adding more elements to.
DocumentType documentType(): Returns this Document's doctype.
private void ensureMetaCharsetElement(): Ensures a meta charset (html) or xml declaration (xml) with the current encoding used.
FormElement expectForm(java.lang.String cssQuery): Selects the first FormElement in this document that matches the query.
java.util.List<FormElement> forms(): Get each of the <form> elements contained in this document.
Element head(): Get this document's head element.
private Element htmlEl(): Find the root HTML element, or create it if it doesn't exist.
java.lang.String location(): Get the URL this Document was parsed from.
java.lang.String nodeName(): Get the node name of this node.
java.lang.String outerHtml(): Get the outer HTML of this node.
Document.OutputSettings outputSettings(): Get the document's current output settings.
Document outputSettings(Document.OutputSettings outputSettings): Set the document's output settings.
Parser parser(): Get the parser that was used to parse this document.
Document parser(Parser parser): Set the parser used to create this document.
Document.QuirksMode quirksMode()
Document quirksMode(Document.QuirksMode quirksMode)
Document shallowClone(): Create a stand-alone, shallow copy of this node.
Element text(java.lang.String text): Set the text of the body of this document.
java.lang.String title(): Get the string contents of the document's title element.
void title(java.lang.String title): Set the document's title element.
boolean updateMetaCharsetElement(): Returns whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
void updateMetaCharsetElement(boolean update): Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
-
Methods inherited from class org.jsoup.nodes.Element
addClass, after, after, append, appendChild, appendChildren, appendElement, appendElement, appendText, appendTo, attr, attr, attribute, attributes, baseUri, before, before, child, childElementsList, childNodeSize, children, childrenSize, className, classNames, classNames, clearAttributes, closest, closest, cssSelector, data, dataNodes, dataset, doClone, doSetBaseUri, elementIs, elementSiblingIndex, empty, endSourceRange, ensureChildNodes, expectFirst, filter, firstElementChild, firstElementSibling, forEach, forEachNode, getAllElements, getElementById, getElementsByAttribute, getElementsByAttributeStarting, getElementsByAttributeValue, getElementsByAttributeValueContaining, getElementsByAttributeValueEnding, getElementsByAttributeValueMatching, getElementsByAttributeValueMatching, getElementsByAttributeValueNot, getElementsByAttributeValueStarting, getElementsByClass, getElementsByIndexEquals, getElementsByIndexGreaterThan, getElementsByIndexLessThan, getElementsByTag, getElementsContainingOwnText, getElementsContainingText, getElementsMatchingOwnText, getElementsMatchingOwnText, getElementsMatchingText, getElementsMatchingText, hasAttributes, hasChildNodes, hasClass, hasText, html, html, html, id, id, insertChildren, insertChildren, is, is, isBlock, lastElementChild, lastElementSibling, nextElementSibling, nextElementSiblings, nodelistChanged, normalName, outerHtmlHead, outerHtmlTail, ownText, parent, parents, prepend, prependChild, prependChildren, prependElement, prependElement, prependText, preserveWhitespace, previousElementSibling, previousElementSiblings, removeAttr, removeClass, root, select, select, selectFirst, selectFirst, selectXpath, selectXpath, shouldIndent, siblingElements, stream, tag, tagName, tagName, tagName, text, textNodes, toggleClass, traverse, val, val, wholeOwnText, wholeText, wrap
-
Methods inherited from class org.jsoup.nodes.Node
absUrl, addChildren, addChildren, attr, attributesSize, childNode, childNodes, childNodesAsArray, childNodesCopy, equals, firstChild, hasAttr, hashCode, hasParent, hasSameValue, indent, isEffectivelyFirst, lastChild, nameIs, nextSibling, nodeStream, nodeStream, outerHtml, ownerDocument, parentElementIs, parentNameIs, parentNode, previousSibling, remove, removeChild, reparentChild, replaceChild, replaceWith, setBaseUri, setParentNode, setSiblingIndex, siblingIndex, siblingNodes, sourceRange, toString, unwrap
-
-
-
-
Field Detail
-
connection
private Connection connection
-
outputSettings
private Document.OutputSettings outputSettings
-
parser
private Parser parser
-
quirksMode
private Document.QuirksMode quirksMode
-
location
private final java.lang.String location
-
updateMetaCharset
private boolean updateMetaCharset
-
titleEval
private static final Evaluator titleEval
-
-
Constructor Detail
-
Document
public Document(java.lang.String namespace, java.lang.String baseUri)
Create a new, empty Document, in the specified namespace.
- Parameters:
namespace - the namespace of this Document's root node.
baseUri - base URI of document
- See Also:
Jsoup.parse(java.lang.String, java.lang.String), createShell(java.lang.String)
-
Document
public Document(java.lang.String baseUri)
Create a new, empty Document, in the HTML namespace.
- Parameters:
baseUri - base URI of document
- See Also:
Jsoup.parse(java.lang.String, java.lang.String), Document(String namespace, String baseUri)
-
-
Method Detail
-
createShell
public static Document createShell(java.lang.String baseUri)
Create a valid, empty shell of a document, suitable for adding more elements to.
- Parameters:
baseUri - base URI of document
- Returns:
- document with html, head, and body elements.
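A minimal sketch of building on the shell returned by createShell. The class name, the example.com base URI, and the paragraph text are placeholders chosen for illustration; they are not part of jsoup.

```java
import org.jsoup.nodes.Document;

public class CreateShellExample {
    // Creates an empty shell and adds one paragraph to its body.
    static String paragraphText() {
        // The base URI is a placeholder, not a real resource.
        Document doc = Document.createShell("https://example.com/");
        doc.body().appendElement("p").text("Hello");
        return doc.body().select("p").text();
    }

    public static void main(String[] args) {
        System.out.println(paragraphText());
    }
}
```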
-
location
public java.lang.String location()
Get the URL this Document was parsed from. If the starting URL is a redirect, this will return the final URL from which the document was served.
Will return an empty string if the location is unknown (e.g. if parsed from a String).
- Returns:
- location
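As an illustration of the unknown-location case, the sketch below parses a String without a base URI and reads back the location. The class name and the sample markup are assumptions made for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LocationExample {
    // Parses markup from a String (no URL involved) and returns its location.
    static String locationOfParsedString() {
        Document doc = Jsoup.parse("<p>Hi</p>");
        return doc.location();
    }

    public static void main(String[] args) {
        System.out.println("[" + locationOfParsedString() + "]");
    }
}
```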
-
connection
public Connection connection()
Returns the Connection (Request/Response) object that was used to fetch this document, if any; otherwise, a new default Connection object. This can be used to continue a session, preserving settings and cookies, etc.
- Returns:
- the Connection (session) associated with this Document, or an empty one otherwise.
- See Also:
Connection.newRequest()
-
documentType
public DocumentType documentType()
Returns this Document's doctype.
- Returns:
- document type, or null if not set
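Because the return value may be null, callers usually need a guard. The following sketch shows one way to do that; the class name, helper method, and sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.DocumentType;

public class DocumentTypeExample {
    // Returns the doctype name, or null when the document has no doctype.
    static String doctypeName(String html) {
        DocumentType type = Jsoup.parse(html).documentType();
        return type == null ? null : type.name();
    }

    public static void main(String[] args) {
        System.out.println(doctypeName("<!DOCTYPE html><p>x</p>"));
        System.out.println(doctypeName("<p>x</p>"));
    }
}
```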
-
htmlEl
private Element htmlEl()
Find the root HTML element, or create it if it doesn't exist.
- Returns:
- the root HTML element.
-
head
public Element head()
Get this document's head element.
As a side-effect, if this Document does not already have an HTML structure, it will be created. If you do not want that, use selectFirst("head") instead.
- Returns:
- head element.
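The side-effect can be observed on a freshly constructed, empty Document. This is a sketch only: the class and method names are illustrative, and the child counts reflect the behavior described above (nothing is created by selectFirst, the structure is created by head()).

```java
import org.jsoup.nodes.Document;

public class HeadExample {
    // Returns how many child nodes an empty Document has after the lookup.
    static int childrenAfterLookup(boolean useHead) {
        Document doc = new Document(""); // empty document, no <html> yet
        if (useHead) {
            doc.head();                  // creates the HTML structure as a side-effect
        } else {
            doc.selectFirst("head");     // plain query, leaves the document untouched
        }
        return doc.childNodeSize();
    }

    public static void main(String[] args) {
        System.out.println(childrenAfterLookup(false));
        System.out.println(childrenAfterLookup(true));
    }
}
```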
-
body
public Element body()
Get this document's <body> or <frameset> element.
As a side-effect, if this Document does not already have an HTML structure, it will be created with a <body> element. If you do not want that, use selectFirst("body") instead.
- Returns:
- body element for documents with a <body>, a new <body> element if the document had no contents, or the outermost <frameset> element for frameset documents.
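A short sketch of the two non-empty cases listed under Returns. The class name, helper, and the sample markup (including the a.html frame source) are made up for illustration.

```java
import org.jsoup.Jsoup;

public class BodyExample {
    // Returns the tag name of whatever body() yields for the given markup.
    static String bodyTag(String html) {
        return Jsoup.parse(html).body().tagName();
    }

    public static void main(String[] args) {
        System.out.println(bodyTag("<p>Hi</p>"));
        System.out.println(bodyTag("<frameset><frame src=\"a.html\"></frameset>"));
    }
}
```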
-
forms
public java.util.List<FormElement> forms()
Get each of the <form> elements contained in this document.
- Returns:
- a List of FormElement objects, which will be empty if there are none.
- Since:
- 1.15.4
- See Also:
Elements.forms(), FormElement.elements()
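For example, counting the forms in a parsed document might look like the sketch below. The class name and the sample markup are assumptions for the example.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.FormElement;

public class FormsExample {
    // Returns the number of <form> elements found in the markup.
    static int formCount(String html) {
        List<FormElement> forms = Jsoup.parse(html).forms();
        return forms.size(); // an empty list, never null, when there are no forms
    }

    public static void main(String[] args) {
        System.out.println(formCount("<form id=\"a\"></form><p>x</p><form id=\"b\"></form>"));
        System.out.println(formCount("<p>no forms here</p>"));
    }
}
```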
-
expectForm
public FormElement expectForm(java.lang.String cssQuery)
Selects the first FormElement in this document that matches the query. If none match, throws an IllegalArgumentException.
- Parameters:
cssQuery - a Selector CSS query
- Returns:
- the first matching <form> element
- Throws:
java.lang.IllegalArgumentException - if no match is found
- Since:
- 1.15.4
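Since a missing form is reported by exception rather than by null, a caller that treats absence as a normal case can catch it. The sketch below assumes a hypothetical login form; the class and method names are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.FormElement;

public class ExpectFormExample {
    // Returns the action attribute of the first form matching the query.
    static String formAction(String html, String cssQuery) {
        FormElement form = Jsoup.parse(html).expectForm(cssQuery);
        return form.attr("action");
    }

    // Returns false instead of propagating the exception when nothing matches.
    static boolean hasForm(String html, String cssQuery) {
        try {
            Jsoup.parse(html).expectForm(cssQuery);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String html = "<form id=\"login\" action=\"/login\"></form>";
        System.out.println(formAction(html, "#login"));
        System.out.println(hasForm(html, "#signup"));
    }
}
```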
-
title
public java.lang.String title()
Get the string contents of the document's title element.
- Returns:
- Trimmed title, or empty string if none set.
-
title
public void title(java.lang.String title)
Set the document's title element. Updates the existing element, or adds title to head if not present.
- Parameters:
title - string to set as title
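The getter and setter can be exercised together, as in this sketch. The class name, helper methods, and sample titles are placeholders.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TitleExample {
    // Reads the title of the parsed markup.
    static String titleOf(String html) {
        return Jsoup.parse(html).title();
    }

    // Sets a title, then reads it back from the same document.
    static String retitle(String html, String newTitle) {
        Document doc = Jsoup.parse(html);
        doc.title(newTitle); // updates <title>, or adds one to <head> if absent
        return doc.title();
    }

    public static void main(String[] args) {
        System.out.println(titleOf("<title>  Hello  </title>"));
        System.out.println(retitle("<p>No title yet</p>", "New"));
    }
}
```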
-
createElement
public Element createElement(java.lang.String tagName)
Create a new Element, with this document's base uri. Does not make the new element a child of this document.
- Parameters:
tagName - element tag name (e.g. a)
- Returns:
- new element
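The point that the new element is not yet attached can be shown with a small sketch. The class name, the sample markup, and the example.com base URI are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CreateElementExample {
    // True if the element has no parent until it is explicitly appended.
    static boolean detachedUntilAppended() {
        Document doc = Jsoup.parse("<p>Hi</p>", "https://example.com/");
        Element link = doc.createElement("a");
        boolean detached = link.parent() == null; // created, but not in the tree
        doc.body().appendChild(link);             // attach it explicitly
        return detached && link.parent() == doc.body();
    }

    public static void main(String[] args) {
        System.out.println(detachedUntilAppended());
    }
}
```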
-
outerHtml
public java.lang.String outerHtml()
Description copied from class: Node
Get the outer HTML of this node. For example, on a p element, may return <p>Para</p>.
- Overrides:
outerHtml in class Node
- Returns:
- outer HTML
- See Also:
Element.html(), Element.text()
-
text
public Element text(java.lang.String text)
Set the text of the body of this document. Any existing nodes within the body will be cleared.
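A minimal sketch of the clearing behavior, using made-up markup and an illustrative class name. It reports how many paragraphs remain alongside the new body text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyTextExample {
    // Replaces the body contents and summarizes what is left.
    static String replaceBodyText() {
        Document doc = Jsoup.parse("<p>One</p><p>Two</p>");
        doc.text("Replaced"); // existing body nodes are cleared first
        return doc.select("p").size() + ":" + doc.body().text();
    }

    public static void main(String[] args) {
        System.out.println(replaceBodyText());
    }
}
```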
-
nodeName
public java.lang.String nodeName()
Description copied from class: Node
Get the node name of this node. Use for debugging purposes and not logic switching (for that, use instanceof).
-
charset
public void charset(java.nio.charset.Charset charset)
Sets the charset used in this document. This method is equivalent to OutputSettings.charset(Charset), but in addition it updates the charset / encoding element within the document.
This enables meta charset update.
If there's no element with charset / encoding information yet, it will be created. Obsolete charset / encoding definitions are removed.
Elements used:
- Html: <meta charset="CHARSET">
- Xml: <?xml version="1.0" encoding="CHARSET"?>
- Parameters:
charset - Charset
- See Also:
updateMetaCharsetElement(boolean), Document.OutputSettings.charset(java.nio.charset.Charset)
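For an HTML document, the effect can be checked by looking for the meta element afterwards. This is a sketch under the behavior described above; the class name, helper, and sample markup are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CharsetExample {
    // Sets the charset and returns the value of the resulting <meta charset>.
    static String metaCharsetAfter(Charset charset) {
        Document doc = Jsoup.parse("<p>Hi</p>"); // no meta charset to begin with
        doc.charset(charset);
        Element meta = doc.selectFirst("meta[charset]");
        return meta == null ? null : meta.attr("charset");
    }

    public static void main(String[] args) {
        System.out.println(metaCharsetAfter(StandardCharsets.ISO_8859_1));
    }
}
```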
-
charset
public java.nio.charset.Charset charset()
Returns the charset used in this document. This method is equivalent to Document.OutputSettings.charset().
- Returns:
- Current Charset
- See Also:
Document.OutputSettings.charset()
-
updateMetaCharsetElement
public void updateMetaCharsetElement(boolean update)
Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
If set to false (default), no elements are modified.
- Parameters:
update - If true, the element is updated on charset changes; false if not
- See Also:
charset(java.nio.charset.Charset)
-
updateMetaCharsetElement
public boolean updateMetaCharsetElement()
Returns whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
- Returns:
- Returns true if the element is updated on charset changes, false if not
-
clone
public Document clone()
Description copied from class: Node
Create a stand-alone, deep copy of this node, and all of its children. The cloned node will have no siblings or parent node. As a stand-alone object, any changes made to the clone or any of its children will not impact the original node.
The cloned node may be adopted into another Document or node structure using Element.appendChild(Node).
- Overrides:
clone in class Element
- Returns:
- a stand-alone cloned node, including clones of any children
- See Also:
Node.shallowClone()
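The independence of the clone can be illustrated as follows. The class name and sample text are placeholders.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CloneExample {
    // Edits the clone and reports the body text of both documents.
    static String originalAndCopyText() {
        Document original = Jsoup.parse("<p>Original</p>");
        Document copy = original.clone();
        copy.body().text("Changed"); // only the deep copy is modified
        return original.body().text() + "|" + copy.body().text();
    }

    public static void main(String[] args) {
        System.out.println(originalAndCopyText());
    }
}
```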
-
shallowClone
public Document shallowClone()
Description copied from class: Node
Create a stand-alone, shallow copy of this node. None of its children (if any) will be cloned, and it will have no parent or sibling nodes.
- Overrides:
shallowClone in class Element
- Returns:
- a single independent copy of this node
- See Also:
Node.clone()
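By contrast with clone(), a shallow copy carries no children. The sketch below checks the child count directly and avoids calling body() on the copy, since that would create a new structure. The class name is illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ShallowCloneExample {
    // Returns the number of child nodes on a shallow copy of a parsed document.
    static int shallowChildCount() {
        Document original = Jsoup.parse("<p>Hi</p>");
        Document shallow = original.shallowClone();
        return shallow.childNodeSize(); // children are not copied
    }

    public static void main(String[] args) {
        System.out.println(shallowChildCount());
    }
}
```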
-
ensureMetaCharsetElement
private void ensureMetaCharsetElement()
Ensures a meta charset (html) or xml declaration (xml) with the current encoding used. This only applies with updateMetaCharset set to true, otherwise this method does nothing.
- An existing element gets updated with the current charset
- If there's no element yet it will be inserted
- Obsolete elements are removed
Elements used:
- Html: <meta charset="CHARSET">
- Xml: <?xml version="1.0" encoding="CHARSET"?>
-
outputSettings
public Document.OutputSettings outputSettings()
Get the document's current output settings.
- Returns:
- the document's current output settings.
-
outputSettings
public Document outputSettings(Document.OutputSettings outputSettings)
Set the document's output settings.
- Parameters:
outputSettings - new output settings.
- Returns:
- this document, for chaining.
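One common use of the setter is to turn off pretty-printing before serializing. The sketch below assumes that use case; the class name and sample markup are made up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OutputSettingsExample {
    // Serializes the body without pretty-print indentation or line breaks.
    static String compactBodyHtml(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings(new Document.OutputSettings().prettyPrint(false));
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(compactBodyHtml("<p>One</p><p>Two</p>"));
    }
}
```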
-
quirksMode
public Document.QuirksMode quirksMode()
-
quirksMode
public Document quirksMode(Document.QuirksMode quirksMode)
-
parser
public Parser parser()
Get the parser that was used to parse this document.
- Returns:
- the parser
-
parser
public Document parser(Parser parser)
Set the parser used to create this document. This parser is then used when further parsing within this document is required.
- Parameters:
parser - the configured parser to use when further parsing is required for this document.
- Returns:
- this document, for chaining.
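A small sketch of the setter and getter used together. The class name is illustrative, and the choice of a fresh HTML parser is only an example of a "configured parser".

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParserExample {
    // True if the setter chains and the getter returns the parser just set.
    static boolean setAndGet() {
        Document doc = Jsoup.parse("<p>Hi</p>");
        Parser custom = Parser.htmlParser(); // stand-in for a configured parser
        Document same = doc.parser(custom);  // returns this document, for chaining
        return same == doc && doc.parser() == custom;
    }

    public static void main(String[] args) {
        System.out.println(setAndGet());
    }
}
```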
-
connection
public Document connection(Connection connection)
Set the Connection used to fetch this document. This Connection is used as a session object when further requests are made (e.g. when a form is submitted).
- Parameters:
connection - to set
- Returns:
- this document, for chaining
- Since:
- 1.14.1
- See Also:
Connection.newRequest()
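The sketch below attaches a session to a document parsed from a String and reads it back; no network request is made. The class name, the user agent string, and the example.com base URI are placeholders.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ConnectionExample {
    // True if the document hands back the session that was set on it.
    static boolean sessionRoundTrip() {
        Document doc = Jsoup.parse("<form action=\"/search\"></form>", "https://example.com/");
        Connection session = Jsoup.newSession().userAgent("ExampleAgent/1.0");
        Document same = doc.connection(session); // returns this document, for chaining
        return same == doc && doc.connection() == session;
    }

    // True if a document without a stored Connection still returns a non-null one.
    static boolean defaultConnectionPresent() {
        return Jsoup.parse("<p>Hi</p>").connection() != null;
    }

    public static void main(String[] args) {
        System.out.println(sessionRoundTrip());
        System.out.println(defaultConnectionPresent());
    }
}
```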
-
-