Class Document

  • All Implemented Interfaces:
    java.lang.Cloneable

    public class Document
    extends Element
    An HTML Document.
    • Method Detail

      • createShell

        public static Document createShell(java.lang.String baseUri)
        Create a valid, empty shell of a document, suitable for adding more elements to.
        Parameters:
        baseUri - base URI of the document
        Returns:
        document with html, head, and body elements.
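        As a minimal sketch of how the shell might be used (assuming jsoup is on the classpath; the class name CreateShellExample and the example.com base URI are placeholders, not part of the API):

```java
import org.jsoup.nodes.Document;

class CreateShellExample {
    // Builds an empty shell and appends one paragraph to its body.
    static Document buildPage() {
        // Placeholder base URI, used only for illustration.
        Document doc = Document.createShell("https://example.com/");
        doc.body().appendElement("p").text("Hello");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildPage().outerHtml());
    }
}
```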
      • location

        public java.lang.String location()
        Get the URL this Document was parsed from. If the starting URL is a redirect, this will return the final URL from which the document was served.

        Will return an empty string if the location is unknown (e.g. if parsed from a String).

        Returns:
        location
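        A short sketch of the "unknown location" case described above, assuming jsoup is on the classpath (the class and method names are hypothetical): a document parsed from a plain String has no source URL, so location() should come back empty.

```java
import org.jsoup.Jsoup;

class LocationExample {
    // Parses HTML from a String (no URL involved) and reports its location.
    static String locationOf(String html) {
        return Jsoup.parse(html).location();
    }

    public static void main(String[] args) {
        String location = locationOf("<p>Parsed from a string</p>");
        System.out.println(location.isEmpty() ? "location unknown" : location);
    }
}
```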
      • connection

        public Connection connection()
        Returns the Connection (Request/Response) object that was used to fetch this document, if any; otherwise, a new default Connection object. This can be used to continue a session, preserving settings and cookies, etc.
        Returns:
        the Connection (session) associated with this Document, or an empty one otherwise.
        See Also:
        Connection.newRequest()
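        One possible way to continue a session from a document, sketched under the assumption of jsoup 1.14.1 or later (the SessionExample class and the URL are placeholders). The request is only prepared here, not executed, so no network access takes place.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SessionExample {
    // Prepares (but does not execute) a follow-up request that shares
    // the settings and cookies of the document's session.
    static Connection followUp(Document doc, String url) {
        return doc.connection().newRequest().url(url);
    }

    public static void main(String[] args) {
        // Parsed from a String, so connection() supplies a new default Connection.
        Document doc = Jsoup.parse("<p>No fetch happened here</p>");
        Connection next = followUp(doc, "https://example.com/next"); // placeholder URL
        System.out.println(next.request().url());
    }
}
```

        Calling get() or execute() on the returned Connection would perform the actual request.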
      • documentType

        public DocumentType documentType()
        Returns this Document's doctype.
        Returns:
        document type, or null if not set
      • htmlEl

        private Element htmlEl()
        Find the root HTML element, or create it if it doesn't exist.
        Returns:
        the root HTML element.
      • head

        public Element head()
        Get this document's head element.

        As a side effect, if this Document does not already have an HTML structure, it will be created. If you do not want that, use selectFirst("head") instead.

        Returns:
        head element.
      • body

        public Element body()
        Get this document's <body> or <frameset> element.

        As a side effect, if this Document does not already have an HTML structure, it will be created with a <body> element. If you do not want that, use selectFirst("body") instead.

        Returns:
        body element for documents with a <body>, a new <body> element if the document had no contents, or the outermost <frameset> element for frameset documents.
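        The difference between the side-effecting accessor and a plain lookup can be sketched as follows, assuming a jsoup version in which body() creates the missing structure as described above (the class and method names are hypothetical):

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class BodySideEffectExample {
    // Returns { body found before calling body(), body found after calling body() }.
    static boolean[] probe() {
        Document doc = new Document("");                   // bare document, no html/head/body yet
        boolean before = doc.selectFirst("body") != null;  // pure lookup, creates nothing
        Element body = doc.body();                         // creates the structure as a side effect
        boolean after = doc.selectFirst("body") == body;   // the created element is now in the tree
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = probe();
        System.out.println("before: " + result[0] + ", after: " + result[1]);
    }
}
```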
      • forms

        public java.util.List<FormElement> forms()
        Get each of the <form> elements contained in this document.
        Returns:
        a List of FormElement objects, which will be empty if there are none.
        Since:
        1.15.4
        See Also:
        Elements.forms(), FormElement.elements()
      • expectForm

        public FormElement expectForm(java.lang.String cssQuery)
        Selects the first FormElement in this document that matches the query. If none match, throws an IllegalArgumentException.
        Parameters:
        cssQuery - a Selector CSS query
        Returns:
        the first matching <form> element
        Throws:
        java.lang.IllegalArgumentException - if no match is found
        Since:
        1.15.4
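        A sketch of forms() and expectForm together, assuming jsoup 1.15.4 or later; the markup, the form ids, and the FormsExample class are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.FormElement;

class FormsExample {
    // Illustrative markup with two forms.
    static final String HTML =
        "<form id=search><input name=q></form>"
        + "<form id=login><input name=user><input name=pass type=password></form>";

    // Number of <form> elements in the document.
    static int countForms() {
        return Jsoup.parse(HTML).forms().size();
    }

    // Number of controls in the form selected by expectForm.
    static int loginFieldCount() {
        FormElement login = Jsoup.parse(HTML).expectForm("#login");
        return login.elements().size();
    }

    // expectForm throws rather than returning null when nothing matches.
    static boolean missingFormThrows() {
        try {
            Jsoup.parse(HTML).expectForm("#signup");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(countForms() + " forms, login has " + loginFieldCount() + " fields");
    }
}
```

        Because expectForm throws on a miss, it suits call sites where a missing form is a programming error; forms() is the better fit when zero matches is a normal outcome.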
      • title

        public java.lang.String title()
        Get the string contents of the document's title element.
        Returns:
        Trimmed title, or empty string if none set.
      • title

        public void title(java.lang.String title)
        Set the document's title element. Updates the existing element, or adds a title element to the head if none is present.
        Parameters:
        title - string to set as title
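        The getter and setter can be exercised together as in this sketch (jsoup assumed on the classpath; TitleExample is a hypothetical class name):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TitleExample {
    // Returns { title as parsed, title after it has been replaced }.
    static String[] roundTrip() {
        Document doc = Jsoup.parse("<title>  Old title </title><p>Body</p>");
        String before = doc.title();   // surrounding whitespace is trimmed
        doc.title("New title");        // updates the existing <title> element
        return new String[] { before, doc.title() };
    }

    // A document without a <title> gets one added to its head.
    static int titleElementsAfterSet() {
        Document doc = Jsoup.parse("<p>No title yet</p>");
        doc.title("Added");
        return doc.head().select("title").size();
    }

    public static void main(String[] args) {
        String[] titles = roundTrip();
        System.out.println(titles[0] + " -> " + titles[1]);
    }
}
```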
      • createElement

        public Element createElement(java.lang.String tagName)
        Create a new Element, with this document's base URI. Does not make the new element a child of this document.
        Parameters:
        tagName - element tag name (e.g. a)
        Returns:
        new element
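        Since the new element starts out detached, attaching it is a separate step. A minimal sketch, assuming jsoup on the classpath (the class name, the href value, and the base URI are placeholders):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class CreateElementExample {
    // Creates a link that is not yet part of the document tree.
    static Element detachedLink(Document doc) {
        return doc.createElement("a").attr("href", "/about").text("About");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<p>Intro</p>", "https://example.com/"); // placeholder base URI
        Element link = detachedLink(doc);
        System.out.println("attached? " + (link.parent() != null));
        doc.body().appendChild(link); // explicit step that adds it to the document
        System.out.println("attached? " + (link.parent() != null));
    }
}
```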
      • outerHtml

        public java.lang.String outerHtml()
        Description copied from class: Node
        Get the outer HTML of this node. For example, on a p element, may return <p>Para</p>.
        Overrides:
        outerHtml in class Node
        Returns:
        outer HTML
        See Also:
        Element.html(), Element.text()
      • text

        public Element text(java.lang.String text)
        Set the text of the body of this document. Any existing nodes within the body will be cleared.
        Overrides:
        text in class Element
        Parameters:
        text - unencoded text
        Returns:
        this document
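        To illustrate that the argument is treated as unencoded text rather than markup, and that existing body content is cleared, here is a small sketch (jsoup assumed; BodyTextExample is a hypothetical name):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BodyTextExample {
    // Replaces everything in the body with a single piece of text.
    static Document replaceBody() {
        Document doc = Jsoup.parse("<p>One</p><p>Two</p>");
        doc.text("Plain <b>text</b>"); // angle brackets are kept as text, not parsed as tags
        return doc;
    }

    public static void main(String[] args) {
        Document doc = replaceBody();
        System.out.println(doc.body().text());
        System.out.println("paragraphs left: " + doc.select("p").size());
    }
}
```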
      • nodeName

        public java.lang.String nodeName()
        Description copied from class: Node
        Get the node name of this node. Use for debugging purposes and not logic switching (for that, use instanceof).
        Overrides:
        nodeName in class Element
        Returns:
        node name
      • updateMetaCharsetElement

        public void updateMetaCharsetElement(boolean update)
        Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.

        If set to false (the default), no elements are modified.

        Parameters:
        update - if true, the element is updated on charset changes; if false, it is not
        See Also:
        charset(java.nio.charset.Charset)
      • updateMetaCharsetElement

        public boolean updateMetaCharsetElement()
        Returns whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
        Returns:
        true if the element is updated on charset changes, false if not
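        A sketch of the flag in combination with charset(Charset), assuming jsoup on the classpath (MetaCharsetExample is a hypothetical class). The flag is enabled explicitly before the charset is changed, so the meta element should end up reflecting the new encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class MetaCharsetExample {
    // Builds an empty document whose <meta charset> tracks the given charset.
    static Document withCharset(Charset charset) {
        Document doc = Document.createShell("");
        doc.updateMetaCharsetElement(true); // opt in to keeping the element in sync
        doc.charset(charset);               // changes the output charset
        return doc;
    }

    public static void main(String[] args) {
        Document doc = withCharset(StandardCharsets.UTF_8);
        Element meta = doc.selectFirst("meta[charset]");
        System.out.println(meta != null ? meta.attr("charset") : "no meta charset element");
    }
}
```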
      • clone

        public Document clone()
        Description copied from class: Node
        Create a stand-alone, deep copy of this node, and all of its children. The cloned node will have no siblings or parent node. As a stand-alone object, any changes made to the clone or any of its children will not impact the original node.

        The cloned node may be adopted into another Document or node structure using Element.appendChild(Node).

        Overrides:
        clone in class Element
        Returns:
        a stand-alone cloned node, including clones of any children
        See Also:
        Node.shallowClone()
      • shallowClone

        public Document shallowClone()
        Description copied from class: Node
        Create a stand-alone, shallow copy of this node. None of its children (if any) will be cloned, and it will have no parent or sibling nodes.
        Overrides:
        shallowClone in class Element
        Returns:
        a single independent copy of this node
        See Also:
        Node.clone()
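        The contrast between clone() and shallowClone() can be sketched like this (jsoup assumed on the classpath; the class and method names are hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CloneExample {
    // Returns { <p> count in original, <p> count in deep clone, child nodes in shallow clone }.
    static int[] compare() {
        Document original = Jsoup.parse("<p>One</p>");

        Document copy = original.clone();
        copy.body().appendElement("p").text("Two"); // changes only the clone

        Document shallow = original.shallowClone(); // no children are copied

        return new int[] {
            original.select("p").size(),
            copy.select("p").size(),
            shallow.childNodeSize()
        };
    }

    public static void main(String[] args) {
        int[] counts = compare();
        System.out.println(counts[0] + " / " + counts[1] + " / " + counts[2]);
    }
}
```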
      • ensureMetaCharsetElement

        private void ensureMetaCharsetElement()
        Ensures that a meta charset element (HTML) or an XML declaration (XML) is present and reflects the current encoding. This only applies when updateMetaCharset is set to true; otherwise this method does nothing.
        • An existing element gets updated with the current charset
        • If there's no element yet, one will be inserted
        • Obsolete elements are removed

        Elements used:

        • Html: <meta charset="CHARSET">
        • Xml: <?xml version="1.0" encoding="CHARSET"?>
      • outputSettings

        public Document.OutputSettings outputSettings()
        Get the document's current output settings.
        Returns:
        the document's current output settings.
      • outputSettings

        public Document outputSettings(Document.OutputSettings outputSettings)
        Set the document's output settings.
        Parameters:
        outputSettings - new output settings.
        Returns:
        this document, for chaining.
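        Because the setter returns the document, it can be chained straight into serialization. A minimal sketch, assuming jsoup on the classpath (OutputSettingsExample is a hypothetical name) and using the prettyPrint option to switch off indentation and line breaks:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class OutputSettingsExample {
    // Serializes the parsed HTML without pretty-printing.
    static String compactHtml(String html) {
        Document doc = Jsoup.parse(html);
        Document.OutputSettings settings = new Document.OutputSettings().prettyPrint(false);
        return doc.outputSettings(settings).outerHtml(); // setter returns the document
    }

    public static void main(String[] args) {
        System.out.println(compactHtml("<p>One</p><p>Two</p>"));
    }
}
```

        The getter form, doc.outputSettings().prettyPrint(false), adjusts the current settings in place instead of replacing them.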
      • parser

        public Parser parser()
        Get the parser that was used to parse this document.
        Returns:
        the parser
      • parser

        public Document parser(Parser parser)
        Set the parser used to create this document. This parser is then used when further parsing within this document is required.
        Parameters:
        parser - the configured parser to use when further parsing is required for this document.
        Returns:
        this document, for chaining.
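        As a sketch of where the parser comes from, the example below parses input with jsoup's XML parser and then reads the parser back from the document (jsoup assumed on the classpath; ParserExample and the sample XML are made up for illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class ParserExample {
    // Parses the input as XML, so no html/head/body wrapper is added.
    static Document parseXml(String xml) {
        return Jsoup.parse(xml, "", Parser.xmlParser());
    }

    public static void main(String[] args) {
        Document doc = parseXml("<item><name>A</name></item>");
        Parser used = doc.parser(); // the parser associated with this document
        System.out.println(used != null ? "parser available" : "no parser");
        System.out.println(doc.selectFirst("name").text());
    }
}
```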
      • connection

        public Document connection(Connection connection)
        Set the Connection used to fetch this document. This Connection is used as a session object when further requests are made (e.g. when a form is submitted).
        Parameters:
        connection - to set
        Returns:
        this document, for chaining
        Since:
        1.14.1
        See Also:
        Connection.newRequest()
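        A sketch of attaching a session to a document that was not fetched through one, assuming jsoup 1.14.1 or later (the class name and user-agent string are placeholders). Nothing is sent over the network here.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class AttachSessionExample {
    // Parses HTML from a String and associates the given session with it.
    static Document parseWithSession(String html, Connection session) {
        return Jsoup.parse(html).connection(session); // setter returns the document
    }

    public static void main(String[] args) {
        Connection session = Jsoup.newSession().userAgent("example-agent/1.0"); // placeholder value
        Document doc = parseWithSession("<form action=/login></form>", session);
        System.out.println(doc.connection() == session); // the same session is handed back
    }
}
```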