Class Document


public class Document extends ContainerNode
Represents the root of an XML document, containing the document element and preserving document-level formatting like XML declarations and DTDs.

The Document class serves as the top-level container for an XML document, maintaining the document element along with document-level metadata such as XML declarations, DOCTYPE declarations, and encoding information. It preserves the exact formatting of these elements during round-trip parsing and serialization.

Document Properties:

  • XML Declaration - Maintains original XML declaration formatting
  • DOCTYPE Support - Preserves DOCTYPE declarations exactly as written
  • Encoding - Tracks document encoding information
  • Version - Maintains XML version information
  • Standalone Flag - Preserves standalone document declarations

Usage Examples:

// Create documents using factory methods
Document doc = Document.of(); // Empty document
Document parsed = Document.of(xmlString); // Parse XML from String
Document fromStream = Document.of(inputStream); // Parse XML from InputStream
Document fromFile = Document.of(Paths.get("config.xml")); // Parse XML from file
Document withDecl = Document.withXmlDeclaration("1.0", "UTF-8");
Document complete = Document.withRootElement("project");

// Set the root element
Element root = Element.of("root");
doc.root(root);

// Access document properties
String encoding = doc.encoding(); // "UTF-8"
String version = doc.version();   // "1.0"

// Complex documents using fluent API
Document complex = Document.of()
    .version("1.1")
    .encoding("UTF-8")
    .standalone(true)
    .root(Element.of("project"))
    .withXmlDeclaration();

Document Structure:

A Document can contain:

  • At most one document element (root element); a well-formed document has exactly one
  • Zero or more comments and processing instructions
  • Whitespace between top-level nodes
  • An optional XML declaration
  • An optional DOCTYPE declaration
  • Constructor Details

    • Document

      public Document()
      Creates a new empty XML document with default settings.

      Initializes the document with UTF-8 encoding, XML version 1.0, and standalone set to false. The XML declaration and DOCTYPE are initially empty.

  • Method Details

    • parent

      public Document parent(ContainerNode parent)
      Sets the parent container node of this node.

      This method is typically called automatically when a node is added to a container. Call it manually only with care, because a parent reference that does not match the container's child list leaves the tree inconsistent.

      Overrides:
      parent in class Node
      Parameters:
      parent - the parent container node to set, or null to clear the parent
      Returns:
      this document for method chaining
    • type

      public Node.NodeType type()
      Returns the node type for this document.
      Specified by:
      type in class Node
      Returns:
      Node.NodeType.DOCUMENT
    • xmlDeclaration

      public String xmlDeclaration()
      Gets the XML declaration string for this document.

      The XML declaration typically contains version, encoding, and standalone information, formatted as: <?xml version="1.0" encoding="UTF-8"?>

      Returns:
      the XML declaration string, or empty string if none is set
    • xmlDeclaration

      public Document xmlDeclaration(String xmlDeclaration)
      Sets the XML declaration for this document.

      The XML declaration should be a complete declaration including the opening <?xml and closing ?> tags. Setting this value marks the document as modified.

      Example: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

      Parameters:
      xmlDeclaration - the XML declaration string, or null to clear it
      Returns:
      this document for method chaining
    • doctype

      public String doctype()
      Gets the DOCTYPE declaration for this document.

      The DOCTYPE declaration defines the document type and may include references to external DTD files or inline DTD definitions.

      Returns:
      the DOCTYPE declaration string, or empty string if none is set
    • doctype

      public Document doctype(String doctype)
      Sets the DOCTYPE declaration for this document.

      The DOCTYPE declaration should be a complete declaration including the opening <!DOCTYPE and closing > tags. Setting this value marks the document as modified.

      Example: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

      Parameters:
      doctype - the DOCTYPE declaration string, or null to clear it
      Returns:
      this document for method chaining
    • doctypePrecedingWhitespace

      public String doctypePrecedingWhitespace()
      Gets the whitespace before the DOCTYPE declaration.

      This whitespace appears between the XML declaration and the DOCTYPE declaration. It is preserved during round-trip parsing and serialization to maintain document fidelity.

      Returns:
      the whitespace before the DOCTYPE declaration, or empty string if none
    • root

      public Element root()
      Gets the root element of this document.

      The document element is the top-level element that contains all other elements in the document. Every well-formed XML document must have exactly one document element.

      Returns:
      the root element, or null if none is set
    • root

      public Document root(Element root)
      Sets the root element of this document.

      The document element becomes the top-level element containing all other elements. Setting this value marks the document as modified and establishes the parent-child relationship.

      Parameters:
      root - the element to set as the document root, or null to clear it
      Returns:
      this document for method chaining
    • encoding

      public String encoding()
      Gets the character encoding for this document.

      The encoding specifies how the document's characters are encoded as bytes. Common values include "UTF-8", "UTF-16", and "ISO-8859-1".

      Returns:
      the document encoding; defaults to "UTF-8"
    • encoding

      public Document encoding(String encoding)
      Sets the character encoding used when this document is serialized.

      If encoding is null, the default "UTF-8" is used. This method marks the document as modified.

      Parameters:
      encoding - the character encoding to use, or null to reset to the default "UTF-8"
      Returns:
      this document for method chaining
    • version

      public String version()
      Gets the XML version for this document.

      The XML version indicates which version of the XML specification this document conforms to. Common values are "1.0" and "1.1".

      Returns:
      the XML version; defaults to "1.0"
    • version

      public Document version(String version)
      Sets the XML version of this document.

      Marks the document as modified.

      Parameters:
      version - the XML version to use, or null to use "1.0"
      Returns:
      this document for method chaining
    • isStandalone

      public boolean isStandalone()
      Gets the standalone flag for this document.

      The standalone flag indicates whether the document is self-contained or depends on external markup declarations. When true, the document declares that it has no external dependencies.

      Returns:
      true if the document is standalone, false otherwise
    • standalone

      public Document standalone(boolean standalone)
      Sets the standalone flag for this document.

      Setting this value marks the document as modified. The standalone flag affects the XML declaration output.

      Parameters:
      standalone - true if the document is standalone, false otherwise
      Returns:
      this document for method chaining
    • hasBom

      public boolean hasBom()
      Returns whether this document had a Byte Order Mark (BOM) when it was parsed.

      When true, the BOM will be written back when serializing to an OutputStream via toXml(OutputStream), toXml(OutputStream, Charset), or toXml(OutputStream, String). The BOM is never included in the string output from Node.toXml().
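
      The BOM behavior described above can be illustrated with plain JDK classes. The sketch below is not DomTrip's implementation: BomWriteSketch and its methods are hypothetical names, and it assumes only UTF-8, UTF-16BE, and UTF-16LE carry a BOM. It shows the general idea of emitting the marker bytes before the encoded text when writing to a stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of DomTrip.
class BomWriteSketch {

    /** Returns the BOM for the charsets this sketch knows, or an empty array otherwise. */
    static byte[] bomFor(Charset charset) {
        if (StandardCharsets.UTF_8.equals(charset)) {
            return new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        }
        if (StandardCharsets.UTF_16BE.equals(charset)) {
            return new byte[] {(byte) 0xFE, (byte) 0xFF};
        }
        if (StandardCharsets.UTF_16LE.equals(charset)) {
            return new byte[] {(byte) 0xFF, (byte) 0xFE};
        }
        return new byte[0];
    }

    /** Writes the optional BOM first, then the encoded XML text. */
    static void write(String xml, OutputStream out, Charset charset, boolean bom) throws IOException {
        if (bom) {
            out.write(bomFor(charset));
        }
        out.write(xml.getBytes(charset));
    }

    /** Convenience wrapper that serializes into a byte array. */
    static byte[] toBytes(String xml, Charset charset, boolean bom) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            write(xml, out, charset, bom);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes("<a/>", StandardCharsets.UTF_8, true);
        System.out.println(bytes.length); // 7: three BOM bytes plus four content bytes
    }
}
```

      A Java String has no byte-level prefix, which is consistent with the BOM appearing only in stream output and never in the string returned by Node.toXml().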

      Returns:
      true if the document had a BOM, false otherwise
      Since:
      1.0.0
    • bom

      public Document bom(boolean bom)
      Sets whether a Byte Order Mark (BOM) should be written when serializing to an OutputStream.
      Parameters:
      bom - true to write a BOM, false otherwise
      Returns:
      this document for method chaining
      Since:
      1.0.0
    • toXml

      public void toXml(StringBuilder sb)
      Serializes this document to XML, appending to the provided StringBuilder.

      This method preserves the original formatting including XML declaration, DOCTYPE declaration, whitespace, and all child nodes. The output includes:

      • XML declaration (if present)
      • DOCTYPE declaration (if present)
      • Preceding whitespace
      • All child nodes (comments, processing instructions, elements)
      • Document element (if not already included in children)
      • Following whitespace
      Specified by:
      toXml in class Node
      Parameters:
      sb - the StringBuilder to append the XML content to
    • accept

      public DomTripVisitor.Action accept(DomTripVisitor visitor)
      Accepts a visitor for depth-first tree traversal of the entire document.

      Visits all children of the document (comments, processing instructions, and the root element) in document order.

      Specified by:
      accept in class Node
      Parameters:
      visitor - the visitor to accept
      Returns:
      the action indicating how traversal should proceed
      Throws:
      IllegalArgumentException - if visitor is null
      Since:
      1.3.0
    • toXml

      public void toXml(OutputStream outputStream) throws DomTripException
      Serializes this document to an OutputStream using the document's encoding.

      This method uses the document's encoding property to determine the character encoding for the output stream. If the document has no encoding specified, UTF-8 is used as the default.
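
      The fallback rule can be sketched with JDK classes alone. This is an illustration under assumptions, not DomTrip's code: EncodingFallbackSketch, resolve, write, and toBytes are hypothetical names, and treating a blank encoding name like a missing one is a choice made for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of DomTrip.
class EncodingFallbackSketch {

    /** Maps an encoding name to a Charset, using UTF-8 when no name is given. */
    static Charset resolve(String encoding) {
        if (encoding == null || encoding.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        // Charset.forName throws an unchecked exception for illegal or unsupported names.
        return Charset.forName(encoding.trim());
    }

    /** Encodes the XML text with the resolved charset and writes it to the stream. */
    static void write(String xml, OutputStream out, String encoding) throws IOException {
        Writer writer = new OutputStreamWriter(out, resolve(encoding));
        writer.write(xml);
        writer.flush(); // flush, but leave closing the stream to the caller
    }

    /** Convenience wrapper that serializes into a byte array. */
    static byte[] toBytes(String xml, String encoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            write(xml, out, encoding);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(toBytes("\u00e9", null).length);         // 2: UTF-8 fallback
        System.out.println(toBytes("\u00e9", "ISO-8859-1").length); // 1
    }
}
```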

      Parameters:
      outputStream - the OutputStream to write to
      Throws:
      DomTripException - if serialization fails or I/O errors occur
    • toXml

      public void toXml(OutputStream outputStream, Charset charset) throws DomTripException
      Serializes this document to an OutputStream using the specified charset.

      This method allows explicit control over the character encoding used for serialization, regardless of the document's encoding property.

      Parameters:
      outputStream - the OutputStream to write to
      charset - the character encoding to use
      Throws:
      DomTripException - if serialization fails or I/O errors occur
    • toXml

      public void toXml(OutputStream outputStream, String encoding) throws DomTripException
      Serializes this document to an OutputStream using the specified encoding.

      This method allows explicit control over the character encoding used for serialization, regardless of the document's encoding property.

      Parameters:
      outputStream - the OutputStream to write to
      encoding - the character encoding name to use
      Throws:
      DomTripException - if serialization fails or I/O errors occur
    • generateXmlDeclaration

      public String generateXmlDeclaration()
      Creates a minimal XML declaration based on current document settings.

      Generates an XML declaration using the current version, encoding, and standalone settings. The declaration follows the standard format:

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

      The standalone attribute is only included if the standalone flag is true.
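
      For illustration, a declaration with this shape can be assembled as in the sketch below. It is not DomTrip's implementation: XmlDeclarationSketch and generate are hypothetical names, and the null defaults ("1.0" and "UTF-8") are an assumption borrowed from the defaults documented for version(String) and encoding(String).

```java
// Hypothetical helper for illustration only; not part of DomTrip.
class XmlDeclarationSketch {

    /** Builds an XML declaration; the standalone attribute appears only when the flag is true. */
    static String generate(String version, String encoding, boolean standalone) {
        StringBuilder sb = new StringBuilder("<?xml version=\"");
        sb.append(version != null ? version : "1.0").append('"');
        sb.append(" encoding=\"").append(encoding != null ? encoding : "UTF-8").append('"');
        if (standalone) {
            sb.append(" standalone=\"yes\"");
        }
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        // prints: <?xml version="1.1" encoding="UTF-8" standalone="yes"?>
        System.out.println(generate("1.1", "UTF-8", true));
    }
}
```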

      Returns:
      a properly formatted XML declaration string
    • toString

      public String toString()
      Returns a string representation of this document for debugging purposes.

      The string includes the XML version, encoding, and the name of the document element (if present).

      Overrides:
      toString in class Object
      Returns:
      a string representation of this document
    • of

      public static Document of()
      Creates an empty document with default settings.
      Returns:
      a new empty Document
    • of

      public static Document of(String xml) throws DomTripException
      Creates a document by parsing the provided XML string.

      This is a convenience method that combines document creation and XML parsing in a single call. It uses the default parser configuration.

      Parameters:
      xml - the XML string to parse
      Returns:
      a new Document containing the parsed XML
      Throws:
      DomTripException - if the XML is malformed or cannot be parsed
    • parseFragment

      public static List<Node> parseFragment(String xml) throws DomTripException
      Parses an XML fragment into a list of nodes.

      This method parses an XML fragment that may contain multiple root-level elements, comments, processing instructions, and text nodes. Unlike of(String), which expects a well-formed XML document, this method handles fragments that don't have a single root element.

      Usage Examples:

      // Parse a fragment with multiple elements
      List<Node> nodes = Document.parseFragment("<foo>bar</foo><bar>baz</bar>");
      
      // Parse a fragment with comments and elements
      List<Node> mixed = Document.parseFragment(
          "<!-- comment -->\n<foo>bar</foo>\n<bar>baz</bar>");
      
      Parameters:
      xml - the XML fragment string to parse
      Returns:
      a list of parsed nodes
      Throws:
      DomTripException - if the XML fragment is malformed
    • of

      public static Document of(InputStream inputStream) throws DomTripException
      Creates a document by parsing XML from an InputStream with automatic encoding detection.

      This method automatically detects the character encoding by:

      1. Checking for a Byte Order Mark (BOM)
      2. Reading the XML declaration to extract the encoding attribute
      3. Falling back to UTF-8 if no encoding is specified

      The resulting Document will have its encoding property set to the detected or declared encoding.
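
      The three detection steps above can be sketched with JDK classes. This is a simplified illustration, not DomTrip's detection code: EncodingSniffSketch and detect are hypothetical names, the sketch inspects only the leading bytes passed to it, and it ignores cases such as UTF-32 or UTF-16 input without a BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of DomTrip.
class EncodingSniffSketch {

    // Matches the encoding pseudo-attribute of an XML declaration at the start of the input.
    private static final Pattern DECLARED = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Picks a charset from the leading bytes of a document, or returns the fallback. */
    static Charset detect(byte[] head, Charset fallback) {
        // Step 1: a Byte Order Mark identifies the encoding directly.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // Step 2: read the declaration as single-byte text and look for encoding="...".
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(text);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }

        // Step 3: nothing detected, so use the fallback (UTF-8 for the method above).
        return fallback;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(head, StandardCharsets.UTF_8)); // ISO-8859-1
    }
}
```

      In this sketch a BOM takes precedence over the declared encoding, which follows the order of the steps listed above.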

      Parameters:
      inputStream - the InputStream containing XML data
      Returns:
      a new Document containing the parsed XML with preserved formatting
      Throws:
      DomTripException - if the XML is malformed, cannot be parsed, or I/O errors occur
    • of

      public static Document of(InputStream inputStream, Charset defaultCharset) throws DomTripException
      Creates a document by parsing XML from an InputStream with encoding detection and fallback.

      This method attempts to detect the character encoding by:

      1. Checking for a Byte Order Mark (BOM)
      2. Reading the XML declaration to extract the encoding attribute
      3. Using the provided default charset if detection fails

      The resulting Document will have its encoding property set to the detected, declared, or default encoding.

      Parameters:
      inputStream - the InputStream containing XML data
      defaultCharset - the charset to use if detection fails
      Returns:
      a new Document containing the parsed XML with preserved formatting
      Throws:
      DomTripException - if the XML is malformed, cannot be parsed, or I/O errors occur
    • of

      public static Document of(InputStream inputStream, String defaultEncoding) throws DomTripException
      Creates a document by parsing XML from an InputStream with encoding detection and fallback.

      This method attempts to detect the character encoding by:

      1. Checking for a Byte Order Mark (BOM)
      2. Reading the XML declaration to extract the encoding attribute
      3. Using the provided default encoding if detection fails

      The resulting Document will have its encoding property set to the detected, declared, or default encoding.

      Parameters:
      inputStream - the InputStream containing XML data
      defaultEncoding - the encoding name to use if detection fails
      Returns:
      a new Document containing the parsed XML with preserved formatting
      Throws:
      DomTripException - if the XML is malformed, cannot be parsed, or I/O errors occur
    • of

      public static Document of(Path path) throws DomTripException
      Creates a document by parsing XML from a file path with automatic encoding detection.

      This is a convenience method that combines file reading and XML parsing in a single call. It leverages the InputStream-based parsing with automatic encoding detection to properly handle various character encodings.

      The method automatically detects the character encoding by:

      1. Checking for a Byte Order Mark (BOM)
      2. Reading the XML declaration to extract the encoding attribute
      3. Falling back to UTF-8 if no encoding is specified

      Because the encoding is detected from the file's bytes rather than assumed by the caller, this method is the preferred way to parse XML files.

      Usage Examples:

      // Parse XML file with automatic encoding detection
      Document doc = Document.of(Paths.get("config.xml"));
      
      // Works with various encodings
      Document utf8Doc = Document.of(Paths.get("utf8-file.xml"));
      Document utf16Doc = Document.of(Paths.get("utf16-file.xml"));
      Document isoDoc = Document.of(Paths.get("iso-8859-1-file.xml"));
      
      // Handle parse failures explicitly
      try {
          Document doc = Document.of(configPath);
          Editor editor = new Editor(doc);
          // ... edit document
      } catch (DomTripException e) {
          System.err.println("Failed to parse XML: " + e.getMessage());
      }
      
      Parameters:
      path - the path to the XML file to parse
      Returns:
      a new Document containing the parsed XML with preserved formatting
      Throws:
      DomTripException - if the file cannot be read, the XML is malformed, or cannot be parsed
    • copy

      public Document copy()
      Creates a deep copy of this document.

      The copied node will have:

      • All properties copied from the original
      • All child nodes recursively copied (for container nodes)
      • Whitespace and formatting properties preserved
      • No parent (parent is set to null)

      The copied node and its descendants will have their parent-child relationships properly established within the copied subtree.

      Specified by:
      copy in class Node
      Returns:
      a new node that is a deep copy of this node
      Since:
      1.1.0
    • clone

      @Deprecated public Document clone()
      Deprecated.
      Use copy() instead.
      Creates a deep copy of this document.
      Overrides:
      clone in class Node
      Returns:
      a new document that is a copy of this document
    • withXmlDeclaration

      public static Document withXmlDeclaration(String version, String encoding)
      Creates a document with an XML declaration.

      Creates a document with the specified version and encoding, automatically generating an appropriate XML declaration.

      Parameters:
      version - the XML version (e.g., "1.0", "1.1"), or null for default "1.0"
      encoding - the character encoding (e.g., "UTF-8"), or null for default "UTF-8"
      Returns:
      a new Document with XML declaration
    • withXmlDeclaration

      public static Document withXmlDeclaration(String version, String encoding, boolean standalone)
      Creates a document with an XML declaration that carries the standalone flag.

      Creates a document with the specified version, encoding, and standalone flag, automatically generating an appropriate XML declaration.

      Parameters:
      version - the XML version, or null for default "1.0"
      encoding - the character encoding, or null for default "UTF-8"
      standalone - true if the document is standalone, false otherwise
      Returns:
      a new Document with XML declaration and standalone attribute
    • withRootElement

      public static Document withRootElement(String rootElementName) throws DomTripException
      Creates a document with an XML declaration and a root element.

      Creates a complete document with XML declaration (version 1.0, UTF-8 encoding) and the specified root element.

      Parameters:
      rootElementName - the name of the root element
      Returns:
      a new Document with XML declaration and root element
      Throws:
      DomTripException
    • withDoctype

      public static Document withDoctype(String version, String encoding, String doctype)
      Creates a document with an XML declaration and a DOCTYPE declaration.

      Creates a document with the specified version, encoding, and DOCTYPE declaration, automatically generating an appropriate XML declaration.

      Parameters:
      version - the XML version, or null for default "1.0"
      encoding - the character encoding, or null for default "UTF-8"
      doctype - the DOCTYPE declaration string
      Returns:
      a new Document with XML declaration and DOCTYPE
    • minimal

      public static Document minimal(String rootElementName) throws DomTripException
      Creates a minimal document with just a root element (no XML declaration).

      Creates a simple document containing only the specified root element, without any XML declaration or DOCTYPE.

      Parameters:
      rootElementName - the name of the root element
      Returns:
      a new minimal Document with only a root element
      Throws:
      DomTripException
    • withXmlDeclaration

      public Document withXmlDeclaration()
      Generates and sets an XML declaration based on current document settings.

      The XML declaration will include the version, encoding, and standalone flag (if true) based on the current document configuration.

      Returns:
      this document for method chaining