Class Parser
The Parser class is responsible for converting XML text into DomTrip's internal node tree representation. Unlike traditional XML parsers that normalize content and lose formatting information, this parser meticulously preserves every aspect of the original XML formatting to enable perfect round-trip processing.
Parsing Features:
- Whitespace Preservation - Maintains all whitespace exactly as written
- Automatic Whitespace Normalization - Never creates Text nodes containing only whitespace; that whitespace is stored on the surrounding elements instead
- Attribute Formatting - Preserves quote styles, order, and spacing
- Comment Preservation - Keeps all XML comments in their original positions
- Entity Preservation - Maintains entity references in their original form
- Processing Instructions - Preserves PIs including XML declarations
- CDATA Sections - Maintains CDATA boundaries and content
Parsing Process:
The parser uses a stack-based approach to build the XML tree:
- Tokenizes the input XML character by character
- Identifies XML constructs (elements, comments, text, etc.)
- Preserves original formatting information for each construct
- Moves whitespace-only content into element properties instead of creating Text nodes
- Builds a complete node tree with parent-child relationships
- Maintains modification flags for selective formatting preservation
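The stack-based approach described above can be sketched with a toy parser. This is an illustration only, not DomTrip's implementation: the class StackParseSketch and its Node type are hypothetical, it handles only plain opening tags, closing tags, and text, and it keeps all text verbatim rather than moving whitespace into element properties.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy illustration only; not DomTrip's Parser. Handles <a>, </a>, and text. */
final class StackParseSketch {
    /** Hypothetical node: an element name (null for the root) and its children. */
    static final class Node {
        final String name;
        final List<Object> children = new ArrayList<>(); // each child is a Node or a String
        Node(String name) { this.name = name; }
    }

    static Node parse(String xml) {
        Node root = new Node(null);
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root); // the element currently being filled is always on top
        int pos = 0;
        while (pos < xml.length()) {
            if (xml.charAt(pos) == '<') {
                int end = xml.indexOf('>', pos);
                if (end < 0) {
                    throw new IllegalArgumentException("Unclosed tag at position " + pos);
                }
                String tag = xml.substring(pos + 1, end);
                if (tag.startsWith("/")) {
                    // Closing tag: it must match the element on top of the stack.
                    Node open = stack.pop();
                    if (stack.isEmpty() || !tag.substring(1).equals(open.name)) {
                        throw new IllegalArgumentException(
                                "Unexpected <" + tag + "> at position " + pos);
                    }
                } else {
                    // Opening tag: attach to the current parent, then descend into it.
                    Node child = new Node(tag);
                    stack.peek().children.add(child);
                    stack.push(child);
                }
                pos = end + 1;
            } else {
                // Text runs until the next tag (or the end of input), kept verbatim.
                int next = xml.indexOf('<', pos);
                if (next < 0) next = xml.length();
                stack.peek().children.add(xml.substring(pos, next));
                pos = next;
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Unclosed element <" + stack.peek().name + ">");
        }
        return root;
    }
}
```

The stack always holds the chain of currently open elements, so a closing tag only has to be compared with the top entry, and parent-child links fall out of the push and pop order.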
Whitespace Normalization:
The parser automatically normalizes whitespace during parsing to ensure a clean tree structure:
- No Whitespace-Only Text Nodes - Whitespace between elements is captured in element properties
- Mixed Content Preservation - Text nodes with actual content preserve their whitespace
- Lossless Round-Trip - All whitespace is preserved for perfect XML reconstruction
- Element Properties - Whitespace stored in precedingWhitespace, innerPrecedingWhitespace, etc.
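To make the idea concrete, here is a minimal sketch of how whitespace between elements can live on the elements themselves and still round-trip exactly. It is not DomTrip code: the classes WhitespaceSketch, Element, and Siblings are hypothetical, only self-closing siblings are handled, and the field precedingWhitespace merely borrows the property name mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; not DomTrip's data model. */
final class WhitespaceSketch {
    /** Hypothetical element: a name plus the whitespace found before its tag. */
    static final class Element {
        final String name;
        final String precedingWhitespace;
        Element(String name, String precedingWhitespace) {
            this.name = name;
            this.precedingWhitespace = precedingWhitespace;
        }
    }

    /** Hypothetical container: sibling elements plus whitespace after the last one. */
    static final class Siblings {
        final List<Element> elements;
        final String trailingWhitespace;
        Siblings(List<Element> elements, String trailingWhitespace) {
            this.elements = elements;
            this.trailingWhitespace = trailingWhitespace;
        }
        /** Rebuilds the original text from the stored whitespace and names. */
        String toXml() {
            StringBuilder sb = new StringBuilder();
            for (Element e : elements) {
                sb.append(e.precedingWhitespace).append('<').append(e.name).append("/>");
            }
            return sb.append(trailingWhitespace).toString();
        }
    }

    /** Parses a run of self-closing siblings such as "\n  <a/>\n  <b/>\n". */
    static Siblings parse(String xml) {
        List<Element> elements = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = pos;
            while (pos < xml.length() && Character.isWhitespace(xml.charAt(pos))) pos++;
            String ws = xml.substring(start, pos); // never becomes a text node
            if (pos == xml.length()) return new Siblings(elements, ws);
            int end = xml.indexOf("/>", pos);
            if (xml.charAt(pos) != '<' || end < 0) {
                throw new IllegalArgumentException("Expected <name/> at position " + pos);
            }
            elements.add(new Element(xml.substring(pos + 1, end), ws));
            pos = end + 2;
        }
    }
}
```

Parsing "\n  <a/>\n  <b/>\n" with this sketch yields two elements and no text nodes, and toXml() returns the input unchanged.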
Error Handling:
The parser provides detailed error information for malformed XML:
- Precise error positions within the source text
- Descriptive error messages for common XML problems
- Context information to help locate and fix issues
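An error position is often easier to act on as a line and column. The helper below is a sketch under one assumption: that the reported position is a zero-based character offset into the source text, which this page does not confirm. ErrorContextSketch and describe are hypothetical names and are not part of DomTrip.

```java
/** Hypothetical helper; not part of DomTrip. */
final class ErrorContextSketch {
    /**
     * Converts a character offset into a 1-based "line L, column C" label.
     * Assumes the offset is zero-based and that lines end with '\n'.
     */
    static String describe(String xml, int position) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < position && i < xml.length(); i++) {
            if (xml.charAt(i) == '\n') {
                line++;      // a newline starts the next line
                column = 1;
            } else {
                column++;
            }
        }
        return "line " + line + ", column " + column;
    }
}
```

For the text "<a>\n  <b>", offset 6 points at the second "<", and describe returns "line 2, column 3".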
Usage:
Parser parser = new Parser();
try {
    // Parse from a String
    Document document = parser.parse(xmlString);
    // Parse from an InputStream with encoding detection
    Document document2 = parser.parse(inputStream);
    // Parse from an InputStream with a fallback encoding.
    // Use a fresh stream here: the call above has already consumed inputStream.
    Document document3 = parser.parse(otherInputStream, "UTF-8");
    // Use the parsed documents
} catch (DomTripException e) {
    // Handle parsing errors
    System.err.println("Parse error at position " + e.position() + ": " + e.getMessage());
}
Constructor Summary
Constructor | Description
Parser() | Creates a new Parser instance with default settings.
Method Summary
Modifier and Type | Method | Description
Document | parse(InputStream inputStream) | Parses XML from an InputStream with automatic encoding detection.
Document | parse(InputStream inputStream, String defaultEncoding) | Parses XML from an InputStream with encoding detection and fallback.
Document | parse(InputStream inputStream, Charset defaultCharset) | Parses XML from an InputStream with encoding detection and fallback.
Document | parse(String xml) | Parses an XML string into a lossless XML document tree.
Constructor Details
Parser
public Parser()
Creates a new Parser instance with default settings. No initialization is needed here because the parser state (xml, position, length) is initialized at the start of each parse(String) call.
Method Details
parse(InputStream inputStream)
Parses XML from an InputStream with automatic encoding detection.
This method automatically detects the character encoding by:
- Checking for a Byte Order Mark (BOM)
- Reading the XML declaration to extract the encoding attribute
- Falling back to UTF-8 if no encoding is specified
The resulting Document will have its encoding property set to the detected or declared encoding.
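The first detection step, checking for a Byte Order Mark, can be sketched as follows. This shows the general technique and is not DomTrip's code: BomSketch and detectBom are hypothetical names, and the sketch covers only the UTF-8 and UTF-16 marks (UTF-32 is ignored).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of DomTrip. */
final class BomSketch {
    /** Returns the charset implied by a leading BOM, or null if no known BOM is present. */
    static Charset detectBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;     // EF BB BF
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;  // FE FF
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;  // FF FE
        }
        return null; // no BOM: the caller moves on to the XML declaration
    }
}
```

A caller would typically read the first few bytes of the stream, pass them to a check like this, and only inspect the XML declaration when the result is null.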
- Parameters:
  inputStream - the InputStream containing XML data
- Returns:
  a Document containing the parsed XML with preserved formatting
- Throws:
  DomTripException - if the XML is malformed, cannot be parsed, or I/O errors occur
parse(InputStream inputStream, String defaultEncoding)
Parses XML from an InputStream with encoding detection and fallback.
This method attempts to detect the character encoding by:
- Checking for a Byte Order Mark (BOM)
- Reading the XML declaration to extract the encoding attribute
- Using the provided default encoding if detection fails
The resulting Document will have its encoding property set to the detected, declared, or default encoding.
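Reading the declared encoding and falling back to a default can be sketched like this. It is an illustration under stated assumptions, not DomTrip's implementation: DeclaredEncodingSketch and declaredOrDefault are hypothetical names, the input is assumed to start without a BOM in an ASCII-compatible encoding, and only the first 200 bytes are inspected.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of DomTrip. */
final class DeclaredEncodingSketch {
    // Matches encoding="..." or encoding='...' inside a leading <?xml ...?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\bencoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the declared charset if present and supported, otherwise the fallback. */
    static Charset declaredOrDefault(byte[] head, Charset fallback) {
        // ISO-8859-1 maps every byte to one char, so decoding the prefix cannot fail.
        String prefix = new String(head, 0, Math.min(head.length, 200),
                StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        if (m.find() && Charset.isSupported(m.group(2))) {
            return Charset.forName(m.group(2));
        }
        return fallback; // no declaration, or an encoding name this JVM does not support
    }
}
```

In this sketch a declaration always wins over the caller's default, which matches the order of the steps listed above.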
- Parameters:
  inputStream - the InputStream containing XML data
  defaultEncoding - the encoding name to use if detection fails
- Returns:
  a Document containing the parsed XML with preserved formatting
- Throws:
  DomTripException - if the XML is malformed, cannot be parsed, or I/O errors occur
parse(InputStream inputStream, Charset defaultCharset)
Parses XML from an InputStream with encoding detection and fallback.
This method attempts to detect the character encoding by:
- Checking for a Byte Order Mark (BOM)
- Reading the XML declaration to extract the encoding attribute
- Using the provided default charset if detection fails
The resulting Document will have its encoding property set to the detected, declared, or default encoding.
- Parameters:
  inputStream - the InputStream containing XML data
  defaultCharset - the charset to use if detection fails
- Returns:
  a Document containing the parsed XML with preserved formatting
- Throws:
  DomTripException - if the XML is malformed, cannot be parsed, or I/O errors occur
parse(String xml)
Parses an XML string into a lossless XML document tree.
This method performs complete XML parsing while preserving all formatting information including whitespace, comments, attribute styles, and entity encoding. The resulting Document can be used for lossless round-trip editing.
- Parameters:
  xml - the XML string to parse
- Returns:
  a Document containing the parsed XML with preserved formatting
- Throws:
  DomTripException - if the XML is malformed or cannot be parsed