Class Parser
Parses HTML into a Document. It is generally best to use one of the more convenient parse methods in Dcsoup.
Inheritance
System.Object
Parser
Namespace: Supremes.Parsers
Assembly: Supremes.dll
Syntax
public class Parser : object
Properties
CanTrackErrors
Check if parse error tracking is enabled.
Declaration
public bool CanTrackErrors { get; }
Property Value
Type | Description |
---|---|
System.Boolean | true if parse error tracking is enabled; otherwise false. |
Errors
Retrieve the parse errors, if any, from the last parse.
Declaration
public IList<ParseError> Errors { get; }
Property Value
Type | Description |
---|---|
IList<ParseError> | list of parse errors, up to the size of the maximum errors tracked. |
HtmlParser
Create a new HTML parser.
Declaration
public static Parser HtmlParser { get; }
Property Value
Type | Description |
---|---|
Parser | a new HTML parser. |
Remarks
This parser treats input as HTML5 and enforces the creation of a normalised document, based on knowledge of the semantics of the incoming tags.
XmlParser
Create a new XML parser.
Declaration
public static Parser XmlParser { get; }
Property Value
Type | Description |
---|---|
Parser | a new simple XML parser. |
Remarks
This parser assumes no knowledge of the incoming tags and does not treat the input as HTML; instead, it creates a simple tree directly from the input.
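The difference between the two factory properties can be seen by running the same input through each parser's ParseInput method. The following is a minimal sketch: the class and method names are hypothetical, and it assumes that Document.ToString() renders the parsed tree as markup (as Node.toString() does in jsoup), so no printed output is shown here.

```csharp
using Supremes.Parsers;

// Hypothetical helper class, for illustration only.
public static class ParserModeExample
{
    // Input that is neither a complete HTML document nor a known HTML tag.
    const string Input = "<item>One</item>";

    // HTML mode: HTML5 tree-building rules apply, so the result is a
    // normalised document with html, head and body elements around the input.
    // Assumption: ToString() renders the document as markup.
    public static string AsHtml() =>
        Parser.HtmlParser.ParseInput(Input, "").ToString();

    // XML mode: no knowledge of tags, so the tree mirrors the input as written.
    public static string AsXml() =>
        Parser.XmlParser.ParseInput(Input, "").ToString();
}
```

Comparing the two strings returned by AsHtml() and AsXml() should show the extra html, head and body wrappers only in the HTML result.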
Methods
Parse(String, String)
Parse HTML into a Document.
Declaration
public static Document Parse(string html, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | html | HTML to parse |
System.String | baseUri | base URI of document (i.e. original fetch location), for resolving relative URLs. |
Returns
Type | Description |
---|---|
Document | parsed Document |
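A minimal sketch of the static Parse method, assuming a hypothetical fetch location of https://example.com/docs/ as the base URI. The helper name is hypothetical, and the use of ToString() to get markup back is an assumption carried over from jsoup.

```csharp
using Supremes.Parsers;

// Hypothetical helper class, for illustration only.
public static class ParseExample
{
    // Hypothetical original fetch location; relative URLs in the page
    // (such as "guide.html") can later be resolved against it.
    const string BaseUri = "https://example.com/docs/";

    public static string ParsePage(string html)
    {
        // Static convenience method: parses the input with HTML5 rules.
        var doc = Parser.Parse(html, BaseUri);

        // Assumption: ToString() renders the document as markup.
        return doc.ToString();
    }
}
```

For example, ParsePage("<a href=\"guide.html\">Guide</a>") parses the link with the base URI recorded, so the relative href can later be resolved to https://example.com/docs/guide.html.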
ParseBodyFragment(String, String)
Parse a fragment of HTML into the body of a Document.
Declaration
public static Document ParseBodyFragment(string bodyHtml, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | bodyHtml | fragment of HTML |
System.String | baseUri | base URI of document (i.e. original fetch location), for resolving relative URLs. |
Returns
Type | Description |
---|---|
Document | Document with an empty head and the HTML parsed into the body |
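ParseBodyFragment is useful when the input is a snippet (for example, user-supplied markup) rather than a whole page. A minimal sketch, with a hypothetical helper name and base URI; it assumes ToString() renders the resulting document as markup, as in jsoup.

```csharp
using Supremes.Parsers;

// Hypothetical helper class, for illustration only.
public static class BodyFragmentExample
{
    // Wraps a snippet in a document shell: the head stays empty and the
    // snippet's nodes end up inside body.
    public static string Wrap(string snippet)
    {
        var doc = Parser.ParseBodyFragment(snippet, "https://example.com/");

        // Assumption: ToString() renders the document as markup.
        return doc.ToString();
    }
}
```

Calling Wrap("<b>bold</b> text") yields a full document whose body contains the b element and the trailing text node.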
ParseFragment(String, Element, String)
Parse a fragment of HTML into a list of nodes.
Declaration
public static IReadOnlyList<Node> ParseFragment(string fragmentHtml, Element context, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | fragmentHtml | the fragment of HTML to parse |
Element | context | (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation). |
System.String | baseUri | base URI of document (i.e. original fetch location), for resolving relative URLs. |
Returns
Type | Description |
---|---|
IReadOnlyList<Node> | list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified. |
Remarks
The context element, if supplied, provides the parsing context.
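A minimal sketch of ParseFragment that passes null for context, which the parameter table marks as optional. With a real context element (for example, a ul element when parsing li items) the parser would use it for implicit element creation. The sketch does not assume the exact shape of the returned list, only that a non-empty fragment yields at least one node; the helper name and base URI are hypothetical.

```csharp
using Supremes.Parsers;

// Hypothetical helper class, for illustration only.
public static class FragmentExample
{
    // Parses an HTML fragment without a context element and reports
    // how many top-level nodes came back.
    public static int CountNodes(string fragment)
    {
        var nodes = Parser.ParseFragment(fragment, null, "https://example.com/");
        return nodes.Count;
    }
}
```

Because no context element is supplied, nothing outside the returned list is modified.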
ParseInput(String, String)
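Parse HTML into a Document using this parser instance, so that its tree builder (HTML or XML) and error-tracking settings apply.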
Declaration
public Document ParseInput(string html, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | html | HTML to parse |
System.String | baseUri | base URI of document (i.e. original fetch location), for resolving relative URLs. |
Returns
Type | Description |
---|---|
Document | parsed Document |
ParseXmlFragment(String, String)
Parse a fragment of XML into a list of nodes.
Declaration
public static IReadOnlyList<Node> ParseXmlFragment(string fragmentXml, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | fragmentXml | the fragment of XML to parse |
System.String | baseUri | base URI of document (i.e. original fetch location), for resolving relative URLs. |
Returns
Type | Description |
---|---|
IReadOnlyList<Node> | list of nodes parsed from the input XML. |
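Because the XML parser builds the tree directly from the input, sibling elements at the top level of the fragment come back as separate nodes. A minimal sketch with a hypothetical helper name:

```csharp
using Supremes.Parsers;

// Hypothetical helper class, for illustration only.
public static class XmlFragmentExample
{
    // Returns the number of top-level nodes in an XML fragment.
    public static int CountTopLevel(string xml) =>
        Parser.ParseXmlFragment(xml, "").Count;
}
```

For example, CountTopLevel("<item>a</item><item>b</item>") returns 2: one node per item element, with no html/head/body wrapper added.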
SetTrackErrors(Int32)
Enable or disable parse error tracking for the next parse.
Declaration
public Parser SetTrackErrors(int maxErrors)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | maxErrors | the maximum number of errors to track. Set to 0 to disable. |
Returns
Type | Description |
---|---|
Parser | this, for chaining |
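Error tracking is a setting on a Parser instance, so the sketch below keeps a reference to that instance, parses through its ParseInput method, and then reads Errors from the same instance. The input contains a stray </span> end tag, a typical source of parse errors, but the sketch does not assume a particular error count. The helper name and base URI are hypothetical, and it assumes ParseError.ToString() produces a readable message.

```csharp
using Supremes.Parsers;

// Hypothetical helper class, for illustration only.
public static class ErrorTrackingExample
{
    // Parses `html` with up to `maxErrors` errors tracked and returns the
    // recorded error messages (never more than maxErrors of them).
    public static string[] CollectErrors(string html, int maxErrors)
    {
        // HtmlParser creates a new Parser on every access, so keep a reference:
        // the instance that ran the parse is the one holding the errors.
        var parser = Parser.HtmlParser.SetTrackErrors(maxErrors);
        parser.ParseInput(html, "https://example.com/");

        var errors = parser.Errors;
        var messages = new string[errors.Count];
        for (var i = 0; i < errors.Count; i++)
        {
            // Assumption: ParseError.ToString() yields a readable message.
            messages[i] = errors[i].ToString();
        }
        return messages;
    }
}
```

For example, CollectErrors("<div></span></div>", 10) returns at most 10 messages, and passing 0 disables tracking so the result is empty.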
UnescapeEntities(String, Boolean)
Utility method to unescape HTML entities from a string.
Declaration
public static string UnescapeEntities(string @string, bool inAttribute)
Parameters
Type | Name | Description |
---|---|---|
System.String | string | HTML escaped string |
System.Boolean | inAttribute | true if the string is to be unescaped in strict mode (as attribute values are) |
Returns
Type | Description |
---|---|
System.String | an unescaped string |
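A minimal sketch of UnescapeEntities in both modes, with hypothetical helper names. For well-formed entity references that end in a semicolon, both modes give the same result; the strict attribute mode only differs for ambiguous references (for example, a name without a trailing semicolon followed by a letter, digit or =), which it leaves as written.

```csharp
using Supremes.Parsers;

// Hypothetical helper class, for illustration only.
public static class UnescapeExample
{
    // Unescape text as it would appear in element content.
    public static string InText(string s) => Parser.UnescapeEntities(s, false);

    // Unescape text as it would appear in an attribute value (strict mode).
    public static string InAttribute(string s) => Parser.UnescapeEntities(s, true);
}
```

For example, InText("&lt;p&gt;Fish &amp; Chips&lt;/p&gt;") returns "<p>Fish & Chips</p>".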