
Class Parser

Parses HTML into a Document. It is generally best to use one of the more convenient parse methods in Dcsoup.

Inheritance
System.Object
Parser
Namespace: Supremes.Parsers
Assembly: Supremes.dll
Syntax
public class Parser : object

Properties

CanTrackErrors

Check if parse error tracking is enabled.

Declaration
public bool CanTrackErrors { get; }
Property Value
Type Description
System.Boolean

true if parse error tracking is currently enabled; otherwise false.

Errors

Retrieve the parse errors, if any, from the last parse.

Declaration
public IList<ParseError> Errors { get; }
Property Value
Type Description
IList<ParseError>

list of parse errors, up to the size of the maximum errors tracked.

HtmlParser

Create a new HTML parser.

Declaration
public static Parser HtmlParser { get; }
Property Value
Type Description
Parser

a new HTML parser.

Remarks

This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags.

XmlParser

Create a new XML parser.

Declaration
public static Parser XmlParser { get; }
Property Value
Type Description
Parser

a new simple XML parser.

Remarks

This parser assumes no knowledge of the incoming tags and does not treat the input as HTML; instead, it creates a simple tree directly from the input.
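The difference between the two parsers can be seen by feeding the same input to each. The following is a minimal sketch that uses only members documented on this page; the input string and the `http://example.com/` base URI are placeholders, and printing relies on whatever `ToString()` returns for a Document, which this page does not specify.

```csharp
using System;
using Supremes.Parsers;

class ParserKindsExample
{
    static void Main()
    {
        // Placeholder input: two paragraphs with no closing tags.
        const string input = "<p>One<p>Two";

        // HtmlParser treats the input as HTML5 and builds a normalised document.
        var htmlDoc = Parser.HtmlParser.ParseInput(input, "http://example.com/");

        // XmlParser assumes nothing about the tags and builds a simple tree.
        var xmlDoc = Parser.XmlParser.ParseInput(input, "http://example.com/");

        // Print both results to compare the resulting trees.
        Console.WriteLine(htmlDoc);
        Console.WriteLine(xmlDoc);
    }
}
```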

Methods

Parse(String, String)

Parse HTML into a Document.

Declaration
public static Document Parse(string html, string baseUri)
Parameters
Type Name Description
System.String html

HTML to parse

System.String baseUri

base URI of document (i.e. original fetch location), for resolving relative URLs.

Returns
Type Description
Document

parsed Document

ParseBodyFragment(String, String)

Parse a fragment of HTML into the body of a Document.

Declaration
public static Document ParseBodyFragment(string bodyHtml, string baseUri)
Parameters
Type Name Description
System.String bodyHtml

fragment of HTML

System.String baseUri

base URI of document (i.e. original fetch location), for resolving relative URLs.

Returns
Type Description
Document

Document, with empty head, and HTML parsed into body
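As a rough illustration of how Parse and ParseBodyFragment differ, the sketch below passes a full page to the former and a bare fragment to the latter. The markup and base URI are made-up placeholders, and the results are printed via `ToString()`, whose exact format is not documented here.

```csharp
using System;
using Supremes.Parsers;

class ParseVersusBodyFragmentExample
{
    static void Main()
    {
        // Placeholder base URI, used for resolving relative URLs such as "/about".
        const string baseUri = "http://example.com/";

        // A complete page goes through Parse.
        var page = Parser.Parse(
            "<html><head><title>Hi</title></head><body><a href=\"/about\">About</a></body></html>",
            baseUri);

        // A bare fragment goes through ParseBodyFragment; per the description above,
        // the result is a Document with an empty head and the fragment in its body.
        var fragmentDoc = Parser.ParseBodyFragment("<a href=\"/about\">About</a>", baseUri);

        Console.WriteLine(page);
        Console.WriteLine(fragmentDoc);
    }
}
```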

ParseFragment(String, Element, String)

Parse a fragment of HTML into a list of nodes.

Declaration
public static IReadOnlyList<Node> ParseFragment(string fragmentHtml, Element context, string baseUri)
Parameters
Type Name Description
System.String fragmentHtml

the fragment of HTML to parse

Element context

(optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).

System.String baseUri

base URI of document (i.e. original fetch location), for resolving relative URLs.

Returns
Type Description
IReadOnlyList<Node>

list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.

Remarks

If a context element is supplied, it provides the parsing context for the fragment.
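A minimal sketch of calling ParseFragment without a context element follows. It assumes that passing `null` is what "(optional)" means for the context parameter, which this page does not state explicitly; the fragment text and empty base URI are placeholders.

```csharp
using System;
using Supremes.Parsers;

class ParseFragmentExample
{
    static void Main()
    {
        // Assumption: a null context is accepted because the parameter is described as optional.
        var nodes = Parser.ParseFragment("<b>bold</b> and <i>italic</i>", null, "");

        // The result is a read-only list of the parsed nodes.
        Console.WriteLine(nodes.Count);
        foreach (var node in nodes)
        {
            Console.WriteLine(node);
        }
    }
}
```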

ParseInput(String, String)

Parse HTML into a Document using this parser instance.

Declaration
public Document ParseInput(string html, string baseUri)
Parameters
Type Name Description
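System.String html

HTML to parse

System.String baseUri

base URI of document (i.e. original fetch location), for resolving relative URLs.

Returns
Type Description
Document

parsed Document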

ParseXmlFragment(String, String)

Parse a fragment of XML into a list of nodes.

Declaration
public static IReadOnlyList<Node> ParseXmlFragment(string fragmentXml, string baseUri)
Parameters
Type Name Description
System.String fragmentXml

the fragment of XML to parse

System.String baseUri

base URI of document (i.e. original fetch location), for resolving relative URLs.

Returns
Type Description
IReadOnlyList<Node>

list of nodes parsed from the input XML.

SetTrackErrors(Int32)

Enable or disable parse error tracking for the next parse.

Declaration
public Parser SetTrackErrors(int maxErrors)
Parameters
Type Name Description
System.Int32 maxErrors

the maximum number of errors to track. Set to 0 to disable.

Returns
Type Description
Parser

this, for chaining
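Because SetTrackErrors returns the parser for chaining, and HtmlParser is described as creating a new parser, it seems safest to keep a reference to the configured instance and read Errors from that same instance. The sketch below works under that assumption; the malformed input, the limit of 10, and the base URI are arbitrary placeholders.

```csharp
using System;
using Supremes.Parsers;

class TrackErrorsExample
{
    static void Main()
    {
        // Keep the configured parser so Errors can be read from the same instance later.
        var parser = Parser.HtmlParser.SetTrackErrors(10);

        // Deliberately malformed placeholder input.
        parser.ParseInput("<p>Unclosed <b>bold</p><div", "http://example.com/");

        // Reports whether tracking is enabled on this parser.
        Console.WriteLine(parser.CanTrackErrors);

        // Errors holds at most the number of errors requested above.
        foreach (var error in parser.Errors)
        {
            Console.WriteLine(error);
        }
    }
}
```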

UnescapeEntities(String, Boolean)

Utility method to unescape HTML entities from a string.

Declaration
public static string UnescapeEntities(string @string, bool inAttribute)
Parameters
Type Name Description
System.String string

HTML escaped string

System.Boolean inAttribute

whether the string is to be unescaped in strict mode (as attribute values are)

Returns
Type Description
System.String

an unescaped string
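A short usage sketch for UnescapeEntities, with a placeholder input string. The second argument is `false` here on the assumption that the text comes from element content rather than an attribute value.

```csharp
using System;
using Supremes.Parsers;

class UnescapeEntitiesExample
{
    static void Main()
    {
        // Placeholder escaped text, as it might appear in HTML source.
        const string escaped = "Fish &amp; chips &lt;daily&gt;";

        // inAttribute = false: the text is not treated as an attribute value.
        string plain = Parser.UnescapeEntities(escaped, false);

        Console.WriteLine(plain);
    }
}
```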
