
Class Dcsoup

The core public access point to the Dcsoup functionality.

Inheritance
System.Object
Dcsoup
Namespace: Supremes
Assembly: Supremes.dll
Syntax
public static class Dcsoup : object

Methods

Clean(String, Whitelist)

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Declaration
public static string Clean(string bodyHtml, Whitelist whitelist)
Parameters
Type Name Description
System.String bodyHtml

input untrusted HTML (body fragment)

Whitelist whitelist

white-list of permitted HTML elements

Returns
Type Description
System.String

safe HTML (body fragment)

See Also
Clean(Document)
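
A minimal sketch of calling this overload. The whitelist is taken as a parameter because this page does not show how a Whitelist is obtained. The Supremes.Safety namespace for Whitelist is an assumption, and CommentSanitizer and Sanitize are hypothetical names used only for illustration.

```csharp
using Supremes;
using Supremes.Safety; // assumption: namespace that contains Whitelist

public static class CommentSanitizer // hypothetical helper, not part of Dcsoup
{
    // Returns a cleaned body fragment; a null input is treated as empty.
    public static string Sanitize(string untrustedHtml, Whitelist whitelist)
    {
        return Dcsoup.Clean(untrustedHtml ?? string.Empty, whitelist);
    }
}
```

Passing the whitelist in keeps the policy decision with the caller, so the same helper can serve fields with different rules.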

Clean(String, String, Whitelist)

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Declaration
public static string Clean(string bodyHtml, string baseUri, Whitelist whitelist)
Parameters
Type Name Description
System.String bodyHtml

input untrusted HTML (body fragment)

System.String baseUri

URL to resolve relative URLs against

Whitelist whitelist

white-list of permitted HTML elements

Returns
Type Description
System.String

safe HTML (body fragment)

See Also
Clean(Document)

Clean(String, String, Whitelist, DocumentOutputSettings)

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Declaration
public static string Clean(string bodyHtml, string baseUri, Whitelist whitelist, DocumentOutputSettings outputSettings)
Parameters
Type Name Description
System.String bodyHtml

input untrusted HTML (body fragment)

System.String baseUri

URL to resolve relative URLs against

Whitelist whitelist

white-list of permitted HTML elements

DocumentOutputSettings outputSettings

document output settings; use to control pretty-printing and entity escape modes

Returns
Type Description
System.String

safe HTML (body fragment)

See Also
Clean(Document)

IsValid(String, Whitelist)

Test if the input HTML has only tags and attributes allowed by the Whitelist.

Declaration
public static bool IsValid(string bodyHtml, Whitelist whitelist)
Parameters
Type Name Description
System.String bodyHtml

HTML to test

Whitelist whitelist

whitelist to test against

Returns
Type Description
System.Boolean

true if no tags or attributes were removed; false otherwise

Remarks

Useful for form validation. The input HTML should still be run through the cleaner to set up enforced attributes, and to tidy the output.

See Also
Clean(String, Whitelist)
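
The remark above suggests a validate-then-clean flow, which might look like the following sketch. FormHtmlValidator and TryAccept are hypothetical names, and the Supremes.Safety namespace for Whitelist is an assumption.

```csharp
using Supremes;
using Supremes.Safety; // assumption: namespace that contains Whitelist

public static class FormHtmlValidator // hypothetical helper, not part of Dcsoup
{
    // Returns true and the cleaned HTML when the input passes the whitelist;
    // returns false and null when it contains disallowed tags or attributes.
    public static bool TryAccept(string bodyHtml, Whitelist whitelist, out string cleaned)
    {
        if (!Dcsoup.IsValid(bodyHtml, whitelist))
        {
            cleaned = null;
            return false;
        }

        // Valid input is still cleaned so that enforced attributes are applied
        // and the output is tidied.
        cleaned = Dcsoup.Clean(bodyHtml, whitelist);
        return true;
    }
}
```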

Parse(HttpResponseMessage)

Parse HTML into a Document.

Declaration
public static Document Parse(this HttpResponseMessage self)
Parameters
Type Name Description
HttpResponseMessage self

The input HttpResponseMessage, which acts as the this instance for the extension method.

Returns
Type Description
Document

sane HTML document

Parse(Stream, String, String)

Read an input stream, and parse it to a Document.

Declaration
public static Document Parse(Stream in, string charsetName, string baseUri)
Parameters
Type Name Description
Stream in

input stream to read. Make sure to close it after parsing.

System.String charsetName

(optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).

System.String baseUri

The URL where the HTML was retrieved from, to resolve relative links against.

Returns
Type Description
Document

sane HTML
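
Because the caller is responsible for closing the stream, one way to call this overload is from inside a using block, as in this sketch. StreamParsing and ParseAndClose are hypothetical names, and the Supremes.Nodes namespace for Document is an assumption.

```csharp
using System.IO;
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document

public static class StreamParsing // hypothetical helper, not part of Dcsoup
{
    public static Document ParseAndClose(string path, string baseUri)
    {
        // The using block disposes the stream once Parse has returned.
        using (Stream input = File.OpenRead(path))
        {
            // A null charsetName asks for detection from an http-equiv meta tag,
            // with UTF-8 as the fallback.
            return Dcsoup.Parse(input, null, baseUri);
        }
    }
}
```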

Parse(Stream, String, String, Parser)

Read an input stream, and parse it to a Document.

Declaration
public static Document Parse(Stream in, string charsetName, string baseUri, Parser parser)
Parameters
Type Name Description
Stream in

input stream to read. Make sure to close it after parsing.

System.String charsetName

(optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).

System.String baseUri

The URL where the HTML was retrieved from, to resolve relative links against.

Parser parser

alternate parser to use, such as the XML parser.

Returns
Type Description
Document

sane HTML

Remarks

You can provide an alternate parser, such as a simple XML (non-HTML) parser.

Parse(String)

Parse HTML into a Document.

Declaration
public static Document Parse(string html)
Parameters
Type Name Description
System.String html

HTML to parse

Returns
Type Description
Document

sane HTML

Remarks

As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.

See Also
Parse(String, String)

Parse(String, String)

Parse HTML into a Document.

Declaration
public static Document Parse(string html, string baseUri)
Parameters
Type Name Description
System.String html

HTML to parse

System.String baseUri

The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs when they occur before the HTML declares a <base href> tag.

Returns
Type Description
Document

sane HTML

Remarks

The parser will make a sensible, balanced document tree out of any HTML.
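
As an illustration of the remark above, the sketch below passes deliberately unbalanced markup to this overload. HtmlParsing and ParseLenient are hypothetical names, the sample markup and URL are made up, and the Supremes.Nodes namespace for Document is an assumption.

```csharp
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document

public static class HtmlParsing // hypothetical helper, not part of Dcsoup
{
    // Parses markup that may contain unclosed or misnested tags.
    public static Document ParseLenient(string html, string baseUri)
    {
        return Dcsoup.Parse(html, baseUri);
    }

    public static Document Example()
    {
        // Neither <p> nor <b> is closed here; the parser is expected to
        // balance the tree rather than reject the input.
        return ParseLenient("<p>First <b>bold<p>Second", "http://example.com/");
    }
}
```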

Parse(String, String, Parser)

Parse HTML into a Document, using the provided Parser.

Declaration
public static Document Parse(string html, string baseUri, Parser parser)
Parameters
Type Name Description
System.String html

HTML to parse

System.String baseUri

The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs when they occur before the HTML declares a <base href> tag.

Parser parser

alternate parser to use, such as the XML parser.

Returns
Type Description
Document

sane HTML

Remarks

You can provide an alternate parser, such as a simple XML (non-HTML) parser.

Parse(Uri, Int32)

Fetch a URL, and parse it as HTML.

Declaration
public static Document Parse(Uri url, int timeoutMillis)
Parameters
Type Name Description
Uri url

URL to fetch (with a GET). The protocol must be http or https.

System.Int32 timeoutMillis

Connection and read timeout, in milliseconds. If exceeded, IOException is thrown.

Returns
Type Description
Document

The parsed HTML.

Remarks

Provided for compatibility.

The encoding character set is determined by the content-type header or http-equiv meta tag, or falls back to UTF-8.
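
A sketch of a guarded fetch built on this method. It checks the scheme up front and treats the documented IOException as a failed fetch; returning null in that case is a design choice of the sketch, not library behavior. PageFetcher and TryFetch are hypothetical names, and the Supremes.Nodes namespace for Document is an assumption.

```csharp
using System;
using System.IO;
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document

public static class PageFetcher // hypothetical helper, not part of Dcsoup
{
    // Returns the parsed page, or null if the fetch fails with an IOException
    // (for example, when the timeout is exceeded).
    public static Document TryFetch(string address, int timeoutMillis)
    {
        var url = new Uri(address);
        if (url.Scheme != Uri.UriSchemeHttp && url.Scheme != Uri.UriSchemeHttps)
        {
            throw new ArgumentException("Only http and https URLs are supported.", nameof(address));
        }

        try
        {
            return Dcsoup.Parse(url, timeoutMillis);
        }
        catch (IOException)
        {
            return null;
        }
    }
}
```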

ParseBodyFragment(String)

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Declaration
public static Document ParseBodyFragment(string bodyHtml)
Parameters
Type Name Description
System.String bodyHtml

body HTML fragment

Returns
Type Description
Document

sane HTML document

See Also
Body

ParseBodyFragment(String, String)

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Declaration
public static Document ParseBodyFragment(string bodyHtml, string baseUri)
Parameters
Type Name Description
System.String bodyHtml

body HTML fragment

System.String baseUri

URL to resolve relative URLs against.

Returns
Type Description
Document

sane HTML document

See Also
Body
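
A short sketch of parsing a fragment and then reaching its content through the document body. It assumes that the Body reference above is a property on Document that returns an Element, and that both types live in Supremes.Nodes; neither is confirmed on this page. FragmentParsing and BodyOf are hypothetical names.

```csharp
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document and Element

public static class FragmentParsing // hypothetical helper, not part of Dcsoup
{
    // Parses a body fragment and returns the body element that wraps it.
    public static Element BodyOf(string bodyHtml, string baseUri)
    {
        Document doc = Dcsoup.ParseBodyFragment(bodyHtml, baseUri);
        return doc.Body; // assumption: Body is a property returning Element
    }
}
```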

ParseFile(String, String)

Parse the contents of a file as HTML.

Declaration
public static Document ParseFile(string in, string charsetName)
Parameters
Type Name Description
System.String in

file to load HTML from

System.String charsetName

(optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).

Returns
Type Description
Document

sane HTML

Remarks

The location of the file is used as the base URI to qualify relative URLs.

See Also
ParseFile(String, String, String)
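
A minimal sketch of loading a local file with this overload. The existence check and the exception it throws belong to the sketch, not to Dcsoup. FileLoading and Load are hypothetical names, and the Supremes.Nodes namespace for Document is an assumption.

```csharp
using System.IO;
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document

public static class FileLoading // hypothetical helper, not part of Dcsoup
{
    public static Document Load(string path)
    {
        if (!File.Exists(path))
        {
            throw new FileNotFoundException("HTML file not found.", path);
        }

        // A null charsetName asks for detection from an http-equiv meta tag,
        // with UTF-8 as the fallback. The file location serves as the base URI.
        return Dcsoup.ParseFile(path, null);
    }
}
```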

ParseFile(String, String, String)

Parse the contents of a file as HTML.

Declaration
public static Document ParseFile(string in, string charsetName, string baseUri)
Parameters
Type Name Description
System.String in

file to load HTML from

System.String charsetName

(optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).

System.String baseUri

The URL where the HTML was retrieved from, to resolve relative links against.

Returns
Type Description
Document

sane HTML
