Class Dcsoup
The core public access point to the Dcsoup functionality.
Namespace: Supremes
Assembly: Supremes.dll
Syntax
public static class Dcsoup : object
Methods
Clean(String, Whitelist)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Declaration
public static string Clean(string bodyHtml, Whitelist whitelist)
Parameters
Type | Name | Description |
---|---|---|
System.String | bodyHtml | input untrusted HTML (body fragment) |
Whitelist | whitelist | white-list of permitted HTML elements |
Returns
Type | Description |
---|---|
System.String | safe HTML (body fragment) |
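As an illustration of the call shape only, the sketch below wraps Clean(String, Whitelist) in a small helper. The class and method names are hypothetical, and the `Supremes.Safety` namespace for Whitelist is an assumption not stated on this page. The caller supplies the Whitelist instance, so no particular whitelist factory is implied.

```csharp
using Supremes;
using Supremes.Safety; // assumption: namespace that contains Whitelist

// Hypothetical helper; not part of the Dcsoup API.
public static class CommentSanitizer
{
    // Returns a cleaned body fragment. The caller decides which tags and
    // attributes are permitted by passing its own Whitelist.
    public static string Sanitize(string untrustedHtml, Whitelist whitelist)
    {
        if (untrustedHtml == null)
        {
            return string.Empty; // nothing to clean
        }
        return Dcsoup.Clean(untrustedHtml, whitelist);
    }
}
```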
Clean(String, String, Whitelist)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Declaration
public static string Clean(string bodyHtml, string baseUri, Whitelist whitelist)
Parameters
Type | Name | Description |
---|---|---|
System.String | bodyHtml | input untrusted HTML (body fragment) |
System.String | baseUri | URL to resolve relative URLs against |
Whitelist | whitelist | white-list of permitted HTML elements |
Returns
Type | Description |
---|---|
System.String | safe HTML (body fragment) |
Clean(String, String, Whitelist, DocumentOutputSettings)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Declaration
public static string Clean(string bodyHtml, string baseUri, Whitelist whitelist, DocumentOutputSettings outputSettings)
Parameters
Type | Name | Description |
---|---|---|
System.String | bodyHtml | input untrusted HTML (body fragment) |
System.String | baseUri | URL to resolve relative URLs against |
Whitelist | whitelist | white-list of permitted HTML elements |
DocumentOutputSettings | outputSettings | document output settings; use to control pretty-printing and entity escape modes |
Returns
Type | Description |
---|---|
System.String | safe HTML (body fragment) |
IsValid(String, Whitelist)
Test if the input HTML has only tags and attributes allowed by the Whitelist.
Declaration
public static bool IsValid(string bodyHtml, Whitelist whitelist)
Parameters
Type | Name | Description |
---|---|---|
System.String | bodyHtml | HTML to test |
Whitelist | whitelist | whitelist to test against |
Returns
Type | Description |
---|---|
System.Boolean | true if no tags or attributes were removed; false otherwise |
Remarks
Useful for form validation. The input HTML should still be run through the cleaner to set up enforced attributes, and to tidy the output.
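The validate-then-clean pattern described in the remarks might look like the following sketch. The helper name `TryAccept` is hypothetical, and the `Supremes.Safety` namespace for Whitelist is an assumption. The input is always cleaned, and the boolean only reports whether the original input already conformed to the whitelist.

```csharp
using Supremes;
using Supremes.Safety; // assumption: namespace that contains Whitelist

// Hypothetical helper; not part of the Dcsoup API.
public static class FormHtmlValidator
{
    // Returns true when bodyHtml already contains only permitted tags and
    // attributes. The cleaned output is produced either way, because the
    // cleaner also sets up enforced attributes and tidies the markup.
    public static bool TryAccept(string bodyHtml, Whitelist whitelist, out string cleaned)
    {
        cleaned = Dcsoup.Clean(bodyHtml, whitelist);
        return Dcsoup.IsValid(bodyHtml, whitelist);
    }
}
```

A form handler could reject the submission when the method returns false, or store `cleaned` regardless, depending on how strict the validation should be.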
Parse(HttpResponseMessage)
Parse HTML into a Document.
Declaration
public static Document Parse(this HttpResponseMessage self)
Parameters
Type | Name | Description |
---|---|---|
HttpResponseMessage | self | HTTP response message whose content is parsed as HTML |
Returns
Type | Description |
---|---|
Document | sane HTML document |
Parse(Stream, String, String)
Read an input stream, and parse it to a Document.
Declaration
public static Document Parse(Stream in, string charsetName, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
Stream | in | input stream to read. Make sure to close it after parsing. |
System.String | charsetName | (optional) character set of file contents. Set to |
System.String | baseUri | The URL where the HTML was retrieved from, to resolve relative links against. |
Returns
Type | Description |
---|---|
Document | sane HTML |
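Because the stream should be closed after parsing, a `using` block is a natural fit. The sketch below is one possible shape, not the library's prescribed usage. The helper name is hypothetical, the `Supremes.Nodes` namespace for Document is an assumption, and passing "UTF-8" as the charset name assumes the file is UTF-8 encoded.

```csharp
using System.IO;
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document

// Hypothetical helper; not part of the Dcsoup API.
public static class StreamParsing
{
    // Opens the file, parses it, and disposes the stream once parsing returns.
    public static Document ParseUtf8File(string path, string baseUri)
    {
        using (Stream input = File.OpenRead(path))
        {
            // "UTF-8" is an assumed charset name for this example.
            return Dcsoup.Parse(input, "UTF-8", baseUri);
        }
    }
}
```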
Parse(Stream, String, String, Parser)
Read an input stream, and parse it to a Document.
Declaration
public static Document Parse(Stream in, string charsetName, string baseUri, Parser parser)
Parameters
Type | Name | Description |
---|---|---|
Stream | in | input stream to read. Make sure to close it after parsing. |
System.String | charsetName | (optional) character set of file contents. Set to |
System.String | baseUri | The URL where the HTML was retrieved from, to resolve relative links against. |
Parser | parser | alternate parser to use, for example an XML parser |
Returns
Type | Description |
---|---|
Document | sane HTML |
Remarks
You can provide an alternate parser, such as a simple XML (non-HTML) parser.
Parse(String)
Parse HTML into a Document.
Declaration
public static Document Parse(string html)
Parameters
Type | Name | Description |
---|---|---|
System.String | html | HTML to parse |
Returns
Type | Description |
---|---|
Document | sane HTML |
Remarks
As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.
Parse(String, String)
Parse HTML into a Document.
Declaration
public static Document Parse(string html, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | html | HTML to parse |
System.String | baseUri | The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a |
Returns
Type | Description |
---|---|
Document | sane HTML |
Remarks
The parser will make a sensible, balanced document tree out of any HTML.
Parse(String, String, Parser)
Parse HTML into a Document, using the provided Parser.
Declaration
public static Document Parse(string html, string baseUri, Parser parser)
Parameters
Type | Name | Description |
---|---|---|
System.String | html | HTML to parse |
System.String | baseUri | The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a |
Parser | parser | alternate parser to use, for example an XML parser |
Returns
Type | Description |
---|---|
Document | sane HTML |
Remarks
You can provide an alternate parser, such as a simple XML (non-HTML) parser.
Parse(Uri, Int32)
Fetch a URL, and parse it as HTML.
Declaration
public static Document Parse(Uri url, int timeoutMillis)
Parameters
Type | Name | Description |
---|---|---|
Uri | url | URL to fetch (with a GET). The protocol must be |
System.Int32 | timeoutMillis | Connection and read timeout, in milliseconds. If exceeded, IOException is thrown. |
Returns
Type | Description |
---|---|
Document | The parsed HTML. |
Remarks
Provided for compatibility.
The encoding character set is determined by the content-type header or http-equiv meta tag, or falls back to UTF-8.
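Since an exceeded timeout surfaces as an IOException, callers may want to handle it explicitly. The following is a minimal sketch under that assumption. The helper name `TryFetch` is hypothetical, and the `Supremes.Nodes` namespace for Document is an assumption.

```csharp
using System;
using System.IO;
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document

// Hypothetical helper; not part of the Dcsoup API.
public static class PageFetcher
{
    // Fetches and parses the page, or returns null when the request fails
    // with an IOException (for example, when the timeout is exceeded).
    public static Document TryFetch(Uri url, int timeoutMillis)
    {
        try
        {
            return Dcsoup.Parse(url, timeoutMillis);
        }
        catch (IOException)
        {
            return null; // caller decides whether to retry or report
        }
    }
}
```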
ParseBodyFragment(String)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Declaration
public static Document ParseBodyFragment(string bodyHtml)
Parameters
Type | Name | Description |
---|---|---|
System.String | bodyHtml | body HTML fragment |
Returns
Type | Description |
---|---|
Document | sane HTML document |
ParseBodyFragment(String, String)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Declaration
public static Document ParseBodyFragment(string bodyHtml, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | bodyHtml | body HTML fragment |
System.String | baseUri | URL to resolve relative URLs against. |
Returns
Type | Description |
---|---|
Document | sane HTML document |
ParseFile(String, String)
Parse the contents of a file as HTML.
Declaration
public static Document ParseFile(string in, string charsetName)
Parameters
Type | Name | Description |
---|---|---|
System.String | in | file to load HTML from |
System.String | charsetName | (optional) character set of file contents. Set to |
Returns
Type | Description |
---|---|
Document | sane HTML |
Remarks
The location of the file is used as the base URI to qualify relative URLs.
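A call to the two-argument overload could be sketched as follows. The helper name is hypothetical, the `Supremes.Nodes` namespace for Document is an assumption, and "UTF-8" is an assumed charset name that presumes the file is UTF-8 encoded. No base URI is passed, so the file location serves as the base URI, as noted above.

```csharp
using Supremes;
using Supremes.Nodes; // assumption: namespace that contains Document

// Hypothetical helper; not part of the Dcsoup API.
public static class LocalHtmlLoader
{
    // Parses a local HTML file; relative URLs are qualified against the
    // file's own location because no base URI is supplied.
    public static Document Load(string path)
    {
        // "UTF-8" is an assumed charset name for this example.
        return Dcsoup.ParseFile(path, "UTF-8");
    }
}
```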
ParseFile(String, String, String)
Parse the contents of a file as HTML.
Declaration
public static Document ParseFile(string in, string charsetName, string baseUri)
Parameters
Type | Name | Description |
---|---|---|
System.String | in | file to load HTML from |
System.String | charsetName | (optional) character set of file contents. Set to |
System.String | baseUri | The URL where the HTML was retrieved from, to resolve relative links against. |
Returns
Type | Description |
---|---|
Document | sane HTML |