mirror of
https://github.com/php/php-src.git
synced 2024-12-04 15:23:44 +08:00
d8eeb8e286
be easier to maintain.
123 lines
5.8 KiB
Plaintext
123 lines
5.8 KiB
Plaintext
|
|
README FOR ext/tidy by John Coggeshall <john@php.net>
|
|
|
|
Tidy Version: 0.7b
|
|
|
|
Tidy is an extension based on Libtidy (http://tidy.sf.net/) and allows a PHP developer
|
|
to clean, repair, and traverse HTML, XHTML, and XML documents -- including ones with
|
|
embedded scripting languages such as PHP or ASP within them using OO constructs.
|
|
|
|
---------------------------------------------------------------------------------------
|
|
!! Important Note !!
|
|
---------------------------------------------------------------------------------------
|
|
At this time libtidy has a small memory leak inside the ParseConfigFileEnc() function
|
|
used to load configuration from a file. If you intend to use this functionality apply
|
|
the "libtidy.txt" patch (cd tidy/src/; patch -p0 < libtidy.txt) to libtidy sources and
|
|
then recompile libtidy.
|
|
---------------------------------------------------------------------------------------
|
|
|
|
The Tidy extension has two separate APIs, one for general parsing, cleaning, and
|
|
repairing and another for document traversal. The general API is provided below:
|
|
|
|
tidy_create() Reinitialize the tidy engine
|
|
tidy_parse_file($file) Parse the document stored in $file
|
|
tidy_parse_string($str) Parse the string stored in $str
|
|
|
|
tidy_clean_repair() Clean and repair the document
|
|
tidy_diagnose() Diagnose a parsed document
|
|
|
|
tidy_setopt($opt, $val) Set a configuration option $opt to $val
|
|
tidy_getopt($opt) Retrieve a configuration option
|
|
|
|
** note: $opt is a string representing the option. Although no formal
|
|
documentation yet exists for PHP, you can find a description of many
|
|
of them at http://www.w3.org/People/Raggett/tidy/ and a list of supported
|
|
options in the phpinfo(); output**
|
|
|
|
tidy_get_output() Return the cleaned tidy HTML as a string
|
|
tidy_get_error_buffer() Return a log of the errors and warnings
|
|
returned by tidy
|
|
|
|
tidy_get_release() Return the Libtidy release date
|
|
tidy_get_status() Return the status of the document
|
|
tidy_get_html_ver() Return the major HTML version detected for
|
|
the document;
|
|
|
|
tidy_is_xhtml() Determines if the document is XHTML
|
|
tidy_is_xml() Determines if the document is a generic XML
|
|
|
|
tidy_error_count() Returns the number of errors in the document
|
|
tidy_warning_count() Returns the number of warnings in the document
|
|
tidy_access_count() Returns the number of accessibility-related
|
|
warnings in the document.
|
|
tidy_config_count() Returns the number of configuration errors found
|
|
|
|
tidy_load_config($file) Loads the specified configuration file
|
|
tidY_load_config_enc($file,
|
|
$enc) Loads the specified config file using the specified
|
|
character encoding
|
|
tidy_set_encoding($enc) Sets the current character encoding for the document
|
|
tidy_save_config($file) Saves the current config to $file
|
|
|
|
|
|
Beyond these general-purpose API functions, Tidy also supports the following
|
|
functions which are used to retrieve an object for document traversal:
|
|
|
|
tidy_get_root() Returns an object starting at the root of the
|
|
document
|
|
tidy_get_head() Returns an object starting at the <HEAD> tag
|
|
tidy_get_html() Returns an object starting at the <HTML> tag
|
|
tidy_get_body() Returns an object starting at the <BODY> tag
|
|
|
|
All Navigation of the specified document is done via the PHP5 object constructs.
|
|
There are two types of objects which Tidy can create. The first is TidyNode, which
|
|
represents HTML Tags, Text, and more (see the TidyNode_Type Constants). The second
|
|
is TidyAttr, which represents an attribute within an HTML tag (TidyNode). The
|
|
functionality of these objects is represented by the following schema:
|
|
|
|
class TidyNode {
|
|
|
|
public $name; // name of node (i.e. HEAD)
|
|
public $value; // value of node (everything between tags)
|
|
public $type; // type of node (text, php, asp, etc.)
|
|
public $id; // id of node (i.e. TIDY_TAG_HEAD)
|
|
|
|
public function attributes(); // an array of attributes (see TidyAttr)
|
|
public function children(); // an array of child nodes
|
|
|
|
function has_siblings(); // any sibling nodes?
|
|
function has_children(); // any child nodes?
|
|
|
|
function is_comment(); // is node a comment?
|
|
function is_xhtml(); // is document XHTML?
|
|
function is_xml(); // is document generic XML (not HTML/XHTML)
|
|
function is_text(); // is node text?
|
|
function is_html(); // is node an HTML tag?
|
|
|
|
function is_jste(); // is jste block?
|
|
function is_asp(); // is Microsoft ASP block?
|
|
function is_php(); // is PHP block?
|
|
|
|
function next(); // returns next node
|
|
function prev(); // returns prev node
|
|
|
|
/* Searches for a particular attribute in the current node based
|
|
on node ID. If found returns a TidyAttr object for it */
|
|
function get_attr($attr_id);
|
|
|
|
/*
|
|
}
|
|
|
|
class TidyAttr {
|
|
|
|
public $name; // attribute name i.e. HREF
|
|
public $value; // attribute value
|
|
public $id; // attribute id i.e. TIDY_ATTR_HREF
|
|
|
|
}
|
|
|
|
Examples of using these objects to navigate the tree can be found in the examples/
|
|
directory (I suggest looking at urlgrab.php and dumpit.php)
|
|
|
|
E-mail thoughts, suggestions, patches, etc. to <john@php.net>
|