Friday, January 20, 2012

Tidy and repair, clean or complete HTML tags

Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.
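
To give a feel for the tree traversal mentioned above, here is a minimal sketch. The helper name collect_tag_names and the sample markup are made up for illustration; the sketch assumes the tidy extension is already enabled (see the installation notes) and relies on tidy::body(), tidyNode::hasChildren() and the tidyNode properties type, name and child.

```php
<?php
// Hypothetical helper: walk a tidyNode subtree and collect element names,
// indented by their depth in the tree.
function collect_tag_names($node, $depth = 0)
{
    $lines = array();
    // Only element nodes (start tags or self-closing tags) carry a tag name.
    if ($node->type === TIDY_NODETYPE_START || $node->type === TIDY_NODETYPE_STARTEND) {
        $lines[] = str_repeat('  ', $depth) . $node->name;
    }
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            $lines = array_merge($lines, collect_tag_names($child, $depth + 1));
        }
    }
    return $lines;
}

// Sample markup, invented for this sketch.
$sample = '<html><head><title>text</title></head><body><p>one<br>two</p></body></html>';

$tidy = new tidy();
$tidy->parseString($sample, array(), 'utf8');

// body() returns the tidyNode for the <body> element.
$names = collect_tag_names($tidy->body());
echo implode("\n", $names), "\n";
```

The same walk can start from $tidy->html() or $tidy->head() when you need a different part of the document.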

Installation (PHP 5 package only)
Windows
If you are using PHP5 on a Windows system, all you need to do to enable the extension is uncomment the line extension=php_tidy.dll in your php.ini file. The official win32 binary distribution has built-in Tidy support.
Debian Linux
If you are using PHP5 on a Debian Linux system, all you need to do to enable the extension is run this command in a terminal:
~ # apt-get install php5-tidy
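
After installing, it may be worth confirming that PHP actually sees the extension. The check below is a small sketch using the standard extension_loaded() function. Keep in mind that the command-line PHP and the web server may read different php.ini files, and that the web server usually has to be restarted before a newly enabled extension shows up.

```php
<?php
// Report whether the tidy extension is available to this PHP binary.
$loaded = extension_loaded('tidy');

if ($loaded) {
    echo "tidy is loaded\n";
} else {
    echo "tidy is not loaded; check php.ini or install the package\n";
}
```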


Example:
If you need to repair bad HTML, or you doubt that users will enter valid markup in one of your form fields, Tidy gives you a safety net.

It's very simple. Here is some bad HTML:
  $html = "<html>
        <head>
        <title>text</title>
        </head>
        <body>
   <p> bad code <br> second bad code </i>
        </body>
      </html>";
To repair it, all you need to do is:
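  // Tidy options: indent the output, emit XHTML, wrap lines at 200 columns
  $config = array('indent'         => true,
                  'output-xhtml'   => true,
                  'wrap'           => 200);
  $Tidy = new tidy();
  // Parse the broken markup as UTF-8 using the options above
  $Tidy->parseString($html, $config, 'utf8');
  $Tidy->cleanRepair();
  // Escape the repaired markup so the browser shows it as text
  echo "<pre>" . htmlspecialchars($Tidy) . "</pre>";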
So you define a few configuration parameters (indentation, output standard and line wrap), then create a new Tidy object, then parse the bad string, passing the encoding, and finally call the method tidy::cleanRepair().
Here is the result:
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>
        text
      </title>
    </head>
    <body>
      <p>
        bad code<br />
        second bad code
      </p>
    </body>
  </html>
We see that the stray </i> tag was removed, <br> was converted to <br />, the <p> tag was closed correctly, and a DOCTYPE declaration was added. The markup now looks much more attractive.
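
For the user-input case, you often want only the repaired fragment rather than a whole page, plus some idea of what Tidy complained about. The following is a rough sketch under those assumptions: the function name repair_fragment is hypothetical, 'show-body-only' is a standard Tidy option that limits the output to the contents of <body>, and errorBuffer is the tidy property holding the warnings collected while parsing.

```php
<?php
// Hypothetical helper: repair an HTML fragment taken from a form field.
function repair_fragment($fragment)
{
    $config = array(
        'output-xhtml'   => true,
        'show-body-only' => true, // emit only the contents of <body>
    );

    $tidy = new tidy();
    $tidy->parseString($fragment, $config, 'utf8');
    $tidy->cleanRepair();

    return array(
        // The repaired markup as a string.
        'html'   => trim(tidy_get_output($tidy)),
        // Warnings and errors Tidy collected while parsing (may be empty).
        'report' => (string) $tidy->errorBuffer,
    );
}

$result = repair_fragment('<p> bad code <br> second bad code </i>');

echo "<pre>" . htmlspecialchars($result['html']) . "</pre>";
echo "<pre>" . htmlspecialchars($result['report']) . "</pre>";
```

Keep in mind that Tidy repairs markup; it is not an XSS filter, so user input still needs proper escaping or sanitizing before it is shown on a page.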

You can find the full Tidy documentation on php.net, in the Tidy section.