HTML DOM Manipulation with PHP

When discussing HTML DOM (Document Object Model) manipulation it is common to think of JavaScript, but there are cases where you might want to do this on the server.

If you're like me, you might think that HTML, or at least XHTML, is a job for simplexml. Loading and parsing an XHTML file is certainly possible with simplexml_load_file, and you can easily use the various simplexml methods to manipulate the DOM. You will, however, encounter a small problem if you need to output HTML. As simplexml is designed to deal with XML, it sees nothing wrong with outputting compacted empty tags. In other words, if you give it an empty textarea, like so:

$xml = simplexml_load_string('<textarea></textarea>');

and a little while later try to get that string back again, you will find that it's taken some liberties:

echo $xml->asXML(); // <textarea/> (preceded by the XML declaration)

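This is perfectly valid XML, but a browser parsing the page as HTML does not treat the collapsed textarea as self-closing: the element is left open and swallows the markup that follows. If you want to avoid this you'll need to start thinking up clever ways to circumvent what simplexml wants to do.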

Or we can just not use simplexml.

The DOMDocument class is, perhaps unsurprisingly, more feature-rich than its simpler little brother. For example, it has a way to preserve opening and closing tags:

// $dom is a DOMDocument already loaded with '<textarea></textarea>'
echo $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG); // <textarea></textarea>
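
For anyone who wants to try this in isolation, here is a minimal sketch that loads the same one-element document and serializes it both ways. The inline string and the variable names $collapsed and $expanded are only for illustration, and serializing documentElement rather than the whole document is a choice made here to leave the XML declaration out of the output.

```php
<?php
// Hypothetical one-element document, used only for illustration.
$dom = new DOMDocument();
$dom->loadXML('<textarea></textarea>');

// Default serialization collapses the empty element.
$collapsed = $dom->saveXML($dom->documentElement);

// LIBXML_NOEMPTYTAG asks for explicit opening and closing tags.
$expanded = $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);

echo $collapsed, "\n"; // <textarea/>
echo $expanded, "\n";  // <textarea></textarea>
```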

But even more usefully, it has a loadHTMLFile method and a saveHTML method, which treat HTML more like, well, HTML.
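
As a rough sketch of the HTML-aware round trip, the snippet below uses loadHTML, the string counterpart of loadHTMLFile, so that it runs without a file on disk. The form markup and the name "comment" are made up for the example. Passing a node to saveHTML serializes just that node instead of the whole document.

```php
<?php
$dom = new DOMDocument();
// loadHTML() parses a string; loadHTMLFile() does the same for a file path.
$dom->loadHTML('<form><textarea name="comment"></textarea></form>');

// Serialize only the textarea node rather than the full document.
$textarea = $dom->getElementsByTagName('textarea')->item(0);
$html = trim($dom->saveHTML($textarea));

echo $html, "\n"; // <textarea name="comment"></textarea>
```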

Here's a quick example. We're looking to get the names of any required textareas:

// $myHtmlFile holds the path to the HTML file to inspect
$dom = new DOMDocument();
$dom->loadHTMLFile($myHtmlFile);

// Collect the name attribute of every textarea marked as required
$requireds = array();
$nodes = $dom->getElementsByTagName('textarea');
foreach ($nodes as $node) {
    if ($node->hasAttribute('required')) {
        $requireds[] = $node->getAttribute('name');
    }
}
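
To see the loop run without a file, here is one possible way to wrap it in a function and feed it an inline string. The function name requiredTextareaNames, the sample form and its field names are all hypothetical, and the libxml_use_internal_errors calls are an addition of mine to keep parser warnings quiet, not something the loop itself needs.

```php
<?php
// Hypothetical helper: returns the name attribute of each required textarea.
function requiredTextareaNames(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($dom->getElementsByTagName('textarea') as $node) {
        if ($node->hasAttribute('required')) {
            $names[] = $node->getAttribute('name');
        }
    }
    return $names;
}

// Made-up sample markup: two required textareas and one optional one.
$html = '<form>'
      . '<textarea name="bio" required></textarea>'
      . '<textarea name="notes"></textarea>'
      . '<textarea name="address" required="required"></textarea>'
      . '</form>';

print_r(requiredTextareaNames($html));
```

Both the bare required attribute and the required="required" form are picked up, since hasAttribute only checks that the attribute is present.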
