PHP Simple HTML DOM Parser
In this tutorial I will show you a great way to parse a webpage with plain PHP, much as you would with jQuery. The syntax feels natural, and you will feel at home using it. But first, let's thank the author of this library, S.C. Chen (me578022@gmail.com), who made this parser.
What is a parser?
From Wikipedia:
A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process.
What for?
Parsers generally help you take content that is available on a website in a given format, and put it on another website/database/file in the same or different format.
Introducing the Simple HTML DOM Parser
Available here.
- An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way!
- Requires PHP 5+.
- Supports invalid HTML.
- Find tags on an HTML page with selectors just like jQuery.
- Extract contents from HTML in a single line.
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
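The same loops can be tried without a network request. Here is a minimal sketch, assuming the library file simple_html_dom.php sits next to your script. It uses the library's str_get_html() function on a small piece of made-up markup; the markup and the variable names are only for illustration.

```php
<?php
// Assumption: simple_html_dom.php (the library file) is in the same directory.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical markup standing in for a downloaded page.
$markup = '<p><img src="logo.png"><a href="/about">About</a><a href="/contact">Contact</a></p>';

// str_get_html() parses a string; it returns false for empty input.
$html = str_get_html($markup);
if ($html === false) {
    exit("Could not parse the markup\n");
}

// Collect the src attribute of every image.
$images = array();
foreach ($html->find('img') as $element) {
    $images[] = $element->src;
}

// Collect the href attribute of every link.
$links = array();
foreach ($html->find('a') as $element) {
    $links[] = $element->href;
}

print_r($images);
print_r($links);

// Free the memory held by the DOM object when done.
$html->clear();
```

Collecting the values in arrays instead of echoing them right away makes it easier to store them in a database or a file later.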
It provides both an object-oriented and a procedural way to query the DOM. Here is something more advanced:
// Find all elements with id=foo
$ret = $html->find('#foo');

// Find all elements with class=foo
$ret = $html->find('.foo');

// Find all elements that have an id attribute
$ret = $html->find('*[id]');

// Find all anchors and images
$ret = $html->find('a, img');

// Find all anchors and images with the "title" attribute
$ret = $html->find('a[title], img[title]');
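To see what these selectors return, here is a small sketch that loads markup the object-oriented way and then reads a few results. It again assumes simple_html_dom.php is in the same directory; the markup, ids and variable names are invented for the example.

```php
<?php
// Assumption: simple_html_dom.php (the library file) is in the same directory.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical markup used only for this illustration.
$markup = '<div id="foo" class="box">'
        . '<a href="/one" title="First">One</a>'
        . '<img src="pic.png" title="Picture">'
        . '<a href="/two">Two</a>'
        . '</div>';

// Object-oriented style: create the object, then load the markup into it.
$html = new simple_html_dom();
$html->load($markup);

// Passing an index as the second argument returns a single element
// (or null when nothing matches) instead of an array.
$box = $html->find('#foo', 0);
$missing = $html->find('#bar', 0);

// Elements expose their tag name, attributes and text as properties.
$boxTag = $box->tag;
$firstLinkText = $html->find('a', 0)->plaintext;

// Comma-separated selectors collect matches for either selector.
$titled = array();
foreach ($html->find('a[title], img[title]') as $element) {
    $titled[] = $element->tag . ': ' . $element->title;
}

echo $boxTag . "\n";
echo $firstLinkText . "\n";
print_r($titled);

// Free the memory held by the DOM object when done.
$html->clear();
```

Checking for null before using a single result is a good habit, because real pages often lack the element you expect.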
Summary
In this tutorial we saw a very simple yet powerful library for parsing the DOM with PHP, having learned in the past how to parse it with jQuery. Be careful when you parse content: do not parse a website if it has a copyright notice.