<div>divbox</div>
<p>para1</p>
<p>para2</p>
<p>para3</p>
<table class="table"><tr><td></td></tr></table>
<p>para4</p>
<p>para5</p>

Could someone please tell me how I can parse this HTML page to display ONLY para1, para2 and para3, and remove everything else?

Condition:
I want to fetch all the content from the first <p> up to (but not including) the first <table class="table">.

(The first table will always have the class "table".)

Expected output:

<p>para1</p>
<p>para2</p>
<p>para3</p>

1 Answer

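// $file holds the HTML source as a string
$d = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect markup
$d->loadHTML($file);

// "*" matches every element, in document order
foreach ($d->getElementsByTagName("*") as $el) {
    if ($el->tagName == "p") {
        echo $el->textContent, "\n";
    } elseif ($el->tagName == "table") {
        break; // stop at the first table
    }
}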

This gives:

para1
para2
para3
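
Note that the loop above prints only the text of each paragraph, while the question asks for the <p> markup itself. A minimal sketch of one way to get the markup, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument): the function name paragraphsBeforeTable and the inline $html sample are hypothetical, introduced here only for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup of
// every <p> that appears before the first <table>, one element per line.
function paragraphsBeforeTable(string $html): string
{
    $d = new DOMDocument();
    libxml_use_internal_errors(true); // suppress warnings about imperfect markup
    $d->loadHTML($html);
    libxml_clear_errors();

    $out = '';
    // "*" matches every element, in document order
    foreach ($d->getElementsByTagName('*') as $el) {
        if ($el->tagName === 'table') {
            break; // stop at the first table
        }
        if ($el->tagName === 'p') {
            // saveHTML($node) serializes just this element, tags included
            $out .= $d->saveHTML($el) . "\n";
        }
    }
    return $out;
}

// Sample input copied from the question
$html = '<div>divbox</div>'
      . '<p>para1</p><p>para2</p><p>para3</p>'
      . '<table class="table"><tr><td></td></tr></table>'
      . '<p>para4</p><p>para5</p>';

echo paragraphsBeforeTable($html);
```

With the sample input from the question, this should print the three <p> elements with their tags. Like the original loop, it stops at the first <table> of any class; that is fine here only because the question guarantees the first table has the class "table".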