libxml htmlParseDocument ignoring htmlParseOption

Posted 2019-08-02 19:48

Question:

Can anyone who uses libxml2 outside of the build packaged with PHP confirm whether the HTML_PARSE_NOWARNING flag is ignored? With the flag set, warnings are still generated.

The relevant C source from PHP, which calls into libxml2:

// options includes HTML_PARSE_NOWARNING (value 64)
htmlCtxtUseOptions(ctxt, (int)options);

ctxt->vctxt.error = php_libxml_ctx_error;
ctxt->vctxt.warning = php_libxml_ctx_warning;
if (ctxt->sax != NULL) {
    ctxt->sax->error = php_libxml_ctx_error;
    ctxt->sax->warning = php_libxml_ctx_warning;
}
htmlParseDocument(ctxt); // warnings are still reported here

Answer 1:

libxml2 does not ignore the HTML_PARSE_NOWARNING flag. Calling htmlCtxtUseOptions with HTML_PARSE_NOWARNING unregisters the warning handlers (sets them to NULL). The PHP code then installs its own handlers unconditionally, which undoes that and renders the flag useless. The PHP code should either check the options before installing the handlers:

htmlCtxtUseOptions(ctxt, (int)options);

if (!(options & HTML_PARSE_NOERROR)) {
    ctxt->vctxt.error = php_libxml_ctx_error;
    if (ctxt->sax != NULL)
        ctxt->sax->error = php_libxml_ctx_error;
}
if (!(options & HTML_PARSE_NOWARNING)) {
    ctxt->vctxt.warning = php_libxml_ctx_warning;
    if (ctxt->sax != NULL)
        ctxt->sax->warning = php_libxml_ctx_warning;
}
htmlParseDocument(ctxt);

Or call htmlCtxtUseOptions after setting the handlers:

ctxt->vctxt.error = php_libxml_ctx_error;
ctxt->vctxt.warning = php_libxml_ctx_warning;
if (ctxt->sax != NULL) {
    ctxt->sax->error = php_libxml_ctx_error;
    ctxt->sax->warning = php_libxml_ctx_warning;
}

htmlCtxtUseOptions(ctxt, (int)options);
htmlParseDocument(ctxt);


Tags: php libxml2