Prevent the first line from being added when using the htmlParse() function

Posted 2019-09-12 08:09

I have a problem when running htmlParse() on an XHTML document.

When the document is loaded into R as an 'externalptr', I can see that a line has been added at the top:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

I don't want this line to appear because it breaks my application. I would like to suppress it within the htmlParse() call rather than having to delete it manually from every XHTML file I have.

Any suggestions? I've tried changing some of the parameters passed to htmlParse(), but so far I haven't found one that does this.

If it helps, here are the first lines of the XHTML file I parse:

<?xml version="1.0" encoding="utf-8" ?>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="es">
<head>
<meta charset="utf-8" />
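Since the file starts with an XML declaration and appears to be well-formed XHTML, one option worth trying (not something proposed in this thread) is to parse it with the XML package's XML parser, xmlParse(), instead of the HTML parser. libxml2's HTML parser attaches exactly this default HTML 4.0 Transitional DOCTYPE when the input has none; the XML parser does not add one. A minimal sketch, assuming the XML package is installed and using a small hypothetical inline document in place of the real file:

```r
library(XML)

# Hypothetical stand-in for the real XHTML file; note it has no DOCTYPE.
# The string must start directly with "<?xml" for the XML parser to accept it.
xhtml <- '<?xml version="1.0" encoding="utf-8" ?>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head><meta charset="utf-8" /><title>Prueba</title></head>
<body><p>Hola</p></body>
</html>'

# Parse as XML rather than HTML, so no default HTML 4.0 DOCTYPE is attached.
# For a file on disk, pass the path instead, e.g. xmlParse("file.xhtml")
doc <- xmlParse(xhtml, asText = TRUE)

# Serialize the whole document; there is no DOCTYPE node to write out
out <- saveXML(doc)
cat(out)
```

One trade-off: the parsed document keeps the XHTML default namespace, so XPath queries need a prefix bound to it, e.g. getNodeSet(doc, "//x:p", namespaces = c(x = "http://www.w3.org/1999/xhtml")).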

1 answer
Anthone
#2 · 2019-09-12 08:58

What I did was take the root node with xmlRoot() and then save it with saveXML(), passing <?xml version="1.0" encoding="utf-8" ?> as the prefix parameter.
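The xmlRoot() + saveXML() approach can be sketched roughly as follows. This is a minimal sketch, not the poster's exact code: it assumes the XML package, uses a hypothetical inline document instead of a real file, and prepends the declaration with paste0() so the result does not depend on how saveXML()'s prefix argument is applied when saving a single node. It works because the DOCTYPE is attached to the document, not to the <html> element, so serializing only the root element leaves it out.

```r
library(XML)

# Hypothetical stand-in for one of the XHTML files
xhtml <- '<?xml version="1.0" encoding="utf-8" ?>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head><meta charset="utf-8" /><title>Prueba</title></head>
<body><p>Hola</p></body>
</html>'

# htmlParse() attaches the default HTML 4.0 Transitional DOCTYPE to the document
doc <- htmlParse(xhtml, asText = TRUE, encoding = "UTF-8")

# Serialize only the root <html> element, which does not include the DOCTYPE
root <- xmlRoot(doc)
body <- saveXML(root)

# Put the XML declaration back in front explicitly
out <- paste0('<?xml version="1.0" encoding="utf-8" ?>\n', body)
cat(out)
```

To write the result to disk, something like writeLines(out, "out.xhtml", useBytes = TRUE) writes the string's bytes without re-encoding them, which may help with encoding differences between Windows and Linux.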

There was also an encoding problem, but that's another story. It didn't work on Windows; on Ubuntu it finally worked.

Thank you all.
