Related: How can I pretty-print JSON in (unix) shell script?
Is there a (unix) shell script to format XML in human-readable form?
Basically, I want it to transform the following:
<root><foo a="b">lorem</foo><bar value="ipsum" /></root>
... into something like this:
<root>
<foo a="b">lorem</foo>
<bar value="ipsum" />
</root>
libxml2-utils
This utility comes with
libxml2-utils
:Perl's
XML::Twig
This command comes with XML::Twig perl module, sometimes
xml-twig-tools
package:xmlstarlet
This command comes with
xmlstarlet
:tidy
Check the
tidy
package:Python
Python's
xml.dom.minidom
can format XML:saxon-lint
You need
saxon-lint
:saxon-HE
You need
saxon-HE
:You didn't mention a file, so I assume you want to provide the XML string as standard input on the command line. In that case, do the following:
xmllint support formatting in-place:
As Daniel Veillard has written:
Indent level is controlled by
XMLLINT_INDENT
environment variable which is by default 2 spaces. Example how to change indent to 4 spaces:You may have lack with
--recover
option when you XML documents are broken. Or try weak HTML parser with strict XML output:--nsclean
,--nonet
,--nocdata
,--noblanks
etc may be useful. Read man page.xmllint --format yourxmlfile.xml
xmllint is a command line XML tool and is included in
libxml2
(http://xmlsoft.org/).================================================
Note: If you don't have
libxml2
installed you can install it by doing the following:CentOS
Ubuntu
sudo apt-get install libxml2-utils
MacOS
To install this on MacOS with Homebrew just do:
brew install libxml2
Git
Also available on Git if you want the code:
git clone git://git.gnome.org/libxml2
You can also use tidy, which may need to be installed first (e.g. on Ubuntu: sudo
apt-get install tidy
).For this, you would issue something like following:
Note: has many additional readability flags, but word-wrap behavior is a bit annoying to untangle (http://tidy.sourceforge.net/docs/quickref.html).