I am doing a structural analysis of web documents. For this I need to extract only the structure of a web document (only the tags). I found an HTML parser for Java called Jsoup, but I don't know how to use it to extract the tags.
Example:
<html>
<head>
this is head
</head>
<body>
this is body
</body>
</html>
Output:
html,head,head,body,body,html
Sounds like a depth-first traversal: emit each tag name when you enter the element and again when you leave it.
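A minimal sketch of that traversal as plain recursion over the element tree, assuming jsoup is on the classpath. The class and method names (TagStructure, collect, extract) are made up for illustration. Note that Jsoup.parse normalizes the markup, so missing html, head, or body elements would be added to the output.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class TagStructure {

    // Depth-first walk: record the tag name on entry (opening tag)
    // and again on exit (closing tag). Text nodes are skipped because
    // children() returns child elements only.
    static void collect(Element element, List<String> out) {
        out.add(element.tagName());
        for (Element child : element.children()) {
            collect(child, out);
        }
        out.add(element.tagName());
    }

    // Parses the HTML and returns the tag sequence as a comma-separated string.
    static String extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> tags = new ArrayList<>();
        // Start below the Document itself, which is not a real tag.
        for (Element top : doc.children()) {
            collect(top, tags);
        }
        return String.join(",", tags);
    }

    public static void main(String[] args) {
        String html = "<html><head>this is head</head>"
                + "<body>this is body</body></html>";
        System.out.println(extract(html));
    }
}
```

For the example document in the question this should print html,head,head,body,body,html.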
Another solution is to use jsoup's NodeVisitor as follows:
class:
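A minimal sketch of such a visitor class, assuming jsoup is on the classpath. TagVisitorDemo and TagCollector are made-up names for illustration; NodeVisitor, its head and tail callbacks, and traverse are the jsoup parts. jsoup calls head when it first reaches a node and tail after it has visited all of that node's descendants, which maps onto opening and closing tags.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of jsoup.
class TagVisitorDemo {

    // Collects tag names in document order, once on entry and once on exit.
    static class TagCollector implements NodeVisitor {
        final List<String> tags = new ArrayList<>();

        // Only real elements count as tags: text nodes, comments, and the
        // Document root (which is itself an Element subclass) are skipped.
        private static boolean isTag(Node node) {
            return node instanceof Element && !(node instanceof Document);
        }

        @Override
        public void head(Node node, int depth) {
            if (isTag(node)) {
                tags.add(((Element) node).tagName());
            }
        }

        @Override
        public void tail(Node node, int depth) {
            if (isTag(node)) {
                tags.add(((Element) node).tagName());
            }
        }
    }

    // Parses the HTML and returns the tag sequence as a comma-separated string.
    static String extract(String html) {
        TagCollector collector = new TagCollector();
        Jsoup.parse(html).traverse(collector);
        return String.join(",", collector.tags);
    }

    public static void main(String[] args) {
        String html = "<html><head>this is head</head>"
                + "<body>this is body</body></html>";
        System.out.println(extract(html));
    }
}
```

The visitor avoids explicit recursion, so it may be the safer choice for very deeply nested documents.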