How to parse HTML and get CSS styles

Posted 2019-05-11 16:17

Question:

I need to parse HTML and find the corresponding CSS styles. I can parse HTML and CSS separately, but I can't combine them. For example, I have an XHTML page like this:

<html>
<head>
<title></title>
</head>
<body>
<div class="abc">Hello World</div>
</body>
</html>

I have to search for "Hello World" and find its class name, and after that I need to find its style in an external CSS file. Answers using Java, JavaScript, or PHP are all okay.

Answer 1:

Use the jsoup library, which is an HTML parser for Java. Its documentation has examples.
For example, you can do something like this:

import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
String html = "<<your html content>>";
Document doc = Jsoup.parse(html);
Element ele = doc.getElementsContainingOwnText("Hello World").first(); // first element whose own text contains "Hello World" (null if none)
Set<String> classNames = ele.classNames(); // the class names of that element

You can explore the library further to fit your needs.



Answer 2:

There is a similar question: "Can jQuery get all CSS styles associated with an element?". CSS optimizers may also do what you want. Take a look at unused-css.com: it is an online tool, but it also lists other tools.



Answer 3:

As I understand it, you are able to parse the style sheet from the external file, which makes the task easier to solve. First, parse the HTML file with jsoup, which supports a jQuery-like selector syntax that makes complicated HTML files easier to handle. Then check this previous solution for parsing the CSS file. I am not going to give a full solution: these libraries do the parsing internally, so the only thing you need to write is the glue code that combines the two.
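
The glue code mentioned in this answer could look roughly like the sketch below. It assumes the class name is already known (for example from jsoup's classNames()) and that the external CSS file has been read into a string. StyleLookup and declarationsForClass are hypothetical names, and the regex-based rule matching is a deliberate simplification standing in for a real CSS parser: it only handles flat rules whose selector is a plain ".class", and it ignores comments, @media blocks, compound selectors, and specificity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or any CSS library.
class StyleLookup {

    // Naive "selector-list { declarations }" matcher; assumes no nested braces.
    private static final Pattern RULE = Pattern.compile("([^{}]+)\\{([^{}]*)\\}");

    /** Collects the declarations of every rule whose selector list contains exactly ".className". */
    public static Map<String, String> declarationsForClass(String css, String className) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            for (String selector : m.group(1).split(",")) {
                if (!selector.trim().equals("." + className)) {
                    continue;
                }
                // Later rules overwrite earlier ones, mimicking source order only.
                for (String decl : m.group(2).split(";")) {
                    int colon = decl.indexOf(':');
                    if (colon > 0) {
                        result.put(decl.substring(0, colon).trim(),
                                   decl.substring(colon + 1).trim());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the external CSS file.
        String css = "body { margin: 0; }\n"
                   + ".abc { color: red; font-weight: bold; }\n"
                   + ".abc, .xyz { margin: 4px; }";
        System.out.println(declarationsForClass(css, "abc"));
        // prints {color=red, font-weight=bold, margin=4px}
    }
}
```

For real style sheets, the same lookup is better done on top of a proper CSS parser, since matching selectors such as "div.abc" or "#main .abc" needs more than string comparison.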



Answer 4:

Using Java's java.util.regex:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<body>...<div class=\"abc\">Hello World</div></body>";
// Capture the first class name of the <div> whose content is "Hello World"
Pattern p = Pattern.compile("<div.+?class\\s*?=\\s*['\"]?([^ '\"]+).*?>Hello World</div>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println(m.group(1));
}

prints abc
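
The pattern above captures only the first class name and is tied to a div. A possible variation, sketched below, captures the whole class attribute and splits it on whitespace, so an element with several classes yields all of them. ClassNamesByRegex and classNamesFor are hypothetical names, and the sketch assumes a quoted class attribute and an element whose only content is the searched text. As with any regex approach to HTML, it is fragile compared with a real parser such as jsoup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class ClassNamesByRegex {

    /** Returns the class names of the first element whose entire content is the given text. */
    public static List<String> classNamesFor(String html, String text) {
        Pattern p = Pattern.compile(
            "<(\\w+)\\b[^>]*?\\sclass\\s*=\\s*([\"'])([^\"'>]*)\\2[^>]*>\\s*"
                + Pattern.quote(text)        // treat the searched text literally
                + "\\s*</\\1>",              // closing tag must match the opening tag
            Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        if (!m.find()) {
            return Collections.emptyList();
        }
        String value = m.group(3).trim();
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split("\\s+"));
    }

    public static void main(String[] args) {
        String html = "<body><div id=\"x\" class=\"abc def\">Hello World</div></body>";
        // The match is case-insensitive, so "hello world" also finds the element.
        System.out.println(classNamesFor(html, "hello world"));
        // prints [abc, def]
    }
}
```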