Update a tag name along with its value

Posted 2019-09-17 20:11

Question:

I am trying to replace HTML tags with updated values. I tried using jsoup but have not found a way to do it yet.

The functionality:

if (webText.contains("a href")) {
    // Parse it into jsoup
    Document doc = Jsoup.parse(webText);
    // Create an array to tackle every type individually, as wrap can
    // otherwise affect whole body types.
    Element[] array = new Element[doc.select("a").size()];

    for (int i = 0; i < doc.select("a").size(); i++) {
        if (doc.select("a").get(i) != null) {
            array[i] = doc.select("a").get(i);
        }
    }

    for (int i = 0; i < array.length; i++) {
        if (array[i].toString().contains("http")) {
            Log.e("Link", array[i].toString());
            Pattern p = Pattern.compile("href=\"(.*?)\"");
            Matcher m = p.matcher(array[i].toString());
            String url = null;
            if (m.find()) {
                url = m.group(1); // this variable should contain the link URL
                Log.e("Link Value", url);
                array[i] = array[i].wrap("<a href='"+url+"' class='link'></a>");
            }
        }
        else {
            Log.e("Favourite", array[i].toString());
            Pattern p = Pattern.compile("href=\"(.*?)\"");
            Matcher m = p.matcher(array[i].toString());
            String url = null;
            if (m.find()) {
                url = m.group(1); // this variable should contain the link URL
                Log.e("Favourite Value", url);
                array[i] = array[i].wrap("<a href='"+url+"' class='favourite'></a>");
                //array[i] = array[i].replaceWithreplaceWith("","");
            }
        }

    }

    Element element = doc.body();
    Log.e("From element html *************** ", " " + element.html());
    String currentHtml = wrapImgWithCenter(element.html());
    Log.e("currentHtml", currentHtml);
    listOfElements = currentHtml;
}

The line array[i] = array[i].wrap("<a href='"+url+"' class='favourite'></a>"); wraps the existing tag in a new one. That is not what I want. I want to replace the tag completely with something like:

"<a href='"+url+"' class='favourite'>"+url+"</a>";

Input:

<html>
 <head></head>
 <body>
  <p dir="ltr"><a href="gYWMBi5XqN" class="favourite"></a><a href="gYWMBi5XqN"><font color="#009a49">Frank Frank</font></a> <a href="http://yahoo.co.in" class="link"></a><a href="http://yahoo.co.in"><font color="#0033cc">http://yahoo.co.in</font></a></p>
  <br />
  <br />
 </body>
</html>

Expected output:

<html>
 <head></head>
 <body>
  <p dir="ltr"><a href="gYWMBi5XqN" class="favourite"><font color="#009a49">Frank Frank</font></a> <a href="http://yahoo.co.in" class="link"><font color="#0033cc">http://yahoo.co.in</font></a></p>
  <br />
  <br />
 </body>
</html>

I have tried using replaceWith but was unsuccessful; the attempt is still commented out in the code above. Where am I going wrong, and what should I do to update the tags?

P.S.: The input may vary and contain more or fewer tags.

Answer 1:

You can use the replaceWith method of the Element class. I've cleaned up your code a little: I removed the arrays and used the provided lists wherever possible. You also don't need a regex to get the href attribute (or any other attribute, for that matter) once you've parsed the HTML. Try it out and let me know if you need further assistance.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Tag;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) throws Exception {

        String webText = 
                "<html>" + 
                        "<head></head>" + 
                        "<body>" +
                            "<p dir=\"ltr\">" +
                                "<a href=\"gYWMBi5XqN\" class=\"favourite\"></a>" +
                                "<a href=\"gYWMBi5XqN\"><font color=\"#009a49\">Frank Frank</font></a>" +
                                "<a href=\"http://yahoo.co.in\" class=\"link\"></a>" +
                                "<a href=\"http://yahoo.co.in\"><font color=\"#0033cc\">http://yahoo.co.in</font></a>" + 
                            "</p>" + 
                        "</body>" + 
                    "</html>";

        if (webText.contains("a href")) {
            // Parse it into jsoup
            Document doc = Jsoup.parse(webText);

            Elements links = doc.select("a");

            for (Element link : links) {
                // attr() returns an empty string (never null) when the attribute is missing
                String url = link.attr("href");
                boolean isLink = url.contains("http");
                String label = isLink ? "Link" : "Favourite";
                System.out.println(label + ": " + link);
                System.out.println(label + " Value: " + url);

                // Build a fresh <a> carrying the href and the matching class
                Attributes attributes = new Attributes();
                attributes.put("href", url);
                attributes.put("class", isLink ? "link" : "favourite");
                // Move the old link's children into the new element, then swap it in
                link.replaceWith(new Element(Tag.valueOf("a"), "", attributes).insertChildren(0, link.childNodes()));
            }

            Element element = doc.body();
            System.out.println("From element html *************** "+ element.html());
        }
    }
}
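
One thing to watch in the code above: contains("http") also matches an ID that merely happens to include those letters. Below is a minimal sketch of a stricter check, assuming that only absolute http/https URLs should get the link class and everything else is a favourite. The class name HrefKind and the method classFor are made up for illustration; only the JDK is used.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HrefKind {

    /** Returns "link" for absolute http/https URLs and "favourite" for anything else. */
    static String classFor(String href) {
        if (href == null) {
            return "favourite";
        }
        try {
            // getScheme() is null for relative references such as "gYWMBi5XqN"
            String scheme = new URI(href.trim()).getScheme();
            if (scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return "link";
            }
        } catch (URISyntaxException e) {
            // Not a parseable URI: treat it as an internal ID
        }
        return "favourite";
    }

    public static void main(String[] args) {
        System.out.println(classFor("http://yahoo.co.in"));
        System.out.println(classFor("gYWMBi5XqN"));
    }
}
```

If you want to try it, the value could be passed straight to the attribute, e.g. attributes.put("class", HrefKind.classFor(url)).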

Input

<p dir="ltr">
    <a href="gYWMBi5XqN" class="favourite"></a>
    <a href="gYWMBi5XqN"><font color="#009a49">Frank Frank</font></a> 
    <a href="http://yahoo.co.in" class="link"></a>
    <a href="http://yahoo.co.in"><font color="#0033cc">http://yahoo.co.in</font></a>
</p>

Output

<p dir="ltr">
    <a href="gYWMBi5XqN" class="favourite"></a>
    <a href="gYWMBi5XqN" class="favourite"><font color="#009a49">Frank Frank</font></a> 
    <a href="http://yahoo.co.in" class="link"></a>
    <a href="http://yahoo.co.in" class="link"><font color="#0033cc">http://yahoo.co.in</font></a>
</p>

Input

<p dir="ltr">
    <a href="gYWMBi5XqN"><font color="#009a49">Frank Frank</font></a> 
    <a href="http://yahoo.co.in"><font color="#0033cc">http://yahoo.co.in</font></a>
</p>

Output

<p dir="ltr">
    <a href="gYWMBi5XqN" class="favourite"><font color="#009a49">Frank Frank</font></a>
    <a href="http://yahoo.co.in" class="link"><font color="#0033cc">http://yahoo.co.in</font></a>
</p>
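
Note that with the first input the two empty anchors are still present in the output, whereas the expected output in the question has them removed. Here is a rough sketch of one way to handle that, assuming the empty anchors are only leftovers from the earlier wrap calls and can safely be dropped. It sets the class on the existing element instead of building a new one. The names LinkTagger and tagLinks are hypothetical; the jsoup calls used are select, childNodeSize, remove and attr.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkTagger {

    /** Removes child-less anchors and sets class="link" or class="favourite" on the rest. */
    static String tagLinks(String html) {
        Document doc = Jsoup.parse(html);

        // select() returns its own list, so removing elements from the
        // document while iterating over it is safe
        for (Element a : doc.select("a[href]")) {
            if (a.childNodeSize() == 0) {
                // Assumption: an anchor with no children is a leftover wrapper
                a.remove();
                continue;
            }
            String href = a.attr("href");
            // Same rule as the answer: hrefs containing "http" are external links
            a.attr("class", href.contains("http") ? "link" : "favourite");
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String input = "<p dir=\"ltr\">"
                + "<a href=\"gYWMBi5XqN\" class=\"favourite\"></a>"
                + "<a href=\"gYWMBi5XqN\"><font color=\"#009a49\">Frank Frank</font></a> "
                + "<a href=\"http://yahoo.co.in\" class=\"link\"></a>"
                + "<a href=\"http://yahoo.co.in\"><font color=\"#0033cc\">http://yahoo.co.in</font></a>"
                + "</p>";
        System.out.println(tagLinks(input));
    }
}
```

Because the attribute is changed in place, the children of each anchor never have to be moved, so replaceWith is not needed for this particular case.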