how can I exctract attribute value using JAVA rege

2020-04-16 18:16发布

问题:

I have such string:

<a href="https://host-test.com/create?userName=test3&amp;user-mail=myemail@gmail.com&amp;id=14b72820-3855-4f2b-9a39-543ced6784a0&amp;downloadurl=https://host-test.com:443/123/rest/tmp-z7vvymo3wmfzke/vfs/v2/downloadzip/&amp;projectid=d29ya3NwYWNleXFpYXlwZjgwb2sxNDA2MjovY3JlYXRlQWNj:createAcc;" style="font-family:Myriad Pro,arial,tahoma,serif;color:#fff;font-size:14px;text-decoration:none;font-weight:bold" title="Confirm tenant creation" target="_blank">
                            <div style="font-family:'Lucida Grande',sans-serif;border-radius:5px;width:120px;min-height:40px;line-height:40px;border:1px solid #577e15;color:#fff;text-align:center;background:#e77431;margin:15px 0 15px">
                                Confirm
                            </div>
                        </a>

and I need extract using regexp only href value:

https://host-test.com/create?userName=test3&amp;user-mail=myemail@gmail.com&amp;id=14b72820-3855-4f2b-9a39-543ced6784a0&amp;downloadurl=https://host-test.com:443/123/rest/tmp-z7vvymo3wmfzke/vfs/v2/downloadzip/&amp;projectid=d29ya3NwYWNleXFpYXlwZjgwb2sxNDA2MjovY3JlYXRlQWNj:createAcc;

also href value each time can be different shorter or longer

回答1:

myString.replaceFirst(myString, "^<a\\s+href\\s*=\\s*\"([^\"]+)\".*", , "$1");

assuming myString contains your string with the a element.

As the href attributes cannot be nested, this should be fine and no full HTML parser is needed. A restriction is that it will only find href attributes in double quotes.



回答2:

For this particular string you can try something like

Pattern pattern = Pattern.compile("<a\\shref=\"([^\"]+)");
//or if you cant use group numbers use look-behind mechanism like
//Pattern.compile("(?<=<a\\shref=\")[^\"]+");
Matcher matcher = pattern.matcher(yourString);
if (matcher.find())
    System.out.println(matcher.group(1));

but if your string can change (like href atrubute can have other atributes before it) it can not work as expected. That is one of the reasons to use parsers rather then regex.