How can I extract an attribute value using a Java regex?

Posted 2020-04-16 18:10

I have a string like this:

<a href="https://host-test.com/create?userName=test3&amp;user-mail=myemail@gmail.com&amp;id=14b72820-3855-4f2b-9a39-543ced6784a0&amp;downloadurl=https://host-test.com:443/123/rest/tmp-z7vvymo3wmfzke/vfs/v2/downloadzip/&amp;projectid=d29ya3NwYWNleXFpYXlwZjgwb2sxNDA2MjovY3JlYXRlQWNj:createAcc;" style="font-family:Myriad Pro,arial,tahoma,serif;color:#fff;font-size:14px;text-decoration:none;font-weight:bold" title="Confirm tenant creation" target="_blank">
                            <div style="font-family:'Lucida Grande',sans-serif;border-radius:5px;width:120px;min-height:40px;line-height:40px;border:1px solid #577e15;color:#fff;text-align:center;background:#e77431;margin:15px 0 15px">
                                Confirm
                            </div>
                        </a>

and I need to extract only the href value, using a regex:

https://host-test.com/create?userName=test3&amp;user-mail=myemail@gmail.com&amp;id=14b72820-3855-4f2b-9a39-543ced6784a0&amp;downloadurl=https://host-test.com:443/123/rest/tmp-z7vvymo3wmfzke/vfs/v2/downloadzip/&amp;projectid=d29ya3NwYWNleXFpYXlwZjgwb2sxNDA2MjovY3JlYXRlQWNj:createAcc;

Also, the href value can be different each time, either shorter or longer.

2 answers
贪生不怕死
Post #2 · 2020-04-16 18:59

For this particular string, you can try something like this:

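import java.util.regex.Matcher;
import java.util.regex.Pattern;

// group(1) captures everything between href=" and the next double quote
Pattern pattern = Pattern.compile("<a\\shref=\"([^\"]+)");
// or, if you can't use group numbers, use a look-behind, like
// Pattern.compile("(?<=<a\\shref=\")[^\"]+");  and then read matcher.group()
Matcher matcher = pattern.matcher(yourString);
if (matcher.find())
    System.out.println(matcher.group(1));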
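As a minimal, self-contained sketch of the snippet in this answer, the class below runs both patterns side by side. The class name HrefRegexDemo, the helper method names and the shortened sample string are made up for illustration. The capturing-group variant reads group(1), while the look-behind variant returns the value as the whole match, group(); both return null when no `<a href="` is present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegexDemo {

    // Capturing-group variant: group(1) holds the text between href=" and the next quote.
    private static final Pattern WITH_GROUP = Pattern.compile("<a\\shref=\"([^\"]+)");

    // Look-behind variant: the whole match (group 0) is the href value itself.
    // Java requires a bounded look-behind; <a\shref=" has a fixed length, so this compiles.
    private static final Pattern WITH_LOOKBEHIND = Pattern.compile("(?<=<a\\shref=\")[^\"]+");

    /** Returns the first href value found via the capturing group, or null if none. */
    static String hrefViaGroup(String html) {
        Matcher m = WITH_GROUP.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the first href value found via the look-behind, or null if none. */
    static String hrefViaLookBehind(String html) {
        Matcher m = WITH_LOOKBEHIND.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the markup from the question
        String html = "<a href=\"https://host-test.com/create?userName=test3&amp;id=1\" target=\"_blank\">\n"
                + "  <div>Confirm</div>\n</a>";
        System.out.println(hrefViaGroup(html));
        System.out.println(hrefViaLookBehind(html));
    }
}
```

Because [^"]+ simply runs until the next double quote, the length of the href value does not matter.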

But if your string can change (for example, if the a tag has other attributes before href), this may not work as expected. That is one of the reasons to use a parser rather than a regex.
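To illustrate the parser route without third-party libraries, here is a rough sketch using the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator with an HTMLEditorKit.ParserCallback). Choosing this particular parser is my assumption, not something the answer names; the class name HrefParserDemo and the sample markup are hypothetical. Since the parser reads attributes by name, href is found even when style comes before it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HrefParserDemo {

    /** Collects the href attribute of every <a> start tag, in document order. */
    static List<String> hrefs(String html) throws IOException {
        List<String> result = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    // Lookup is by attribute name, so the position of href among the attributes is irrelevant
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        result.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the HTML itself
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical markup where another attribute precedes href
        String html = "<a style=\"color:#fff\" href=\"https://host-test.com/create?id=1\">"
                + "<div>Confirm</div></a>";
        System.out.println(hrefs(html));
    }
}
```

Unlike the regex, a parser decodes character references, so &amp; inside the attribute should come back as a plain &. The Swing parser targets old HTML 3.2, so for modern pages a dedicated library such as jsoup is the more common choice.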

Deceive 欺骗
Post #3 · 2020-04-16 19:05
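// (?s) lets .* run across the line breaks after the opening tag; if nothing matches, myString is returned unchanged
String href = myString.replaceFirst("(?s)^<a\\s+href\\s*=\\s*\"([^\"]+)\".*", "$1");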

This assumes myString contains your string with the a element.

Since href attributes cannot be nested, this should be fine and no full HTML parser is needed. One restriction is that it only finds an href attribute in double quotes that comes directly after <a.
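If single-quoted values also need to be supported, one option is to capture the opening quote and require the same quote to close the value with a backreference (\1). This is a sketch under that assumption; the class name QuotedHrefDemo and the test strings are hypothetical. It also tolerates other attributes between <a and href, which goes beyond the original one-liner, and like any regex it can still be fooled by unusual markup, such as the text href= inside another attribute's value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedHrefDemo {

    // Group 1 captures the opening quote (" or '); \1 requires the same quote to close it.
    // Group 2 is the attribute value; the reluctant .*? stops at the first matching quote.
    // [^>]*? lets other attributes appear before href within the same tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the href of the first <a> tag, or null if none is found. */
    static String firstHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstHref("<a href=\"https://host-test.com/a?x=1&amp;y=2\">x</a>"));
        System.out.println(firstHref("<a target='_blank' href='https://host-test.com/b'>y</a>"));
        System.out.println(firstHref("<div>no link</div>"));
    }
}
```

Using find() instead of replaceFirst also makes the no-match case explicit: it returns null rather than silently handing back the whole input string.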
