JMeter - regex in BeanShell (matcher()/pattern())

Posted 2019-05-14 21:43

I need to extract some words from the server response data.

Using a Regular Expression Extractor, I get:

<span class="snippet_word">Działalność</span> <span class="snippet_word">lecznicza</span>.</a>

From that I need just: "Działalność lecznicza"

So I wrote a BeanShell script that should do that, but there is a problem, because I get:

"lecznicza lecznicza"

Here is my program:

import java.util.regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pattern = "\\w+(?=\\<)";
String co = vars.get("tresc");
int len  = Integer.parseInt(vars.get("length"));
String phrase="";
StringBuffer sb = new StringBuffer();

Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(co);

for(i=0; i < len ;i++){
    if (m.find()){
        strbuf = new StringBuffer(m.group(0));
    }
    else {
        phrase="notfound";
    }

    sb.append(" ");
    sb.append(strbuf);
}

phrase = sb.toString();

return phrase;

tresc is the source string from which I extract the matching words. length tells me how many words I am extracting.

The program works fine for phrases without national characters. That's why I think there is some problem with the encoding, or somewhere here:

Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(co);

but I don't know how to change my code.

Tags: java regex jmeter beanshell

1 Answer

forever°为你锁心

#2 · 2019-05-14 22:18

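By default, \w in Java matches only ASCII word characters ([a-zA-Z_0-9]), so it does not match letters such as ł, ś or ć. To match letters from any script, you can use \p{L}: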

String pattern = "\\p{L}+(?=\\<)";
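To see the effect of that pattern, and why the original loop printed "lecznicza lecznicza", here is a minimal sketch in plain Java. The class name UnicodeWordExtractor and the method extractWords are made up for illustration and are not part of JMeter. In the original script, when find() fails the previous strbuf is appended again; the sketch stops at the first failed find() instead. It assumes Java 8 or later; BeanShell in JMeter may not accept the generics used here, so the script version would need raw types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not a JMeter API).
public class UnicodeWordExtractor {
    // \p{L}+ matches a run of letters from any script;
    // (?=<) requires a "<" immediately after the run without consuming it.
    private static final Pattern WORD_BEFORE_TAG = Pattern.compile("\\p{L}+(?=<)");

    // Returns up to maxWords matches joined by single spaces,
    // or "notfound" when nothing matches.
    public static String extractWords(String source, int maxWords) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_BEFORE_TAG.matcher(source);
        // Stop as soon as find() fails, so a previous match is never appended twice.
        while (words.size() < maxWords && m.find()) {
            words.add(m.group());
        }
        return words.isEmpty() ? "notfound" : String.join(" ", words);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only; the text is "Działalność".
        String tresc = "<span class=\"snippet_word\">Dzia\u0142alno\u015b\u0107</span> "
                + "<span class=\"snippet_word\">lecznicza</span>.</a>";
        System.out.println(extractWords(tresc, 2)); // prints "Działalność lecznicza"
    }
}
```

An alternative that keeps \w is to compile the pattern with the Pattern.UNICODE_CHARACTER_CLASS flag (Java 7 and later), which makes the predefined character classes Unicode-aware.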

That said, for this type of work I would recommend using an XML parser, since regular expressions are completely unsuitable for parsing HTML/XML, as described in this post.
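As a rough sketch of the parser route, the following plain Java uses the JDK's built-in DOM parser. It assumes the extracted fragment is well-formed XML, which the sample in the question is not (it ends with a stray closing a tag), so that tag is left out here; real-world HTML would more likely need a dedicated HTML parser. The class name SnippetWordParser and the method snippetWords are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
public class SnippetWordParser {
    // Collects the text of every <span class="snippet_word"> in a well-formed fragment.
    public static String snippetWords(String fragment) {
        try {
            // Wrap the fragment in a single root element so it forms one XML document.
            String xml = "<root>" + fragment + "</root>";
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList spans = doc.getElementsByTagName("span");
            List<String> words = new ArrayList<>();
            for (int i = 0; i < spans.getLength(); i++) {
                Element span = (Element) spans.item(i);
                if ("snippet_word".equals(span.getAttribute("class"))) {
                    words.add(span.getTextContent());
                }
            }
            return String.join(" ", words);
        } catch (Exception e) {
            // Malformed input (for example an unmatched closing tag) ends up here.
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only; the text is "Działalność".
        String fragment = "<span class=\"snippet_word\">Dzia\u0142alno\u015b\u0107</span> "
                + "<span class=\"snippet_word\">lecznicza</span>.";
        System.out.println(snippetWords(fragment));
    }
}
```

Because the parser works on elements rather than characters, the character class question (\w versus \p{L}) does not come up at all in this version.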
