How to extract usernames out of Tweets?

Posted 2019-04-06 19:14

I have the following example tweet:

RT @user1: who are @thing and @user2?

I only want to have user1, thing and user2.

What regular expression can I use to extract those three names?

PS: A username must only contain letters, numbers and underscores.

Tags: regex twitter
5 answers
\"骚年 ilove
#2 · 2019-04-06 19:30

This should do it (I used named captures for convenience):

.+?@(?[a-zA-Z0-9_]+):[^@]+?@(?[^\s]+)[^@]+?@(?[a-zA-Z0-9_]+)
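
For reference, here is a minimal sketch of how a named-capture pattern of this shape might be written in Python. The group names `first`, `second` and `third` and the helper `extract_three` are hypothetical, chosen only for illustration, and Python spells named groups as `(?P<name>...)`.

```python
import re

# Hypothetical group names (first, second, third), for illustration only.
PATTERN = re.compile(
    r".+?@(?P<first>[a-zA-Z0-9_]+):"
    r"[^@]+?@(?P<second>[^\s]+)"
    r"[^@]+?@(?P<third>[a-zA-Z0-9_]+)"
)

def extract_three(tweet):
    """Return the three captured names as a dict, or None if the tweet
    does not have the exact 'x @a: ... @b ... @c' shape."""
    match = PATTERN.match(tweet)
    return match.groupdict() if match else None

print(extract_three("RT @user1: who are @thing and @user2?"))
# {'first': 'user1', 'second': 'thing', 'third': 'user2'}
```

Note that a pattern like this only matches tweets with exactly this layout (text, a mention followed by a colon, then two more mentions), so it returns None for anything else.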

不美不萌又怎样
#3 · 2019-04-06 19:35

Tested:

/@([a-z0-9_]+)/i

In Ruby (irb):

>> "RT @user1: who are @thing and @user2?".scan(/@([a-z0-9_]+)/i)
=> [["user1"], ["thing"], ["user2"]]

In Python:

>>> import re
>>> re.findall("@([a-z0-9_]+)", "RT @user1: who are @thing and @user2?", re.I)
['user1', 'thing', 'user2']

In PHP:

<?PHP
$matches = array();
preg_match_all(
    "/@([a-z0-9_]+)/i",
    "RT @user1: who are @thing and @user2?",
    $matches);

print_r($matches[1]);
?>

Array
(
    [0] => user1
    [1] => thing
    [2] => user2
)
The star\"
#4 · 2019-04-06 19:36

It is a good idea to include the twitter-text library [1] in your project to handle text issues like this.

twttr.txt.extractMentions("a very generic twitt with some @mention");

[1] https://github.com/twitter/twitter-text-js

Root(大扎)
#5 · 2019-04-06 19:40
/(?<!\w)@(\w+)/

The above handles the following scenarios, which the other answers in this thread do not:

  • An @ sign that is not supposed to be a username, e.g. "my email is test@example.com"
  • Still allows a username that is at the beginning of a string, e.g. "@username lorem ipsum..."
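
A quick way to check both cases is a small Python sketch like the one below. The helper name `extract_mentions` is made up for illustration, and `re.ASCII` is my own addition so that `\w` stays limited to letters, digits and underscores, as the question requires.

```python
import re

# (?<!\w) rejects an @ that directly follows a word character, e.g. in an email.
# re.ASCII restricts \w to [a-zA-Z0-9_] (an assumption based on the question).
MENTION = re.compile(r"(?<!\w)@(\w+)", re.ASCII)

def extract_mentions(text):
    """Return all usernames mentioned in text, without the leading @."""
    return MENTION.findall(text)

print(extract_mentions("RT @user1: who are @thing and @user2?"))
# ['user1', 'thing', 'user2']
print(extract_mentions("my email is test@example.com"))
# []
print(extract_mentions("@username lorem ipsum..."))
# ['username']
```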
爷的心禁止访问
#6 · 2019-04-06 19:43

Try an iterator (findall) with this regex:

(@[\w-]+)
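
As a rough sketch, assuming Python, the iterator approach could look like this. The function name `iter_handles` is hypothetical. Since the group captures the @ as well, the sketch strips it afterwards.

```python
import re

HANDLE = re.compile(r"(@[\w-]+)")

def iter_handles(text):
    """Yield each matched handle with the leading @ removed."""
    for match in HANDLE.finditer(text):
        yield match.group(1).lstrip("@")

print(list(iter_handles("RT @user1: who are @thing and @user2?")))
# ['user1', 'thing', 'user2']
```

Be aware that `[\w-]` also accepts hyphens, which the question's username rule does not allow, and that without a lookbehind the pattern also matches the part after the @ in an email address.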

bye
