Raw IRC output nick and message parsing via Regex

Posted 2019-09-21 21:46

I am trying to parse the Twitch IRC chat into a more readable format. I have never used Regex and am not sure how to go about this (even after reading tons of tutorials).

This is the raw output:

:nick!nick@nick.tmi.twitch.tv PRIVMSG channel :

I would like two regexes, one for the nick and one for the message, so I can use them individually. Thanks!

Tags: c# regex irc
1 answer
可以哭但决不认输i
Post #2 · 2019-09-21 22:23

Regex is not the right solution for this problem. If you really want to go down this road (but don't; keep reading!), you can use something like this to match the entire message:

:(?<nick>[^ ]+?)\!(?<user>[^ ]+?)@(?<host>[^ ]+?) PRIVMSG (?<target>[^ ]+?) :(?<message>.*)

There are capture groups for the nick, username, hostname, channel, and message. I haven't tested it, and it will fail miserably on pretty much every other IRC event. There will also be ways to break it or get around the matching, because a regex is the wrong sort of grammar tool for IRC. It's like hammering in nails with a screwdriver: it works some of the time, but it's harder than it needs to be, and making it work better takes a lot of time, effort, and pain. Why would you not use a hammer?
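If you do take the regex route anyway, the pattern above can be used from C# roughly like this. It is a minimal sketch, not a tested recipe: the helper name `ParsePrivmsgWithRegex` and the sample line (channel `#channel`, message text) are hypothetical, the pattern is anchored with `^` and drops the unnecessary `\!` escape (`!` needs no escaping), and the line is assumed to have had its trailing `\r\n` stripped already.

```csharp
using System;
using System.Text.RegularExpressions;

// Extracts nick and message from one PRIVMSG line with the pattern from the answer
// (anchored with ^, "\!" written as plain "!"). Assumes the trailing "\r\n" is already
// stripped. Returns null for anything not shaped like this PRIVMSG, e.g. PING lines.
static (string Nick, string Message)? ParsePrivmsgWithRegex(string line)
{
    const string pattern =
        @"^:(?<nick>[^ ]+?)!(?<user>[^ ]+?)@(?<host>[^ ]+?) PRIVMSG (?<target>[^ ]+?) :(?<message>.*)";
    Match m = Regex.Match(line, pattern);
    if (!m.Success)
        return null;
    // Both values come from named groups of the same match.
    return (m.Groups["nick"].Value, m.Groups["message"].Value);
}

// Hypothetical sample line; the channel name and message text are made up.
var parsed = ParsePrivmsgWithRegex(":nick!nick@nick.tmi.twitch.tv PRIVMSG #channel :hello there");
if (parsed is { } p)
    Console.WriteLine($"{p.Nick}: {p.Message}");
```

Because the nick and the message are separate named groups of a single match, one regex covers both of the values asked for; any line that is not a PRIVMSG in exactly this shape just returns null.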

A much better solution is to actually parse the message. The IRC specifications in RFC 1459 and RFC 2812 give some pretty useful hints here. My advice from experience is to split on the first " :" (space then colon); everything after it is the last parameter of the message. Then split the first half by spaces. If the first entry in your list starts with a colon, it is the prefix: split it again by ! and @ to get the parts of the nickname/username/hostname tuple. Follow this method and you'll have the base of a much more robust and extensible parser than one you could ever build using regular expressions.
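The split-based approach described above can be sketched in C# as follows. This is an illustration under stated assumptions, not a complete IRC parser: `ParseIrcLine` is a hypothetical helper, the trailing `\r\n` is assumed to be stripped already, and lines are assumed to carry no IRCv3 `@tags` section (Twitch adds one only when the `twitch.tv/tags` capability has been requested; such lines would need that section peeled off first).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Sketch of the split-based approach. Assumptions: trailing "\r\n" already stripped,
// and no leading "@tags" section on the line.
static (string? Nick, string? User, string? Host, string Command, List<string> Params) ParseIrcLine(string line)
{
    // 1. Split on the FIRST " :" only; everything after it is the last (trailing)
    //    parameter, which may itself contain spaces and colons.
    string head = line;
    string? trailing = null;
    int split = line.IndexOf(" :", StringComparison.Ordinal);
    if (split >= 0)
    {
        head = line.Substring(0, split);
        trailing = line.Substring(split + 2);
    }

    // 2. Split the first half by spaces.
    var parts = new List<string>(head.Split(' ', StringSplitOptions.RemoveEmptyEntries));

    // 3. If the first entry starts with ':', it is the prefix "nick!user@host"
    //    (or just a server name, which ends up in Nick here); split it on '@' and '!'.
    string? nick = null, user = null, host = null;
    if (parts.Count > 0 && parts[0].StartsWith(':'))
    {
        string prefix = parts[0].Substring(1);
        parts.RemoveAt(0);
        int at = prefix.IndexOf('@');
        if (at >= 0)
        {
            host = prefix.Substring(at + 1);
            prefix = prefix.Substring(0, at);
        }
        int bang = prefix.IndexOf('!');
        if (bang >= 0)
        {
            user = prefix.Substring(bang + 1);
            prefix = prefix.Substring(0, bang);
        }
        nick = prefix;
    }

    // 4. What remains is the command, then its middle parameters; the trailing one goes last.
    string command = parts.Count > 0 ? parts[0] : "";
    var parameters = parts.Count > 1 ? parts.GetRange(1, parts.Count - 1) : new List<string>();
    if (trailing != null)
        parameters.Add(trailing);
    return (nick, user, host, command, parameters);
}

// Hypothetical sample line; the channel and message text are made up.
var msg = ParseIrcLine(":nick!nick@nick.tmi.twitch.tv PRIVMSG #channel :hello :) there");
if (msg.Command == "PRIVMSG" && msg.Params.Count == 2)
    Console.WriteLine($"{msg.Nick} in {msg.Params[0]}: {msg.Params[1]}");
```

Unlike the regex, the same function also handles other events, for example `PING :tmi.twitch.tv` comes back as command `PING` with the single parameter `tmi.twitch.tv`, so new message types only need new handling on the `Command` value rather than a new pattern.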

If you're doing this as a learning exercise, great! If not, you probably want to consider using a pre-built library to handle all the IRC communication for you.
