How to implement a Part-of-Speech (POS) tagger

I'm looking for the best PHP-based way to scan a lot of text entries (classifieds) and pull out keywords. Does anyone know about Part-of-Speech tagging? Is there a PHP-friendly way to do this?

I scan a lot of online classifieds, but none of them come with categories! To speed up the categorization process, I'm looking to install a Part-of-Speech tagger (http://en.wikipedia.org/wiki/Part-of-speech_tagging). Basically, these are text-parsing tools that label each word with its grammatical role, so they can tell me which words are nouns (like "Apartment", "Car", "Dog", etc.) and which words are filler like "at", "if", "and", "but", etc.
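To make the goal concrete, here is a rough sketch of the keyword step once some tagger has produced (word, tag) pairs. It assumes Penn Treebank style tags, where noun tags start with "NN"; the function name `noun_keywords` and the sample input are made up for illustration, and the tags are hand-written rather than real tagger output. It is in Python for brevity, but it is only a filter and a count, so it should port to PHP arrays without much trouble.

```python
from collections import Counter

def noun_keywords(tagged, top_n=3):
    """Return the most frequent nouns from a list of (word, tag) pairs.

    Assumes Penn Treebank style tags, where every noun tag starts with "NN".
    """
    # Keep only nouns, lowercased so "Apartment" and "apartment" count together.
    nouns = [word.lower() for word, tag in tagged if tag.startswith("NN")]
    # most_common sorts by count; ties keep first-seen order.
    return [word for word, _count in Counter(nouns).most_common(top_n)]

# Hand-tagged example input (hypothetical, not produced by a real tagger).
tagged_ad = [
    ("Sunny", "JJ"), ("apartment", "NN"), ("for", "IN"), ("rent", "NN"),
    ("Apartment", "NN"), ("near", "IN"), ("park", "NN"),
]
print(noun_keywords(tagged_ad, top_n=2))
```

The filler words drop out simply because their tags (IN, JJ, and so on) are not noun tags, so no separate stop-word list is needed for this step.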

There are online tagging services: one by Yahoo, which seems to be getting less love these days, and another by Xerox. However, I'm really interested in installing my own library/software and plugging it into my web app.

DOES ANYONE know of a good way to install POS tagging that works with a PHP web application? I'm dying to figure this out, so any info, advice, or other wisdom you have is really appreciated!

Here's a list of a LOT of different POS software: http://www-nlp.stanford.edu/links/statnlp.html#Taggers (Look under "POS Taggers")

Thanks for reading this!

Tags: php parsing tags full-text-search tagging

2 Answers

不美不萌又怎样

#2 · 2019-02-13 17:43

Ian Barber has implemented a Brill Tagger in PHP, which he presents on his PHP/ir site where he describes using it to analyse tweets.
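For anyone unfamiliar with how a Brill tagger works, the general idea is two passes: first give every word its most likely tag from a lexicon, then apply an ordered list of contextual rules that correct those guesses. Below is a minimal sketch of that idea. It is not Ian Barber's code, and the lexicon, rules, and names (`LEXICON`, `RULES`, `tag`) are toy assumptions made up for illustration. It is written in Python for brevity; the structure is just a dictionary lookup and a loop, so the same approach carries over to PHP.

```python
# Toy lexicon: most likely tag for each known word (Penn Treebank style tags).
# A real tagger ships a lexicon with many thousands of entries.
LEXICON = {
    "the": "DT", "a": "DT", "for": "IN", "in": "IN", "to": "TO",
    "rent": "NN", "sale": "NN", "apartment": "NN", "car": "NN",
    "want": "VB",
}

# Contextual rules as (from_tag, to_tag, previous_tag):
# "change from_tag to to_tag when the previous word is tagged previous_tag".
RULES = [
    ("NN", "VB", "TO"),  # e.g. "to rent": a noun right after "to" is likely a verb
]

def tag(tokens):
    """Return a list of (token, tag) pairs for a list of word strings."""
    # Pass 1: initial guess from the lexicon; unknown words default to noun.
    tags = [LEXICON.get(token.lower(), "NN") for token in tokens]
    # Pass 2: apply each rule in order, scanning the sentence left to right.
    for from_tag, to_tag, previous_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == previous_tag:
                tags[i] = to_tag
    return list(zip(tokens, tags))

print(tag("apartment for rent".split()))
print(tag("want to rent a car".split()))
```

In this sketch "rent" stays a noun in the first sentence and is retagged as a verb in the second. Extending the behaviour means appending tuples to `RULES`, and since every rule rescans the whole sentence, a long rule list is one plausible reason such a tagger gets slow.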

疯言疯语

#3 · 2019-02-13 17:46

Yeah, I'm currently using the Brill tagger. It works to some extent, although I wish I could figure out how to contribute to its ruleset. It makes plenty of mistakes, but its tags are still about 85% accurate. My only issue is that it is SLOW!

It gets it right where it counts, on words with double meanings. However, there are many constructions it doesn't account for, such as contrasting conjunction clauses: for instance, I might say something negative about somebody, but after the comma say something that reverses the polarity to positive (or not). The tagger also can't recognize idioms.
