Categorizing Words and Category Values

We were set an algorithm problem in class today, as an "if you figure out a solution, you don't have to do this subject" challenge. So of course, we all thought we would give it a go.

Basically, we were provided a DB of 100 words and 10 categories. No mapping between the words and the categories is given, so it's just a list of 100 words and a list of 10 categories.

We have to "place" the words into the correct category; that is, we have to "figure out" how to put each word into the correct category. Thus, we must "understand" the word and then put it in the most appropriate category algorithmically.

E.g. one of the words is "fishing" and one of the categories is "sport", so the word would go into that category. There is some overlap between words and categories, such that some words could go into more than one category.

If we figure it out, we have to increase the sample size and the person with the "best" matching % wins.

Does anyone have ANY idea how to start something like this? Or any resources? Preferably in C#?

Even a keyword DB or something might be helpful. Does anyone know of any free ones?

Tags: algorithm classification

21 answers

兄弟一词,经得起流年.

#2 · 2019-03-09 03:11

My naive approach:

1. Create a huge text file like this (read the article for inspiration).
2. For every word, scan the text and, whenever you match that word, count the 'categories' that appear within N positions (a maximum distance, i.e. a radius) to the left and right of it.
3. The word is likely to belong in the category with the greatest count.
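
A minimal sketch of the counting idea above, in Python rather than C# for brevity. The corpus string, the function name `categorize_by_cooccurrence` and the default radius are made up for illustration; a real run would need a much larger text.

```python
import re
from collections import Counter

def categorize_by_cooccurrence(word, categories, text, radius=5):
    """Count how often each category name appears within `radius`
    tokens to the left or right of every occurrence of `word`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    word = word.lower()
    categories = [c.lower() for c in categories]
    counts = Counter({c: 0 for c in categories})
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        lo = max(0, i - radius)
        # Tokens around position i, excluding the word itself.
        window = tokens[lo:i] + tokens[i + 1:i + 1 + radius]
        for cat in categories:
            counts[cat] += window.count(cat)
    if max(counts.values(), default=0) == 0:
        return None, counts  # no category ever appeared near the word
    return counts.most_common(1)[0][0], counts

# Toy corpus (an assumption; stands in for the "huge text file").
corpus = (
    "Fishing is a popular outdoor sport in many countries. "
    "Some people treat fishing as a sport, others as a way to get food. "
    "Pasta is a food that is easy to cook."
)
best, counts = categorize_by_cooccurrence("fishing", ["sport", "food"], corpus)
print(best, dict(counts))
```

Only single-word category names are matched here; multi-word categories or synonyms of the category name would need extra handling.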


放荡不羁爱自由

#3 · 2019-03-09 03:13

Use WordNet (either online or as a download), and find the number of relationships you have to follow between each word and each category.
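
Counting "relationships to follow" amounts to a shortest-path search. The sketch below uses a small hand-made graph (`RELATIONS`) as a stand-in for WordNet links; it does not call any WordNet API, and the words and links in it are invented for illustration.

```python
from collections import deque

# Hypothetical relation graph standing in for WordNet links.
RELATIONS = {
    "fishing": ["angling", "outdoor activity"],
    "angling": ["fishing", "sport"],
    "outdoor activity": ["fishing", "recreation"],
    "recreation": ["outdoor activity", "sport"],
    "sport": ["angling", "recreation"],
    "trout": ["fish"],
    "fish": ["trout", "animal", "food"],
    "animal": ["fish"],
    "food": ["fish"],
}

def hops(graph, start, goal):
    """Breadth-first search: links on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def closest_category(graph, word, categories):
    """Pick the category reachable in the fewest links (first one wins ties)."""
    distances = {c: hops(graph, word, c) for c in categories}
    reachable = {c: d for c, d in distances.items() if d is not None}
    best = min(reachable, key=reachable.get) if reachable else None
    return best, distances

print(closest_category(RELATIONS, "fishing", ["sport", "food", "animal"]))
print(closest_category(RELATIONS, "trout", ["sport", "food", "animal"]))
```

In this toy graph "trout" is equally close to "food" and "animal", which is the kind of overlap the question mentions; returning every category at the minimum distance instead of only the first would be one way to report it.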


The star"

#4 · 2019-03-09 03:15

First of all, you need sample text to analyze, to get the relationships between words. A categorization with latent semantic analysis is described in "Latent Semantic Analysis approaches to categorization".

A different approach would be naive Bayes text categorization. Sample texts with assigned categories are needed. In a learning step, the program learns the different categories and the likelihood that a word occurs in a text assigned to a category; see Bayesian spam filtering. I don't know how well that works with single words.
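
To make the learning step concrete, here is a small multinomial naive Bayes sketch with add-one smoothing. The class name `NaiveBayes` and the four training texts are assumptions for illustration only; real training data would have to be collected per category.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of texts
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def scores(self, text):
        """Log-probability score of `text` under each learned category."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        result = {}
        for cat in self.doc_counts:
            total_words = sum(self.word_counts[cat].values())
            logp = math.log(self.doc_counts[cat] / total_docs)  # prior
            for w in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                count = self.word_counts[cat][w] + 1
                logp += math.log(count / (total_words + len(self.vocab)))
            result[cat] = logp
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)

# Toy training texts (assumed, not from the original problem).
nb = NaiveBayes()
nb.train("fishing rod lake boat angling competition", "sport")
nb.train("football match team goal competition", "sport")
nb.train("pasta sauce recipe oven dinner", "food")
nb.train("bread butter recipe breakfast", "food")

print(nb.classify("fishing"), nb.classify("recipe"))
```

With a single word as input, the decision rests on that one word's smoothed frequency plus the prior, so a word that never occurs in the training texts falls back to the category priors and vocabulary sizes. That matches the doubt above about how well this works for single words.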


一夜七次

#5 · 2019-03-09 03:16

I am assuming that the problem allows using external data, because otherwise I cannot conceive of a way to deduce the meaning from words algorithmically.

Maybe something could be done with a thesaurus database, looking for minimal distances between the 'word' words and the 'category' words?


Juvenile、少年°

#6 · 2019-03-09 03:17

Google is forbidden, but they have an almost perfect solution: Google Sets.

Because you need to understand the semantics of the words, you need external data sources. You could try using WordNet. Or you could maybe try using Wikipedia: find the page for every word (or maybe only for the categories) and look for other words appearing on the page or on linked pages.
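
One way to compare pages is cosine similarity between their word counts. The sketch below assumes the page text has already been downloaded; the `PAGES` strings are short made-up stand-ins, not real Wikipedia content, and no fetching is shown.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts for a piece of page text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical stand-ins for downloaded page text.
PAGES = {
    "fishing": "fishing is the activity of catching fish as recreation "
               "or competition with rod and line",
    "sport": "sport is physical activity or competition played for "
             "recreation and skill",
    "food": "food is any substance eaten for nutrition such as bread "
            "fish or fruit",
}

def best_category(word, categories, pages):
    word_vec = bag_of_words(pages[word])
    sims = {c: cosine(word_vec, bag_of_words(pages[c])) for c in categories}
    return max(sims, key=sims.get), sims

best_page_match, sims = best_category("fishing", ["sport", "food"], PAGES)
print(best_page_match, sims)
```

Common words such as "is" and "or" inflate every score here; removing stop words or weighting terms with TF-IDF would probably be needed on real pages.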


Juvenile、少年°

#7 · 2019-03-09 03:18

You might be able to use the WordNet database: create some metric to determine how closely linked two words (the word and the category) are, and then choose the best category to put the word in.
