可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Hi I am trying to do a sentiment analysis using Naive Bayes classifier in python 2.x. It reads the sentiment using a txt file and then gives output as positive or negative based on the sample txt file sentiments. I want the output the same form as input e.g. I have a text file of lets sat 1000 raw sentiments and I want the output to show positive or negative against each sentiment. Please help. Below is the code i am using

import math
import string

def Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string):
    y_values = [0,1]
    prob_values = [None, None]

    for y_value in y_values:
        posterior_prob = 1.0

        for word in test_string.split():
            word = word.lower().translate(None,string.punctuation).strip()
            if y_value == 0:
                if word not in negative:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= negative[word]
            else:
                if word not in positive:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= positive[word]

        if y_value == 0:
            prob_values[y_value] = posterior_prob * float(total_negative) / (total_negative + total_positive)
        else:
            prob_values[y_value] = posterior_prob * float(total_positive) / (total_negative + total_positive)

    total_prob_values = 0
    for i in prob_values:
        total_prob_values += i

    for i in range(0,len(prob_values)):
        prob_values[i] = float(prob_values[i]) / total_prob_values

    print prob_values

    if prob_values[0] > prob_values[1]:
        return 0
    else:
        return 1


if __name__ == '__main__':
    sentiment = open(r'C:/Users/documents/sample.txt')

    #Preprocessing of training set
    vocabulary = {}
    positive = {}
    negative = {}
    training_set = []
    TOTAL_WORDS = 0
    total_negative = 0
    total_positive = 0

    for line in sentiment:
        words = line.split()
        y = words[-1].strip()
        y = int(y)

        if y == 0:
            total_negative += 1
        else:
            total_positive += 1

        for word in words:
            word = word.lower().translate(None,string.punctuation).strip()
            if word not in vocabulary and word.isdigit() is False:
                vocabulary[word] = 1
                TOTAL_WORDS += 1
            elif word in vocabulary:
                vocabulary[word] += 1
                TOTAL_WORDS += 1

            #Training
            if y == 0:
                if word not in negative:
                    negative[word] = 1
                else:
                    negative[word] += 1
            else:
                if word not in positive:
                    positive[word] = 1
                else:
                    positive[word] += 1

    for word in vocabulary.keys():
        vocabulary[word] = float(vocabulary[word])/TOTAL_WORDS

    for word in positive.keys():
        positive[word] = float(positive[word])/total_positive

    for word in negative.keys():
        negative[word] = float(negative[word])/total_negative

    test_string = raw_input("Enter the review: \n")

    classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string)
    if classifier == 0:
        print "Negative review"
    else:
        print "Positive review"

回答1:

I've checked the github repo posted by you in comments. I tried to run the project, but I have some errors.

Anyway, I've checked the project structure and the file used to training the naive bayes algorithm, and I think that the following piece of code can be used to write your result data in a Excel file (i.e. .xls)

with open("test11.txt") as f:
    for line in f:
        classifier = naive_bayes_classifier(positive, negative, total_negative, total_positive, line)
        result = 'Positive' if classifier == 0 else 'Negative'
        data_to_be_written += ([line, result],)

# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('test.xls')
worksheet = workbook.add_worksheet()

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for item, cost in data_to_be_written:
   worksheet.write(row, col,     item)
worksheet.write(row, col + 1, cost)
row += 1

workbook.close()

Sorthly, for each row of the file with the sentences to be tested, I call the classifier and prepare a structure that will be written in the csv file.
Then loop the structure and write the xls file.
To do this I have used a python site package called xlsxwriter.

As I told you before, I have some problem to run the project, so this code is not tested as well. It should be works well, bu anyway, if you are in trouble, let me know.

Regards

回答2:

> with open("test11.txt") as f:
>     for line in f:
>         classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, line) if classifier == 0:
>       f.write(line + 'Negative') else:
>        f.write(line + 'Positive')
>     
> #        result = 'Positive' if classifier == 0 else 'Negative'
> #        data_to_be_written += ([line, result],)
> 
> # Create a workbook and add a worksheet. workbook = xlsxwriter.Workbook('test.xls') worksheet = workbook.add_worksheet()
> 
> # Start from the first cell. Rows and columns are zero indexed. row = 0 col = 0
> 
> # Iterate over the data and write it out row by row. for item, cost in f:    worksheet.write(row, col,     item) worksheet.write(row, col +
> 1, cost) row += 1
> 
> workbook.close()