Question:

What should the format of the input dataset be for Google AutoML Natural Language multi-label text classification? I know that for multi-class classification I need one column of text and another column for labels, where the labels column includes one label per row.

I have multiple labels for each text and I want to do multi-label classification. I tried having one column per label and one-hot encoding but I got this error message: Max 1000 labels supported. Found 9823 labels.

Answer 1:

It was very confusing at first but later I managed to find the format in the documentation, which is a CSV file like:

text1, label1, label2
text2, label2
text3, label3, label2, label1

The parser doesn't understand a table with NULL cells saved as a standard CSV file, which is like:

text1, label1, label2,
text2, label2,,
text3, label3, label2, label1

I had to manually remove extra commas from the CSV file generated by Pandas.
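Instead of removing the commas by hand, the rows can be written with Python's csv module so that each line carries only the labels it actually has. This is a minimal sketch, not the documented AutoML workflow: the function name `to_ragged_csv` and the sample rows are made up for illustration, and it assumes the data is available as (text, labels) pairs. The `isinstance` check also discards the NaN values that Pandas uses for empty cells.

```python
import csv
import io


def to_ragged_csv(rows):
    """Serialize (text, labels) pairs without padding cells for missing labels."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains a comma, quote, or newline.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for text, labels in rows:
        # Keep only non-empty strings, so None, "", and NaN (a float) are dropped
        # and no trailing commas are written.
        kept = [label for label in labels if isinstance(label, str) and label.strip()]
        writer.writerow([text, *kept])
    return buffer.getvalue()


# Hypothetical sample data mirroring the example rows above.
examples = [
    ("text1", ["label1", "label2"]),
    ("text2", ["label2", None, float("nan"), ""]),
    ("text3", ["label3", "label2", "label1"]),
]
ragged = to_ragged_csv(examples)
print(ragged)
```

With a Pandas DataFrame, the same function could be fed from something like `zip(df["text"], df[label_columns].values.tolist())`, where `label_columns` is whatever list of label column names your table uses.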

Answer 2:

One column per label is the way to go. If you have fewer than 1000 labels, you probably have a mistake in your CSV file that confuses the parser into treating some of the tokens in the example text as labels. Please make sure that your text is correctly quoted, so that commas inside the text are not read as field separators.
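To illustrate how unquoted text can inflate the label count, here is a small sketch using Python's csv module. The sample text and label names are invented, and whether AutoML's parser follows exactly the same quoting rules as Python's csv reader is an assumption.

```python
import csv
import io

# Hypothetical example: the text itself contains a comma.
text = "Great phone, terrible battery"
labels = ["hardware", "negative"]

# Naive join: the comma inside the text splits it into two fields,
# so part of the text would be read as an extra label.
naive_line = ",".join([text, *labels])
naive_fields = next(csv.reader(io.StringIO(naive_line)))

# csv.writer quotes the text field because it contains the delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow([text, *labels])
quoted_line = buffer.getvalue()
quoted_fields = next(csv.reader(io.StringIO(quoted_line)))

print(naive_fields)
print(quoted_line, end="")
print(quoted_fields)
```

Across thousands of free-text rows, every stray fragment like " terrible battery" counts as a distinct label, which is one plausible way to end up with a label count in the thousands.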

Answer 3:

Google AutoML has updated their parser. The following format is fine:

text1, label1, label2, label3,
text1, label1, label2, ,
text1, label1, label2, , ,

At least, that worked for me on 27 January 2019.
