Choosing classification algorithm to classify mix

I have a dataset of about 100,000 records about buying pattern of customers. The data set contains

Age (continuous value from 2 to 120) but I have plan also to categorize into age ranges.
Gender (either 0 or 1)
Address (can be only six types or I can also represent using numbers from 1 to 6)
Preference shop (can be from only 7 shops) which is my class problem.

So my problem is to classify and predict the customers based on their Age,gender and location for Preference shop. I have tried to use naive and decision trees but their classification accuracy is little bit low below.

I am thinking also logistic regression but I am not sure about the discrete value like gender and address. But, I have also assumed SVM with some kernal tricks but not yet tried.

So which machine learning algorithm do you suggest for better accuracy with these features.

标签： machine-learning data-mining classification

2条回答

我命由我不由天

2楼-- · 2019-03-16 18:47

The issue is that you're representing nominal variables on a continuous scale, which imposes a (spurious) ordinal relationship between classes when you use machine learning methods. For example, if you code address as one of six possible integers, then address 1 is closer to address 2 than it is to address 3,4,5,6. This is going to cause problems when you try to learn anything.

Instead, translate your 6-value categorical variable to six binary variables, one for each categorical value. Your original feature will then give rise to six features, where only one will ever be on. Also, keep the age as an integer value since you lose information by making it categorical.

As for approaches, it's unlikely to make much of a difference (at least initially). Go with whichever is easier for you to implement. However, make sure you run some sort of cross-validation parameter selection on a dev set before running on your test set, as all algorithms have parameters than can dramatically affect learning accuracy.

0人赞添加讨论(0) 举报

叛逆

3楼-- · 2019-03-16 19:04

You really need to look at the data and determine if there is enough variance between your labels and the features that you currently have. Because there are so few features but a lot of data, something such as kNN could work well.

You could adapt collaborative filtering to solve your problem as that would also work off of similar features.

0人赞添加讨论(0) 举报

Choosing classification algorithm to classify mix

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间