Question:
I am getting the following error:
c50 code called exit with value 1
I am running this on the Titanic data set available from Kaggle.
# Importing datasets
train <- read.csv("train.csv", sep=",")
# this is the structure
str(train)
Output:
'data.frame': 891 obs. of 12 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
$ Embarked : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
Then I tried a C5.0 decision tree:
# Trying with C5.0 decision tree
library(C50)
# C5.0 models require a factor outcome; otherwise they throw an error
train$Survived <- factor(train$Survived)
new_model <- C5.0(train[-2],train$Survived)
Running the lines above gives me this error:
c50 code called exit with value 1
I can't figure out what is going wrong. I used similar code on a different dataset and it worked fine. Any ideas on how I can debug this?
-Thanks
Answer 1:
For anyone interested, the data can be found here: http://www.kaggle.com/c/titanic-gettingStarted/data. I think you need to be registered in order to download it.
Regarding your problem: first of all, I think you meant to write
new_model <- C5.0(train[,-2],train$Survived)
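For what it's worth, on a data frame train[-2] (list-style indexing) and train[,-2] (matrix-style indexing) both drop the second column, so the original call is probably not the cause of the error. The two forms only differ when a single column remains, because [, j] then simplifies to a vector by default. A small sketch with a made-up data frame:

```r
# Toy data frame standing in for the Titanic data (hypothetical values)
df <- data.frame(id = 1:3, survived = c(0, 1, 1), fare = c(7.25, 71.28, 7.92))

list_style   <- df[-2]    # list-style: always returns a data frame
matrix_style <- df[, -2]  # matrix-style: still a data frame, since two columns remain
print(all.equal(list_style, matrix_style))  # TRUE

two_cols <- df[, 1:2]
print(class(two_cols[-2]))                  # "data.frame"
print(class(two_cols[, -2]))                # "integer": drop = TRUE collapses the single column
print(class(two_cols[, -2, drop = FALSE]))  # "data.frame"
```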
Next, notice the structure of the Cabin and Embarked columns. These two factors have an empty string as a level name (check with levels(train$Embarked)). This is where C50 falls over. If you modify your data such that
levels(train$Cabin)[1] = "missing"
levels(train$Embarked)[1] = "missing"
your algorithm will now run without an error.
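To check every column at once instead of inspecting them one by one, something like the sketch below lists the factor columns that have an empty-string level and then relabels that level by matching it rather than assuming it is the first one. The data frame and the "missing" label are made up for illustration. Note that since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so Cabin and Embarked come back as character vectors and the levels() fix does not replace the empty strings unless you read the file with stringsAsFactors = TRUE.

```r
# Hypothetical toy data with blank factor levels, mimicking Cabin and Embarked
train <- data.frame(
  Cabin    = factor(c("", "C85", "", "C123")),
  Embarked = factor(c("S", "C", "", "S")),
  Fare     = c(7.25, 71.28, 7.92, 53.1)
)

# TRUE for each factor column that has "" among its levels
has_blank <- vapply(train, function(col) is.factor(col) && "" %in% levels(col),
                    logical(1))
print(names(train)[has_blank])  # "Cabin" "Embarked"

# Relabel the blank level by matching it, not by position
for (nm in names(train)[has_blank]) {
  lv <- levels(train[[nm]])
  lv[lv == ""] <- "missing"
  levels(train[[nm]]) <- lv
}
print(levels(train$Embarked))  # "missing" "C" "S"
```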
Answer 2:
Just in case: you can look at the error details with
summary(new_model)
This error also occurs when a variable name contains special characters. For example, you will get it if a variable name contains the character "я" (from the Russian alphabet).
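If you suspect special characters in the column names, one way to find them is to look for any code point above 127, i.e. anything outside ASCII. This is a sketch on a made-up data frame; the column name "ves_я" and the replacement names of the form "var2" are purely illustrative.

```r
# Toy data frame with one non-ASCII column name (Cyrillic "я"); hypothetical data
df <- data.frame(a = 1:3, b = 4:6)
names(df)[2] <- "ves_\u044f"

# TRUE for each name that contains a character outside the ASCII range
is_non_ascii <- function(x) {
  vapply(x, function(s) any(utf8ToInt(enc2utf8(s)) > 127L), logical(1),
         USE.NAMES = FALSE)
}

bad <- names(df)[is_non_ascii(names(df))]
print(bad)  # the offending column name(s)

# One possible fix: give the offending columns plain ASCII names
idx <- which(is_non_ascii(names(df)))
names(df)[idx] <- paste0("var", idx)
print(names(df))  # "a" "var2"
```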
Answer 3:
Here is what finally worked.
I got the idea after reading this post:
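library(C50)
# assuming Kaggle's test.csv is in the working directory, like train.csv
test <- read.csv("test.csv", sep=",")
# test.csv has no Survived column; add one so rbind() can match columns by name
test$Survived <- NA
# stack both sets so every factor gets the same, combined set of levels
combinedData <- rbind(train,test)
combinedData$Survived <- factor(combinedData$Survived)
# relabel the empty-string level (it sorts first) as "missing"
levels(combinedData$Cabin)[1] = "missing"
levels(combinedData$Embarked)[1] = "missing"
# split back: rows 1-891 come from train.csv, rows 892-1309 from test.csv
new_train <- combinedData[1:891,]
new_test <- combinedData[892:1309,]
# column 2 is Survived, so it is excluded from the predictors
new_model <- C5.0(new_train[,-2],new_train$Survived)
new_model_predict <- predict(new_model,new_test)
submitC50 <- data.frame(PassengerId=new_test$PassengerId, Survived=new_model_predict)
write.csv(submitC50, file="c50dtree.csv", row.names=FALSE)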
The idea is that combining the two sets first gives the train and test data consistent factor levels.
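The effect of combining before relabeling can be seen on a small scale: when train and test are read separately, each factor only gets the levels that occur in its own file, and rbind() produces a factor whose levels are the union of both. The miniature data below is made up.

```r
# Hypothetical miniature train/test sets read separately
train <- data.frame(PassengerId = 1:3, Survived = c(0, 1, 1),
                    Embarked = factor(c("", "C", "S")))
test  <- data.frame(PassengerId = 4:5, Embarked = factor(c("Q", "S")))

print(levels(train$Embarked))  # "" "C" "S"
print(levels(test$Embarked))   # "Q" "S"  (a different level set)

test$Survived <- NA              # add the outcome column so rbind() can match names
combined <- rbind(train, test)   # rbind.data.frame matches columns by name
combined$Survived <- factor(combined$Survived)
print(levels(combined$Embarked)) # "" "C" "S" "Q"

# Relabel the blank level once, then split back into the original row ranges
levels(combined$Embarked)[levels(combined$Embarked) == ""] <- "missing"
new_train <- combined[seq_len(nrow(train)), ]
new_test  <- combined[nrow(train) + seq_len(nrow(test)), ]
print(identical(levels(new_train$Embarked), levels(new_test$Embarked)))  # TRUE
```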
Answer 4:
I had the same error, but I was using a numeric dataset without missing values.
After a long time, I discovered that my dataset had a predictor attribute called "outcome", and C5.0Control uses this name; that was the cause of the error :'(
My solution was to change the column name. Another way would be to create a C5.0Control object, change the value of its label attribute, and then pass this object as the control parameter to the C5.0 method.
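A minimal sketch of the renaming route, assuming a predictor literally named "outcome"; the data and the replacement name "outcome_attr" are made up. The label-based alternative appears only as a comment so that the snippet runs without the C50 package installed.

```r
# Hypothetical numeric predictors, one of which clashes with C5.0's default label
x <- data.frame(outcome = c(0.1, 0.4, 0.9), dose = c(1, 2, 3))
y <- factor(c("no", "no", "yes"))

# Rename any predictor called "outcome" before fitting
clash <- names(x) == "outcome"
names(x)[clash] <- "outcome_attr"
print(names(x))  # "outcome_attr" "dose"

# Alternative (requires the C50 package): keep the column name and change the label, e.g.
# C50::C5.0(x, y, control = C50::C5.0Control(label = "target"))
```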
Answer 5:
I also struggled for some hours with the same problem (return code "1"), both when building a model and when predicting.
Following the hint in Marco's answer, I wrote two small functions that relabel factor levels equal to "" in a data frame or a vector; see the code below. Since R passes arguments by value rather than by reference, the functions cannot change the original data frame, so you have to use their return value:
removeBlankLevelsInDataFrame <- function(dataframe) {
  for (i in seq_along(dataframe)) {
    lv <- levels(dataframe[[i]])
    # relabel the "" level if this column is a factor that has one
    if (!is.null(lv) && "" %in% lv) {
      levels(dataframe[[i]])[lv == ""] <- "?"
    }
  }
  dataframe
}
removeBlankLevelsInVector <- function(vector) {
  lv <- levels(vector)
  # non-factors have NULL levels and are returned unchanged
  if (!is.null(lv) && "" %in% lv) {
    levels(vector)[lv == ""] <- "?"
  }
  vector
}
Calling the functions might look like this:
trainX = removeBlankLevelsInDataFrame(trainX)
trainY = removeBlankLevelsInVector(trainY)
model = C50::C5.0.default(trainX,trainY)
However, C50 seems to have a similar problem with character columns that contain an empty cell, so you will probably have to extend this to handle character attributes too, if you have any.
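One way to extend the idea to character columns, as suggested above, is a single helper that handles both cases. This is only a sketch: the function name removeBlankValues is made up, and it reuses the "?" placeholder from the functions above.

```r
# Hypothetical helper: replace "" in factor levels and in character columns with a placeholder
removeBlankValues <- function(dataframe, placeholder = "?") {
  for (i in seq_along(dataframe)) {
    col <- dataframe[[i]]
    if (is.factor(col)) {
      lv <- levels(col)
      if ("" %in% lv) {
        lv[lv == ""] <- placeholder  # levels<- merges duplicate labels if the placeholder exists
        levels(col) <- lv
      }
    } else if (is.character(col)) {
      col[!is.na(col) & col == ""] <- placeholder
    }
    dataframe[[i]] <- col
  }
  dataframe
}

# Usage with toy data (one factor column, one character column)
df <- data.frame(Embarked = factor(c("S", "", "C")),
                 Cabin = c("C85", "", "E46"),
                 stringsAsFactors = FALSE)
clean <- removeBlankValues(df)
print(levels(clean$Embarked))  # "?" "C" "S"
print(clean$Cabin)             # "C85" "?" "E46"
```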
Answer 6:
I also got the same error, but in my case it was caused by illegal characters in the factor levels of one of the columns.
I used the make.names function to correct the factor levels:
levels(FooData$BarColumn) <- make.names(levels(FooData$BarColumn))
Then the problem was resolved.
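Here is what make.names() does to a few problematic level names, using a made-up factor. Passing unique = TRUE is worth considering: without it, two different levels can map to the same valid name, and assigning those names back would merge them into one level.

```r
# Hypothetical factor whose level names are not syntactically valid R names
f <- factor(c("a b", "1x", "ok", "a.b"), levels = c("a b", "1x", "ok", "a.b"))

print(make.names(levels(f)))                 # "a.b" "X1x" "ok" "a.b"    (collision)
print(make.names(levels(f), unique = TRUE))  # "a.b" "X1x" "ok" "a.b.1"

levels(f) <- make.names(levels(f), unique = TRUE)
print(as.character(f))  # "a.b" "X1x" "ok" "a.b.1"
```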