Error with caret package - classification v regression
I am an actuarial student preparing for an upcoming predictive analytics exam in December. Part of an exercise is to build a model using boosting with caret and xgbTree. See the code below; the Caravan dataset is from the ISLR package:
library(caret)
library(ggplot2)
set.seed(1000)
data.Caravan <- read.csv(file = "Caravan.csv")
data.Caravan$Purchase <- factor(data.Caravan$Purchase)
levels(data.Caravan$Purchase) <- c("No", "Yes")
data.Caravan.train <- data.Caravan[1:1000, ]
data.Caravan.test <- data.Caravan[1001:nrow(data.Caravan), ]
grid <- expand.grid(max_depth = c(1:7),
                    nrounds = 500,
                    eta = c(.01, .05, .01),
                    colsample_bytree = c(.5, .8),
                    gamma = 0,
                    min_child_weight = 1,
                    subsample = .6)
control <- trainControl(method = "cv",
                        number = 4,
                        classProbs = TRUE,
                        sampling = c("up", "down"))
caravan.boost <- train(formula = Purchase ~ .,
                       data = data.Caravan.train,
                       method = "xgbTree",
                       metric = "Accuracy",
                       trControl = control,
                       tuneGrid = grid)
The definitions in expand.grid and trainControl were specified by the problem, but I keep getting an error:
Error: sampling methods are only implemented for classification problems
If I remove the sampling method from trainControl, I get a new error that states "Metric Accuracy not applicable for regression models". If I remove the Accuracy metric, I get errors stating "cannot compute class probabilities for regression" and "Error in names(res$trainingData) %in% as.character(form[[2]]) : argument "form" is missing, with no default".
Ultimately the problem is that caret is defining the problem as regression, not classification, even though the target variable is set as a factor variable and classProbs is set to TRUE. Can someone explain how to tell caret to run classification and not regression?
r r-caret
Could you add the outcome from dput(head(data.Caravan, 20)) to your question? This will give us the first 20 records of your source data. That way we can run your code with your data. For more info read this post on reproducible examples
– phiver
Nov 17 at 10:37
@phiver thanks for the reply. The dataset has 86 fields. That may be too much to paste in here. The caravan dataset is available in the ISLR package.
– Sam
Nov 20 at 21:40
@missuse thanks for the comment. I'm not sure what you meant by not using the formula interface, so I tried a couple of iterations of different things and found something that worked. The only thing I changed was "formula = Purchase~." to "Purchase~." and it worked. I have no idea why though. How did you know to do that?
– Sam
Nov 20 at 22:03
edited Nov 18 at 18:11 by user6910411
asked Nov 16 at 18:26 by Sam
1 Answer
caret::train does not have a formula argument, but rather a form argument in which you specify the formula. So for instance this works:
caravan.boost <- train(form = Purchase ~ .,
                       data = data.Caravan.train,
                       method = "xgbTree",
                       metric = "Accuracy",
                       trControl = control,
                       tuneGrid = grid)
#output:
eXtreme Gradient Boosting
1000 samples
85 predictor
2 classes: 'No', 'Yes'
No pre-processing
Resampling: Cross-Validated (4 fold)
Summary of sample sizes: 751, 749, 750, 750
Addtional sampling using up-sampling
Resampling results across tuning parameters:
eta max_depth colsample_bytree Accuracy Kappa
0.01 1 0.5 0.7020495 0.10170007
0.01 1 0.8 0.7100335 0.09732773
0.01 2 0.5 0.7730581 0.12361444
0.01 2 0.8 0.7690620 0.11293561
0.01 3 0.5 0.8330506 0.14461709
0.01 3 0.8 0.8290146 0.06908344
0.01 4 0.5 0.8659949 0.07396586
0.01 4 0.8 0.8749790 0.07451637
0.01 5 0.5 0.8949792 0.07599005
0.01 5 0.8 0.8949792 0.07525191
0.01 6 0.5 0.9079873 0.09766492
0.01 6 0.8 0.9099793 0.10420720
0.01 7 0.5 0.9169833 0.11769151
0.01 7 0.8 0.9119753 0.10873268
0.05 1 0.5 0.7640699 0.08281792
0.05 1 0.8 0.7700580 0.09201503
0.05 2 0.5 0.8709909 0.09034807
0.05 2 0.8 0.8739990 0.10440898
0.05 3 0.5 0.9039792 0.12166348
0.05 3 0.8 0.9089832 0.11850402
0.05 4 0.5 0.9149793 0.11602447
0.05 4 0.8 0.9119713 0.11207786
0.05 5 0.5 0.9139633 0.11853793
0.05 5 0.8 0.9159754 0.11968085
0.05 6 0.5 0.9219794 0.11744643
0.05 6 0.8 0.9199794 0.12803204
0.05 7 0.5 0.9179873 0.08701058
0.05 7 0.8 0.9179793 0.10702619
Tuning parameter 'nrounds' was held constant at a value of 500
Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'min_child_weight' was held constant at a value of 1
Tuning parameter 'subsample' was held constant at a value of 0.6
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were nrounds = 500, max_depth = 6, eta = 0.05, gamma = 0, colsample_bytree = 0.5, min_child_weight = 1 and subsample = 0.6.
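As for why simply dropping the "formula =" name also works (as reported in the comments): R matches a named argument to a formal only by exact name or by a prefix of the formal's name. "formula" is not a prefix of "form", so the formula ends up in "..." and form stays missing, whereas an unnamed formula is matched by position. The base-R sketch below illustrates only this matching rule; fit is a hypothetical stand-in function, not caret's actual code.

```r
# Hypothetical stand-in for a function whose first argument is named `form`.
# It only reports how its arguments were matched; it is not caret's code.
fit <- function(form, data, ...) {
  if (missing(form)) {
    return("form is missing; the formula went into '...'")
  }
  paste("form received:", deparse(form))
}

d <- data.frame(y = c(0, 1), x = c(1, 2))

# `formula` is neither an exact match nor a prefix of `form`,
# so it is swallowed by `...` and `form` is left missing.
res_named_wrong <- fit(formula = y ~ x, data = d)

# Exact name match.
res_named_right <- fit(form = y ~ x, data = d)

# Unnamed argument, matched to `form` by position.
res_positional <- fit(y ~ x, data = d)

print(res_named_wrong)
print(res_named_right)
print(res_positional)
```

So either naming the argument form or leaving the formula unnamed as the first argument should behave the same way in a call to train.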
You can also use the non-formula interface, in which you specify x and y separately:
caravan.boost <- train(x = data.Caravan.train[, -ncol(data.Caravan.train)],
                       y = data.Caravan.train$Purchase,
                       method = "xgbTree",
                       metric = "Accuracy",
                       trControl = control,
                       tuneGrid = grid)
Do note that these two ways of specifying the model do not always produce the same result when there are factor variables in x, since the formula interface calls model.matrix for most algorithms.
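To make the difference concrete, here is a small base-R sketch of what model.matrix does to a factor predictor. The toy data frame and its column names are made up for illustration and have nothing to do with Caravan; the point is only that the formula route yields dummy columns while the x/y route passes the factor column through unchanged.

```r
# Toy data (hypothetical): one numeric and one factor predictor.
toy <- data.frame(
  y      = factor(c("No", "Yes", "No", "Yes")),
  age    = c(25, 40, 31, 58),
  region = factor(c("north", "south", "west", "south"))
)

# What an x/y call would be handed: the predictor columns as they are.
x_raw <- toy[, c("age", "region")]

# What a formula-based call typically builds: a numeric design matrix in
# which the factor is expanded into indicator (dummy) columns.
x_mm <- model.matrix(y ~ ., data = toy)

print(colnames(x_raw))
print(colnames(x_mm))
```

With the default treatment contrasts, the first factor level ("north") becomes the baseline and the remaining levels each get their own column, so the model sees a different set of predictors than it would from the raw data frame.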
To get the data:
library(ISLR)
data(Caravan)
answered 2 days ago by missuse