Error with caret package - classification v regression

I am an actuarial student preparing for an upcoming predictive analytics exam in December. Part of an exercise is to build a model using boosting with caret and xgbTree. See the code below; the Caravan dataset is from the ISLR package:

library(caret)
library(ggplot2)
set.seed(1000)
data.Caravan <- read.csv(file = "Caravan.csv")

data.Caravan$Purchase <- factor(data.Caravan$Purchase)
levels(data.Caravan$Purchase) <- c("No", "Yes")

data.Caravan.train <- data.Caravan[1:1000, ]
data.Caravan.test <- data.Caravan[1001:nrow(data.Caravan), ]

grid <- expand.grid(max_depth = c(1:7),
                    nrounds = 500,
                    eta =  c(.01, .05, .01),
                    colsample_bytree = c(.5, .8),
                    gamma = 0,
                    min_child_weight = 1,
                    subsample = .6)

control <- trainControl(method = "cv", 
                        number = 4,
                        classProbs = TRUE,
                        sampling = c("up", "down"))

caravan.boost <- train(formula = Purchase ~ .,
                       data =  data.Caravan.train, 
                       method = "xgbTree", 
                       metric = "Accuracy",
                       trControl = control, 
                       tuneGrid = grid)

The definitions in expand.grid and trainControl were specified by the problem, but I keep getting an error:

Error: sampling methods are only implemented for classification problems

If I remove the sampling method from trainControl, I get a new error that states "Metric Accuracy not applicable for regression models". If I remove the Accuracy metric, I get errors stating "cannnot compute class probabilities for regression" and "Error in names(res$trainingData) %in% as.character(form[[2]]) : argument "form" is missing, with no default".

Ultimately the problem is that caret is defining the problem as regression, not classification, even though the target variable is set as a factor variable and classProbs is set to TRUE. Can someone explain how to tell caret to run classification and not regression?

asked Nov 16 at 18:26 by Sam
edited Nov 18 at 18:11 by user6910411

Could you add the outcome from dput(head(data.Caravan, 20)) to your question? This will give us the first 20 records of your source data. That way we can run your code with your data. For more info read this post on reproducible examples
– phiver
Nov 17 at 10:37

@phiver thanks for the reply. The dataset has 86 fields. That may be too much to paste in here. The caravan dataset is available in the ISLR package.
– Sam
Nov 20 at 21:40

@missuse thanks for the comment. I'm not sure what you meant by not using the formula interface, so I tried a couple of iterations of different things and found something that worked. The only thing I changed was "formula = Purchase~." to "Purchase~." and it worked. I have no idea why though. How did you know to do that?
– Sam
Nov 20 at 22:03

r r-caret


1 Answer
caret::train does not have a formula argument, but rather a form argument in which you specify the formula. So for instance this works:

caravan.boost <- train(form = Purchase ~ .,
                       data =  data.Caravan.train, 
                       method = "xgbTree", 
                       metric = "Accuracy",
                       trControl = control, 
                       tuneGrid = grid)



#output:
eXtreme Gradient Boosting 

1000 samples
  85 predictor
   2 classes: 'No', 'Yes' 

No pre-processing
Resampling: Cross-Validated (4 fold) 
Summary of sample sizes: 751, 749, 750, 750 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  eta   max_depth  colsample_bytree  Accuracy   Kappa     
  0.01  1          0.5               0.7020495  0.10170007
  0.01  1          0.8               0.7100335  0.09732773
  0.01  2          0.5               0.7730581  0.12361444
  0.01  2          0.8               0.7690620  0.11293561
  0.01  3          0.5               0.8330506  0.14461709
  0.01  3          0.8               0.8290146  0.06908344
  0.01  4          0.5               0.8659949  0.07396586
  0.01  4          0.8               0.8749790  0.07451637
  0.01  5          0.5               0.8949792  0.07599005
  0.01  5          0.8               0.8949792  0.07525191
  0.01  6          0.5               0.9079873  0.09766492
  0.01  6          0.8               0.9099793  0.10420720
  0.01  7          0.5               0.9169833  0.11769151
  0.01  7          0.8               0.9119753  0.10873268
  0.05  1          0.5               0.7640699  0.08281792
  0.05  1          0.8               0.7700580  0.09201503
  0.05  2          0.5               0.8709909  0.09034807
  0.05  2          0.8               0.8739990  0.10440898
  0.05  3          0.5               0.9039792  0.12166348
  0.05  3          0.8               0.9089832  0.11850402
  0.05  4          0.5               0.9149793  0.11602447
  0.05  4          0.8               0.9119713  0.11207786
  0.05  5          0.5               0.9139633  0.11853793
  0.05  5          0.8               0.9159754  0.11968085
  0.05  6          0.5               0.9219794  0.11744643
  0.05  6          0.8               0.9199794  0.12803204
  0.05  7          0.5               0.9179873  0.08701058
  0.05  7          0.8               0.9179793  0.10702619

Tuning parameter 'nrounds' was held constant at a value of 500
Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'min_child_weight' was held constant at a value of 1
Tuning parameter 'subsample' was held constant at a value of 0.6
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were nrounds = 500, max_depth = 6, eta = 0.05, gamma = 0, colsample_bytree = 0.5, min_child_weight = 1 and subsample = 0.6.
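As for why the `formula =` spelling fails: R matches a supplied argument name to a function's formal arguments, and a supplied name may abbreviate a formal's name but never extend it. So `formula` cannot match `form`; it falls into `...` and `form` is left missing, which is consistent with the 'argument "form" is missing' error in the question. Below is a minimal base-R sketch of that behaviour. The names `fit` and `fit.formula` are hypothetical stand-ins and are not caret's actual code.

```r
# Hypothetical stand-ins for an S3 generic with a formula method;
# this is NOT caret's source, only an illustration of argument matching.
fit <- function(x, ...) UseMethod("fit")

fit.formula <- function(form, data, ...) {
  if (missing(form)) {
    "form is missing"
  } else {
    paste("form is", deparse(form))
  }
}

toy <- data.frame(y = c(0, 1), x1 = c(1, 2))

res_named_wrong <- fit(formula = y ~ x1, data = toy)  # 'formula' is swallowed by '...'
res_named_right <- fit(form = y ~ x1, data = toy)     # exact match on 'form'
res_positional  <- fit(y ~ x1, data = toy)            # first unnamed argument fills 'form'

print(c(res_named_wrong, res_named_right, res_positional))
```

This also fits the observation in the comments that dropping the argument name altogether, as in `train(Purchase ~ ., ...)`, works: an unnamed first argument is matched by position.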

You can also use the non-formula interface, in which you specify x and y separately:

caravan.boost <- train(x = data.Caravan.train[,-ncol(data.Caravan.train)],
                       y =  data.Caravan.train$Purchase, 
                       method = "xgbTree", 
                       metric = "Accuracy",
                       trControl = control, 
                       tuneGrid = grid)

Do note that these two ways of specifying the model do not always produce the same result when there are factor variables in x, since the formula interface calls model.matrix for most algorithms.
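To illustrate that caveat, here is a small sketch using base R's `model.matrix` on made-up data (the columns `age` and `type` are invented for illustration). It only shows what the expansion of a factor looks like; how a given caret method treats the result is not shown here.

```r
# Toy predictors: one numeric column and one factor column.
x <- data.frame(age  = c(25, 40, 31),
                type = factor(c("a", "b", "c")))

# A formula-based route expands the factor into indicator (dummy) columns,
# using treatment contrasts by default (level "a" becomes the baseline).
x_formula <- model.matrix(~ ., data = x)

print(colnames(x))          # "age" "type"
print(colnames(x_formula))  # "(Intercept)" "age" "typeb" "typec"
```

With the x/y interface the `type` column would be passed along as a single factor, whereas the formula route presents it as separate 0/1 columns, so a tree-based learner can see different split candidates in the two cases.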

To get the data:

library(ISLR)
data(Caravan)
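A side observation on the tuning grid from the question, offered as a sketch and not as a correction of the exercise: `eta = c(.01, .05, .01)` lists 0.01 twice, so `expand.grid` produces repeated rows, while the results table above shows only 28 distinct parameter combinations. This can be checked with base R alone:

```r
grid <- expand.grid(max_depth = 1:7,
                    nrounds = 500,
                    eta = c(.01, .05, .01),   # as written in the question
                    colsample_bytree = c(.5, .8),
                    gamma = 0,
                    min_child_weight = 1,
                    subsample = .6)

n_all      <- nrow(grid)          # 7 * 3 * 2 = 42 rows, some repeated
n_distinct <- nrow(unique(grid))  # 7 * 2 * 2 = 28 distinct combinations

print(c(n_all, n_distinct))
```

If a third learning rate was intended, that is worth checking against the exercise text; otherwise the repeated value adds nothing to the search.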

answered 2 days ago by missuse
