
This function fits a modified random forest model to principal component data and meta-variables. It can be useful for obtaining a final model, but in general we recommend randomForest_CVPC, which includes the final model.

Usage

funkyForest(
  data,
  outcome = colnames(data)[1],
  unit = colnames(data)[2],
  nTrees = 500,
  varImpPlot = TRUE,
  metaNames = NULL,
  keepModels = TRUE,
  varSelPercent = 0.8,
  method = "class"
)

Arguments

data

Data.frame of outcome and predictors. The predictors include groups of variables that are finite projections of higher-dimensional variables, as well as single meta-variables.

Any replicate data, i.e. repeated observations, should already be handled. The unit column is used only so that it can be dropped from the data, so removing it beforehand and passing NULL also works. Typically, use the results from getKsPCAData, potentially with meta-variables attached.

outcome

(Optional) String indicating the outcome column name in data. Default is the first column of data.

unit

(Optional) String indicating the unit column name in data. Default is the second column of data.

nTrees

(Optional) Numeric indicating the number of trees to use in the random forest model. Default is 500.

varImpPlot

(Optional) Boolean indicating if variable importance plots should also be returned with the model. Default is TRUE.

metaNames

(Optional) Vector with the column names of data that correspond to meta-variables. Default is NULL.

keepModels

(Optional) Boolean indicating if the individual models should be kept. The stored models can be large. Default is TRUE, as they are needed for predictions.

varSelPercent

(Optional) Numeric in (0,1) indicating the approximate proportion of variables to keep for each tree. Default is 0.8.

method

(Optional) Method passed to rpart when building the trees of the random forest. Default is "class". Currently this is the only tested method; support for other methods is planned for future releases.
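
To illustrate how nTrees, varSelPercent, and method might interact, the following is a minimal sketch of a forest of rpart classification trees, each fit on a random subset of the predictors. It is not the implementation used by funkyForest: the iris data, the bootstrap resampling of rows, the majority vote, and the names data_df, n_keep, models, votes, and pred are all assumptions made for illustration.

```r
library(rpart)

set.seed(123)

# Stand-in data (assumption): iris, with "Species" as the outcome column
data_df <- iris
outcome <- "Species"
predictors <- setdiff(colnames(data_df), outcome)

nTrees <- 25          # fewer trees than the default of 500, to keep the sketch fast
varSelPercent <- 0.8  # approximate proportion of predictors kept per tree

# Number of predictors each tree sees (at least one); here round(0.8 * 4) = 3
n_keep <- max(1, round(varSelPercent * length(predictors)))

# Fit one CART per iteration on a bootstrap sample of rows (assumption)
# and a random subset of the predictors
models <- lapply(seq_len(nTrees), function(i) {
  rows <- sample(nrow(data_df), replace = TRUE)
  vars <- sample(predictors, n_keep)
  rpart(
    reformulate(vars, response = outcome),
    data = data_df[rows, , drop = FALSE],
    method = "class"
  )
})

# Combine the trees by majority vote (assumption)
votes <- sapply(models, function(m) {
  as.character(predict(m, newdata = data_df, type = "class"))
})
pred <- apply(votes, 1, function(v) names(which.max(table(v))))
```

Keeping the list of fitted trees, as in models here, is what makes later prediction possible, which is consistent with keepModels defaulting to TRUE.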

Value

A list with entries

  1. varImportanceData: Data.frame for variable importance information.

  2. (Optional) model: List of CART models that make up the random forest.

  3. (Optional) varImportancePlot: Variable importance plots.

Examples

ff <- funkyForest(
  data = TNBC[, c(1:8, ncol(TNBC))],
  outcome = "Class", unit = "Person",
  metaNames = c("Age")
)