This function builds a modified random forest model from principal-component projections and meta-data. It can be used to obtain a final model directly, but in general we recommend randomForest_CVPC, which also includes the final model.
Arguments
- data
A data.frame containing the outcome and predictors. The predictors include groups of columns that are finite-dimensional projections of higher-dimensional variables, as well as single meta-variables.
Any replicate data (i.e., repeated observations) should already be handled. The unit column is used only so that it can be dropped; removing it beforehand and setting unit to NULL also works. Typically this is the output of getKsPCAData, possibly with meta-variables attached.
- outcome
(Optional) String indicating the outcome column name in data. Default is the first column of data.
- unit
(Optional) String indicating the unit column name in data. Default is the second column of data.
- nTrees
(Optional) Numeric indicating the number of trees to use in the random forest model. Default is 500.
- varImpPlot
(Optional) Boolean indicating if variable importance plots should also be returned with the model. Default is TRUE.
- metaNames
(Optional) Vector of the column names in data that correspond to meta-variables. Default is NULL.
- keepModels
(Optional) Boolean indicating whether the individual tree models should be kept. These can be large. Default is TRUE, as they are needed for predictions.
- varSelPercent
(Optional) Numeric in (0,1) giving the approximate proportion of variables to keep for each tree. Default is 0.8.
- method
(Optional) Method passed to rpart when building each tree of the random forest. Default is "class", which is currently the only tested method; support for other methods is planned for future releases.
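To make the expected layout of `data` and the role of `varSelPercent` more concrete, the R sketch below builds a small data.frame with the outcome first, the unit second, grouped PC columns and one meta-variable, then fits a single classification tree with rpart on a bootstrap sample using roughly `varSelPercent` of the variables. This is a hedged illustration under stated assumptions, not this function's implementation: the column names (`x_PC1`, `age`, ...), the `_PC` grouping rule, the helper `sampleVarGroups`, and the choice to sample whole PC groups (with each meta-variable as its own group) are all hypothetical.

```r
# Minimal sketch under stated assumptions; NOT the package's implementation.
# Column names, the "_PC" grouping rule and sampleVarGroups() are hypothetical.
library(rpart)

set.seed(1)
n <- 20
dat <- data.frame(
  outcome = factor(sample(c("A", "B"), n, replace = TRUE)), # first column: outcome
  unit    = seq_len(n),                                     # second column: unit ID
  x_PC1 = rnorm(n), x_PC2 = rnorm(n),                       # projections of variable x
  y_PC1 = rnorm(n), y_PC2 = rnorm(n), y_PC3 = rnorm(n),     # projections of variable y
  age   = rnorm(n, mean = 50, sd = 10)                      # single meta-variable
)

# Defaults as documented: outcome is column 1, unit is column 2
outcome   <- colnames(dat)[1]
unit      <- colnames(dat)[2]
metaNames <- "age"

# The unit column is only used so it can be dropped from the predictors
predictors <- setdiff(colnames(dat), c(outcome, unit))

# Group PC columns by the variable they project (assumed "<var>_PC<k>" naming);
# each meta-variable forms its own group
pcCols <- setdiff(predictors, metaNames)
groups <- split(pcCols, sub("_PC[0-9]+$", "", pcCols))
groups <- c(groups, as.list(setNames(metaNames, metaNames)))

# Keep roughly varSelPercent of the variable groups for one tree (hypothetical)
sampleVarGroups <- function(groups, varSelPercent = 0.8) {
  k <- max(1, round(varSelPercent * length(groups)))
  chosen <- sample(names(groups), k)
  unlist(groups[chosen], use.names = FALSE)
}

# Fit a single classification tree on a bootstrap sample of the rows
cols <- sampleVarGroups(groups, varSelPercent = 0.8)
boot <- sample(nrow(dat), replace = TRUE)
tree <- rpart(reformulate(cols, response = outcome),
              data = dat[boot, ], method = "class")
```

Sampling whole groups keeps all projections of one higher-dimensional variable together in a tree; whether the function samples groups or individual columns is not specified above, so treat this as one plausible reading.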