The function fits a modified random forest model to principal components of spatial interactions as well as meta-data. Additionally, permutation and cross-validation are employed to improve understanding of the data.

Usage

funkyModel(
  data,
  K = 10,
  outcome = colnames(data)[1],
  unit = colnames(data)[2],
  metaNames = NULL,
  synthetics = 100,
  alpha = 0.05,
  silent = FALSE,
  rGuessSims = 500,
  subsetPlotSize = 25,
  nTrees = 500,
  method = "class"
)

Arguments

data

Data.frame of outcome and predictors. The predictors include groups of variables which are finite projections of higher-dimensional variables as well as single meta-variables.

Any replicate data, i.e. repeated observations, should already be handled. The unit column is needed only so that it can be dropped from the data, so removing it beforehand and giving NULL also works. Typically use the results from getKsPCAData, potentially with meta-variables attached.

K

(Optional) Numeric indicating the number of folds to use in K-fold cross-validation. The default is 10.

outcome

(Optional) String indicating the outcome column name in data. Default is the first column of data.

unit

(Optional) String indicating the unit column name in data. Default is the second column of data.

metaNames

(Optional) Vector indicating the meta-variables to be considered. Default is NULL.

synthetics

(Optional) Numeric indicating the number of synthetics for variables (one set of synthetics for functional variables and one for each meta-variable). If 0 is used, the data cannot be aligned properly. Default is 100.

alpha

(Optional) Numeric in (0,1) indicating the significance level used throughout the analysis. Default is 0.05.

silent

(Optional) Boolean indicating if output should be suppressed when the function is running. Default is FALSE.

rGuessSims

(Optional) Numeric value indicating the number of simulations used for random guessing, which produce the guess estimate shown on the plot. Default is 500.

subsetPlotSize

(Optional) Numeric indicating the number of top variables to include in a subset graph. If this is larger than the total number of variables, no subset graph will be produced. Default is 25.

nTrees

(Optional) Numeric indicating the number of trees to use in the random forest model. Default is 500.

method

(Optional) Method for rpart tree to build random forest. Default is "class". Currently this is the only tested method. This will be expanded in future releases.
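
To make the expected layout of data concrete, here is a minimal base R sketch of a toy data.frame with the outcome in the first column, the unit in the second, two groups of principal-component columns, and one meta-variable. All column names and values are hypothetical and chosen only for illustration; the naming that getKsPCAData actually produces is not described here.

```r
set.seed(1)
n <- 12

# Hypothetical layout: outcome first, unit second, then grouped
# principal-component columns, then a single meta-variable.
toy <- data.frame(
  Class   = factor(rep(c("0", "1"), each = n / 2)), # outcome
  Person  = paste0("P", seq_len(n)),                # unit, one row per unit
  A_B_PC1 = rnorm(n),                               # first functional variable,
  A_B_PC2 = rnorm(n),                               #   two components
  B_B_PC1 = rnorm(n),                               # second functional variable,
  B_B_PC2 = rnorm(n),                               #   two components
  Age     = round(runif(n, 30, 80))                 # meta-variable
)

# Replicates are assumed to be handled already: each unit appears once.
stopifnot(!anyDuplicated(toy$Person))

# With this layout the documented defaults resolve to the intended columns.
outcome_default <- colnames(toy)[1]
unit_default <- colnames(toy)[2]
print(c(outcome = outcome_default, unit = unit_default))
```

With data arranged this way, outcome and unit could be left at their defaults, and the meta-variable would be named through metaNames (here, "Age").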

Value

List with the following items:

  1. model: The funkyForest Model fit on the entire given data.

  2. VariableImportance: Data.frame with the results of variable importance indices from the models and CV. The columns are var, est, sd, and cvSD.

  3. AccuracyEstimate: Data.frame with model accuracy estimates: out-of-bag accuracy (OOB), biased estimate (bias), and random guess (guess). The columns are OOB, bias, and guess.

  4. NoiseCutoff: Numeric indicating noise cutoff (vertical line).

  5. InterpolationCutoff: Vector of numerics indicating the interpolation cutoff (curved line).

  6. AdditionalParams: List of additional parameters for reference: Alpha and subsetPlotSize.

  7. viPlot: ggplot2 object for the variable importance (VI) plot with standardized results. It displays ordered underlying functions and meta-variables with point estimates, sd, noise cutoff, and interpolation cutoff all based on variable importance values.

  8. subset_viPlot: (Optional) ggplot2 object for vi plot with standardized results and only top subsetPlotSize variables. It displays ordered underlying functions and meta-variables with point estimates, sd, noise cutoff, and interpolation cutoff all based on variable importance values.

Examples

# Parameters are reduced beyond recommended levels for speed
fm <- funkyModel(
  data = TNBC[, c(1:8, ncol(TNBC))],
  outcome = "Class", unit = "Person",
  metaNames = c("Age"),
  nTrees = 5, synthetics = 10,
  silent = TRUE
)
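
The items listed under Value can be pulled out of the returned list by name. The following is a minimal sketch that repeats the fit above and then inspects the result. It assumes the package providing funkyModel and the TNBC data is installed under the name funkyforest; that name is an assumption, since this page does not state it, and the code is skipped if no such package is found.

```r
# Assumption: the package providing funkyModel() and TNBC is named "funkyforest".
if (requireNamespace("funkyforest", quietly = TRUE)) {
  library(funkyforest)

  # Same reduced-size fit as in the example above
  fm <- funkyModel(
    data = TNBC[, c(1:8, ncol(TNBC))],
    outcome = "Class", unit = "Person",
    metaNames = c("Age"),
    nTrees = 5, synthetics = 10,
    silent = TRUE
  )

  # Variable importance table (columns var, est, sd, cvSD per the Value section)
  print(head(fm$VariableImportance))

  # Accuracy estimates (columns OOB, bias, guess)
  print(fm$AccuracyEstimate)

  # Cutoffs that appear as the vertical and curved lines on the plot
  print(fm$NoiseCutoff)
  print(fm$InterpolationCutoff)

  # viPlot is a ggplot2 object, so printing it draws the plot
  print(fm$viPlot)

  # subset_viPlot is optional, so check for it before use
  if (!is.null(fm$subset_viPlot)) print(fm$subset_viPlot)
}
```

Because the example parameters are reduced for speed, the printed estimates are only useful for checking the structure of the output, not for interpretation.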