-->

Y axes on the logit scale and centered in gbm.plot

2019-07-15 09:07发布

问题:

I am currently exploring the gbm functions in the package dismo to create boosted regression trees for species distribution modeling. I have been using the dismo vignettes as well as the 2008 paper "A working guide to boosted regression trees" by Elith et al., published in the Journal of Animal Ecology. On page 808:809 of the Elith et al. article, the authors explain partial dependence plots and give an example at the bottom of page 809 (Fig. 6). According to the dismo vignette "Boosted Regression Trees for ecological modeling", gbm.plot "Plots the partial dependence of the response on one or more predictors".

Gbm.plot creates plots that look almost exactly like the example in Elith et al.. However, there are a few parameters I cannot figure out how to set to achieve a figure the exact same as in the paper.

  1. The y-axes in the paper are on the logit scale and are centered to have a zero mean over the data distribution. The y-axes in gbm.plot represent the fitted function.

  2. The rug in the paper is on the top of the plots, gbm.step the rug is on the bottom.

  3. Gbm.plot uses the variable name as the x-axis label. The paper has meaningful axis labels.

Here is the figure from the Elith paper compared to one produced with gbm.plot

Figure 6 from Elith et al., 2009

From gbm.plot

My solutions

While looking for solutions I came across this question and it gave me the idea to look at the source code (a first for me). From the source, I was able to get a good idea of how the function is put together, but there is still much I don't understand.

  1. I am not sure what to change to transform the y-axes to the logit scale and center them to have a mean of zero.

  2. I was able to change the source to move the rug to the top of the plots. I found the command for the rug function and added an argument of side=3.

  3. For the variable names, I figure I need to make a list of appropriate variable names, attach it to the data, and somehow read it into the source code. Still over my head.

I will be thankful for any input. I also think that if other ecologists are using the Elith paper to guide them, they may run into the same problem.

Here is an example of the code I ran to produce the plots

gbm.plot(all.sum.tc4.lr001, rug=TRUE, smooth=TRUE, n.plots=9, common.scale=TRUE, write.title = FALSE, show.contrib=TRUE, plot.layout=c(2,3), cex.lab=1.5)

回答1:

This is late, but I can provide a roundabout solution to problem 3: adding custom x-labels to gbm.plot. I'm sure there's a better way but here is what I did. This method is helpful if you have a large dataset and are refining the variables you are using a lot.

Step 1. Locate the dismo package's source code for gbm.plot. Select all the code and create a new script and name the function gbm.plot2. Search for "var.name". Replace any instance where var.name is being changed. Examples:

var.name <- gbm.call$predictor.names[k]
var.name <- x.label 

to this:

var.name <- labels[j]

Now save the script and call it using source(), or run the whole script to get gbm.plot2 into the global environment.

Step 2. Let's pretend our dataframe is called "df" and has 200 columns. Choose the column numbers you want to call in gbm.step.

vars <- c(17, 175, 198)

Step 3. Make a dataframe with two columns: one column will have all the possible variable names you might be interested in using and one with the labels you want to use. Make sure the ColumnNames actually match what you can find in "colnames(df)[vars]".

ColumnNames <- c("HiHorAve", "Elev", "Type5")
Labels <- c("Hi Hello Avenue", "Probably Elevation", "Type 5 of Something")
labels <- data.frame(ColumnNames,Labels)

Now order the labels by the order in which they appear in your dataframe. This is helpful is you have a bunch of variables and your data frame changes shape often.

labels <- labels[match(colnames(df)[vars], labels$ColumnNames),]

Step 4. Run your gbm.step equation like so:

BRTmodel<- gbm.step(data=df, gbm.x=vars, gbm.y = 5, .....)

Step 5. Get the model summary --it orders the variables by relative importance. Then arrange the labels by relative importance.

smry1<- summary(BRTmodel)

labels <- labels[order(match(names(df)[vars],smry1$var))]
labels <- labels$Labels #extract the labels to a vector

Step 6. Now run your new gbm.plot script!

  gbm.plot2(BRTmodel, n.plots=3, y.label="")

It should plot only the nice labels.



回答2:

A quick way to change the x-axis label is to ensure show.contrib is set to FALSE and then use x.label=expression(paste("")). For the y-axis just use y.label= "" Here is an example:

gbm.plot(BRT_Model, variable.no=3, smooth=TRUE,
         common.scale=TRUE, write.title=FALSE, y.label="my y-axis title", 
         x.label=expression(paste("my x-axis title")), 
         show.contrib=FALSE, plot.layout=c(1, 1), cex.lab=1.5, cex.axis=1.5)