I am having trouble reproducing my results exactly when I make predictions with predict() and a saved gbm model object. I am using the raster package to predict to a raster. Each time I run predict() with the same model object and inputs (a raster stack), I get slightly different values (max value within 0.7 for a range of predictions from 0.08 to 12.30 for example). However, there seems to be a limited amount of outcomes. For example, I can get the results to reproduce if I run predict enough times.
The issue seems to be within the predict() function as it doesn't seem to be related to R session, loading package libraries, etc.
Are there any random number generators or known bugs within the predict function that could be causing this behavior?
Here is my code:
library(raster)
library(rgdal)
library(gbm)
# read in the saved gbm model object
final <- readRDS("final_gbm_model")
# get the raster file names from the directory for the predictors
hab = list.files(getwd(), pattern="txt$", full.names=FALSE)
# import as raster stack
rstack <- stack(hab)
# run the predictions. One of my predictors is a factor character that is constant everywhere, so I use the const option:
pred <- predict(rstack, final, n.trees=final$n.trees,na.rm=TRUE,const=data.frame(WaterUse2="H")) # family="gaussian"
#Not sure if I need the family = "gaussian" option. It doesn't seem to affect the results.
# do some model specific back transformation on the predictions
e <- exp(1)
pred <- e^(pred)* 1.506951
#defining the projection for raster
albers<-CRS("+proj=aea +datum=NAD83 +lat_1=29.5 +lat_2=45.5 +lat_0=23.0 +lon_0=-96.0 +x_0=0 +y_0=0")
projection(pred) <- albers
projection(pred, asText = TRUE)
# Export prediction grid to geotiff
writeRaster(pred,filename="preds.tif",format="GTiff",overwrite=TRUE)
Then when I open the raster in ArcGIS, I notice I get slightly different summary stats each time, but not always. Here is a dropbox link to my saved model object and raster stack: https://www.dropbox.com/s/cb5nybhb6ioc8d9/rstack_model.zip?dl=0. They can be read in with readRDS().
When I run gdalUtils::gdalinfo(yourfile) on the rasters created in two different R sessions, with the same model file and inputs here is what I get:
gdalUtils::gdalinfo("preds2_NO3_TD_Pubs.tif")
[1] "Driver: GTiff/GeoTIFF"
[2] "Files: preds2_NO3_TD_Pubs.tif"
[3] "Size is 250, 718"
[4] "Coordinate System is:"
[5] "PROJCS[\"unnamed\","
[6] " GEOGCS[\"NAD83\","
[7] " DATUM[\"North_American_Datum_1983\","
[8] " SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"
[9] " AUTHORITY[\"EPSG\",\"7019\"]],"
[10] " TOWGS84[0,0,0,0,0,0,0],"
[11] " AUTHORITY[\"EPSG\",\"6269\"]],"
[12] " PRIMEM[\"Greenwich\",0],"
[13] " UNIT[\"degree\",0.0174532925199433],"
[14] " AUTHORITY[\"EPSG\",\"4269\"]],"
[15] " PROJECTION[\"Albers_Conic_Equal_Area\"],"
[16] " PARAMETER[\"standard_parallel_1\",29.5],"
[17] " PARAMETER[\"standard_parallel_2\",45.5],"
[18] " PARAMETER[\"latitude_of_center\",23],"
[19] " PARAMETER[\"longitude_of_center\",-96],"
[20] " PARAMETER[\"false_easting\",0],"
[21] " PARAMETER[\"false_northing\",0],"
[22] " UNIT[\"metre\",1,"
[23] " AUTHORITY[\"EPSG\",\"9001\"]]]"
[24] "Origin = (-2256000.000000000000000,2275000.000000000000000)"
[25] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"
[26] "Metadata:"
[27] " AREA_OR_POINT=Area"
[28] "Image Structure Metadata:"
[29] " COMPRESSION=LZW"
[30] " INTERLEAVE=BAND"
[31] "Corner Coordinates:"
[32] "Upper Left (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[33] "Lower Left (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[34] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[35] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[36] "Center (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[37] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"
[38] " Min=0.077 Max=11.509 "
[39] " Minimum=0.077, Maximum=11.509, Mean=2.150, StdDev=1.373"
[40] " NoData Value=-1.6999999999999999e+308"
[41] " Metadata:"
[42] " STATISTICS_MAXIMUM=11.508599723736"
[43] " STATISTICS_MEAN=2.1504374380146"
[44] " STATISTICS_MINIMUM=0.077388949363546"
[45] " STATISTICS_STDDEV=1.3731923433979"
and another run
gdalUtils::gdalinfo("preds3_NO3_TD_Pubs.tif")
[1] "Driver: GTiff/GeoTIFF"
[2] "Files: preds3_NO3_TD_Pubs.tif"
[3] "Size is 250, 718"
[4] "Coordinate System is:"
[5] "PROJCS[\"unnamed\","
[6] " GEOGCS[\"NAD83\","
[7] " DATUM[\"North_American_Datum_1983\","
[8] " SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"
[9] " AUTHORITY[\"EPSG\",\"7019\"]],"
[10] " TOWGS84[0,0,0,0,0,0,0],"
[11] " AUTHORITY[\"EPSG\",\"6269\"]],"
[12] " PRIMEM[\"Greenwich\",0],"
[13] " UNIT[\"degree\",0.0174532925199433],"
[14] " AUTHORITY[\"EPSG\",\"4269\"]],"
[15] " PROJECTION[\"Albers_Conic_Equal_Area\"],"
[16] " PARAMETER[\"standard_parallel_1\",29.5],"
[17] " PARAMETER[\"standard_parallel_2\",45.5],"
[18] " PARAMETER[\"latitude_of_center\",23],"
[19] " PARAMETER[\"longitude_of_center\",-96],"
[20] " PARAMETER[\"false_easting\",0],"
[21] " PARAMETER[\"false_northing\",0],"
[22] " UNIT[\"metre\",1,"
[23] " AUTHORITY[\"EPSG\",\"9001\"]]]"
[24] "Origin = (-2256000.000000000000000,2275000.000000000000000)"
[25] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"
[26] "Metadata:"
[27] " AREA_OR_POINT=Area"
[28] "Image Structure Metadata:"
[29] " COMPRESSION=LZW"
[30] " INTERLEAVE=BAND"
[31] "Corner Coordinates:"
[32] "Upper Left (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[33] "Lower Left (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[34] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[35] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[36] "Center (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[37] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"
[38] " Min=0.077 Max=11.509 "
[39] " Minimum=0.077, Maximum=11.509, Mean=2.202, StdDev=1.389"
[40] " NoData Value=-1.6999999999999999e+308"
[41] " Metadata:"
[42] " STATISTICS_MAXIMUM=11.508599723736"
[43] " STATISTICS_MEAN=2.2022304968129"
[44] " STATISTICS_MINIMUM=0.077388949363546"
[45] " STATISTICS_STDDEV=1.3888495428978"
and yet another run:
gdalUtils::gdalinfo("preds4_NO3_TD_Pubs.tif")
[1] "Driver: GTiff/GeoTIFF"
[2] "Files: preds4_NO3_TD_Pubs.tif"
[3] "Size is 250, 718"
[4] "Coordinate System is:"
[5] "PROJCS[\"unnamed\","
[6] " GEOGCS[\"NAD83\","
[7] " DATUM[\"North_American_Datum_1983\","
[8] " SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"
[9] " AUTHORITY[\"EPSG\",\"7019\"]],"
[10] " TOWGS84[0,0,0,0,0,0,0],"
[11] " AUTHORITY[\"EPSG\",\"6269\"]],"
[12] " PRIMEM[\"Greenwich\",0],"
[13] " UNIT[\"degree\",0.0174532925199433],"
[14] " AUTHORITY[\"EPSG\",\"4269\"]],"
[15] " PROJECTION[\"Albers_Conic_Equal_Area\"],"
[16] " PARAMETER[\"standard_parallel_1\",29.5],"
[17] " PARAMETER[\"standard_parallel_2\",45.5],"
[18] " PARAMETER[\"latitude_of_center\",23],"
[19] " PARAMETER[\"longitude_of_center\",-96],"
[20] " PARAMETER[\"false_easting\",0],"
[21] " PARAMETER[\"false_northing\",0],"
[22] " UNIT[\"metre\",1,"
[23] " AUTHORITY[\"EPSG\",\"9001\"]]]"
[24] "Origin = (-2256000.000000000000000,2275000.000000000000000)"
[25] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"
[26] "Metadata:"
[27] " AREA_OR_POINT=Area"
[28] "Image Structure Metadata:"
[29] " COMPRESSION=LZW"
[30] " INTERLEAVE=BAND"
[31] "Corner Coordinates:"
[32] "Upper Left (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[33] "Lower Left (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[34] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[35] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[36] "Center (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[37] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"
[38] " Min=0.077 Max=11.509 "
[39] " Minimum=0.077, Maximum=11.509, Mean=2.202, StdDev=1.389"
[40] " NoData Value=-1.6999999999999999e+308"
[41] " Metadata:"
[42] " STATISTICS_MAXIMUM=11.508599723736"
[43] " STATISTICS_MEAN=2.2021866233701"
[44] " STATISTICS_MINIMUM=0.077388949363546"
[45] " STATISTICS_STDDEV=1.3888576145887"
summing the final raster values (after the back transform) gives several different numbers:
ras1<- raster("preds2_NO3_TD_Pubs.tif")
ras2<- raster("preds3_NO3_TD_Pubs.tif")
ras3<- raster("preds4_NO3_TD_Pubs.tif")
sum(getValues(ras1),na.rm=TRUE)
[1] 103750
sum(getValues(ras2),na.rm=TRUE)
[1] 106248.8
sum(getValues(ras3),na.rm=TRUE)
[1] 106246.7
For some reason I am no longer getting the output with a maximum of 12.445, but that had been a value I got in previous runs:
gdalUtils::gdalinfo("preds_NO3_TD_Pubs.tif")
[1] "Driver: GTiff/GeoTIFF"
[2] "Files: preds_NO3_TD_Pubs.tif"
[3] " preds_NO3_TD_Pubs.tif.aux.xml"
[4] "Size is 250, 718"
[5] "Coordinate System is:"
[6] "PROJCS[\"unnamed\","
[7] " GEOGCS[\"NAD83\","
[8] " DATUM[\"North_American_Datum_1983\","
[9] " SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"
[10] " AUTHORITY[\"EPSG\",\"7019\"]],"
[11] " TOWGS84[0,0,0,0,0,0,0],"
[12] " AUTHORITY[\"EPSG\",\"6269\"]],"
[13] " PRIMEM[\"Greenwich\",0],"
[14] " UNIT[\"degree\",0.0174532925199433],"
[15] " AUTHORITY[\"EPSG\",\"4269\"]],"
[16] " PROJECTION[\"Albers_Conic_Equal_Area\"],"
[17] " PARAMETER[\"standard_parallel_1\",29.5],"
[18] " PARAMETER[\"standard_parallel_2\",45.5],"
[19] " PARAMETER[\"latitude_of_center\",23],"
[20] " PARAMETER[\"longitude_of_center\",-96],"
[21] " PARAMETER[\"false_easting\",0],"
[22] " PARAMETER[\"false_northing\",0],"
[23] " UNIT[\"metre\",1,"
[24] " AUTHORITY[\"EPSG\",\"9001\"]]]"
[25] "Origin = (-2256000.000000000000000,2275000.000000000000000)"
[26] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"
[27] "Metadata:"
[28] " AREA_OR_POINT=Area"
[29] "Image Structure Metadata:"
[30] " COMPRESSION=LZW"
[31] " INTERLEAVE=BAND"
[32] "Corner Coordinates:"
[33] "Upper Left (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[34] "Lower Left (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[35] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[36] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[37] "Center (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[38] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"
[39] " Min=0.077 Max=12.445 "
[40] " Minimum=0.077, Maximum=12.445, Mean=2.253, StdDev=1.420"
[41] " NoData Value=-1.6999999999999999e+308"
[42] " Metadata:"
[43] " STATISTICS_MAXIMUM=12.445399168904"
[44] " STATISTICS_MEAN=2.2528155395026"
[45] " STATISTICS_MINIMUM=0.077388949363546"
[46] " STATISTICS_STDDEV=1.4199528164987"
and here is the value sum:
sum(getValues(raster("preds_NO3_TD_Pubs.tif")),na.rm=TRUE)
[1] 108689.3