Reproducibility of results from predict() function

2019-06-06 08:36发布

I am having trouble reproducing my results exactly when I make predictions with predict() and a saved gbm model object. I am using the raster package to predict to a raster. Each time I run predict() with the same model object and inputs (a raster stack), I get slightly different values (max value within 0.7 for a range of predictions from 0.08 to 12.30 for example). However, there seems to be a limited amount of outcomes. For example, I can get the results to reproduce if I run predict enough times.

The issue seems to be within the predict() function as it doesn't seem to be related to R session, loading package libraries, etc.

Are there any random number generators or known bugs within the predict function that could be causing this behavior?

Here is my code:

library(raster)
library(rgdal)
library(gbm) 

# read in the saved gbm model object
final <- readRDS("final_gbm_model")

# get the raster file names from the directory for the predictors
hab = list.files(getwd(), pattern="txt$", full.names=FALSE)

# import as raster stack
rstack <- stack(hab)

# run the predictions. One of my predictors is a factor character that is constant everywhere, so I use the const option:
pred <- predict(rstack, final, n.trees=final$n.trees,na.rm=TRUE,const=data.frame(WaterUse2="H"))  # family="gaussian"
#Not sure if I need the family = "gaussian" option. It doesn't seem to affect the results. 

# do some model specific back transformation on the predictions
e <- exp(1)
pred <- e^(pred)* 1.506951

#defining the  projection for raster
albers<-CRS("+proj=aea +datum=NAD83 +lat_1=29.5 +lat_2=45.5 +lat_0=23.0      +lon_0=-96.0 +x_0=0 +y_0=0")
projection(pred) <- albers
projection(pred, asText = TRUE)

# Export prediction grid to geotiff
writeRaster(pred,filename="preds.tif",format="GTiff",overwrite=TRUE)

Then when I open the raster in ArcGIS, I notice I get slightly different summary stats each time, but not always. Here is a dropbox link to my saved model object and raster stack: https://www.dropbox.com/s/cb5nybhb6ioc8d9/rstack_model.zip?dl=0. They can be read in with readRDS().

When I run gdalUtils::gdalinfo(yourfile) on the rasters created in two different R sessions, with the same model file and inputs here is what I get:

gdalUtils::gdalinfo("preds2_NO3_TD_Pubs.tif")

 [1] "Driver: GTiff/GeoTIFF"                                                    
 [2] "Files: preds2_NO3_TD_Pubs.tif"                                            
 [3] "Size is 250, 718"                                                         
 [4] "Coordinate System is:"                                                    
 [5] "PROJCS[\"unnamed\","                                                      
 [6] "    GEOGCS[\"NAD83\","                                                    
 [7] "        DATUM[\"North_American_Datum_1983\","                             
 [8] "            SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"             
 [9] "                AUTHORITY[\"EPSG\",\"7019\"]],"                           
[10] "            TOWGS84[0,0,0,0,0,0,0],"                                      
[11] "            AUTHORITY[\"EPSG\",\"6269\"]],"                               
[12] "        PRIMEM[\"Greenwich\",0],"                                         
[13] "        UNIT[\"degree\",0.0174532925199433],"                             
[14] "        AUTHORITY[\"EPSG\",\"4269\"]],"                                   
[15] "    PROJECTION[\"Albers_Conic_Equal_Area\"],"                             
[16] "    PARAMETER[\"standard_parallel_1\",29.5],"                             
[17] "    PARAMETER[\"standard_parallel_2\",45.5],"                             
[18] "    PARAMETER[\"latitude_of_center\",23],"                                
[19] "    PARAMETER[\"longitude_of_center\",-96],"                              
[20] "    PARAMETER[\"false_easting\",0],"                                      
[21] "    PARAMETER[\"false_northing\",0],"                                     
[22] "    UNIT[\"metre\",1,"                                                    
[23] "        AUTHORITY[\"EPSG\",\"9001\"]]]"                                   
[24] "Origin = (-2256000.000000000000000,2275000.000000000000000)"              
[25] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"                
[26] "Metadata:"                                                                
[27] "  AREA_OR_POINT=Area"                                                     
[28] "Image Structure Metadata:"                                                
[29] "  COMPRESSION=LZW"                                                        
[30] "  INTERLEAVE=BAND"                                                        
[31] "Corner Coordinates:"                                                      
[32] "Upper Left  (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[33] "Lower Left  (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[34] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[35] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[36] "Center      (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[37] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"                        
[38] "  Min=0.077 Max=11.509 "                                                  
[39] "  Minimum=0.077, Maximum=11.509, Mean=2.150, StdDev=1.373"                
[40] "  NoData Value=-1.6999999999999999e+308"                                  
[41] "  Metadata:"                                                              
[42] "    STATISTICS_MAXIMUM=11.508599723736"                                   
[43] "    STATISTICS_MEAN=2.1504374380146"                                      
[44] "    STATISTICS_MINIMUM=0.077388949363546"                                 
[45] "    STATISTICS_STDDEV=1.3731923433979"   

and another run

 gdalUtils::gdalinfo("preds3_NO3_TD_Pubs.tif")
 [1] "Driver: GTiff/GeoTIFF"                                                    
 [2] "Files: preds3_NO3_TD_Pubs.tif"                                            
 [3] "Size is 250, 718"                                                         
 [4] "Coordinate System is:"                                                    
 [5] "PROJCS[\"unnamed\","                                                      
 [6] "    GEOGCS[\"NAD83\","                                                    
 [7] "        DATUM[\"North_American_Datum_1983\","                             
 [8] "            SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"             
 [9] "                AUTHORITY[\"EPSG\",\"7019\"]],"                           
[10] "            TOWGS84[0,0,0,0,0,0,0],"                                      
[11] "            AUTHORITY[\"EPSG\",\"6269\"]],"                               
[12] "        PRIMEM[\"Greenwich\",0],"                                         
[13] "        UNIT[\"degree\",0.0174532925199433],"                             
[14] "        AUTHORITY[\"EPSG\",\"4269\"]],"                                   
[15] "    PROJECTION[\"Albers_Conic_Equal_Area\"],"                             
[16] "    PARAMETER[\"standard_parallel_1\",29.5],"                             
[17] "    PARAMETER[\"standard_parallel_2\",45.5],"                             
[18] "    PARAMETER[\"latitude_of_center\",23],"                                
[19] "    PARAMETER[\"longitude_of_center\",-96],"                              
[20] "    PARAMETER[\"false_easting\",0],"                                      
[21] "    PARAMETER[\"false_northing\",0],"                                     
[22] "    UNIT[\"metre\",1,"                                                    
[23] "        AUTHORITY[\"EPSG\",\"9001\"]]]"                                   
[24] "Origin = (-2256000.000000000000000,2275000.000000000000000)"              
[25] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"                
[26] "Metadata:"                                                                
[27] "  AREA_OR_POINT=Area"                                                     
[28] "Image Structure Metadata:"                                                
[29] "  COMPRESSION=LZW"                                                        
[30] "  INTERLEAVE=BAND"                                                        
[31] "Corner Coordinates:"                                                      
[32] "Upper Left  (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[33] "Lower Left  (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[34] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[35] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[36] "Center      (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[37] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"                        
[38] "  Min=0.077 Max=11.509 "                                                  
[39] "  Minimum=0.077, Maximum=11.509, Mean=2.202, StdDev=1.389"                
[40] "  NoData Value=-1.6999999999999999e+308"                                  
[41] "  Metadata:"                                                              
[42] "    STATISTICS_MAXIMUM=11.508599723736"                                   
[43] "    STATISTICS_MEAN=2.2022304968129"                                      
[44] "    STATISTICS_MINIMUM=0.077388949363546"                                 
[45] "    STATISTICS_STDDEV=1.3888495428978"   

and yet another run:

 gdalUtils::gdalinfo("preds4_NO3_TD_Pubs.tif")
 [1] "Driver: GTiff/GeoTIFF"                                                    
 [2] "Files: preds4_NO3_TD_Pubs.tif"                                            
 [3] "Size is 250, 718"                                                         
 [4] "Coordinate System is:"                                                    
 [5] "PROJCS[\"unnamed\","                                                      
 [6] "    GEOGCS[\"NAD83\","                                                    
 [7] "        DATUM[\"North_American_Datum_1983\","                             
 [8] "            SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"             
 [9] "                AUTHORITY[\"EPSG\",\"7019\"]],"                           
[10] "            TOWGS84[0,0,0,0,0,0,0],"                                      
[11] "            AUTHORITY[\"EPSG\",\"6269\"]],"                               
[12] "        PRIMEM[\"Greenwich\",0],"                                         
[13] "        UNIT[\"degree\",0.0174532925199433],"                             
[14] "        AUTHORITY[\"EPSG\",\"4269\"]],"                                   
[15] "    PROJECTION[\"Albers_Conic_Equal_Area\"],"                             
[16] "    PARAMETER[\"standard_parallel_1\",29.5],"                             
[17] "    PARAMETER[\"standard_parallel_2\",45.5],"                             
[18] "    PARAMETER[\"latitude_of_center\",23],"                                
[19] "    PARAMETER[\"longitude_of_center\",-96],"                              
[20] "    PARAMETER[\"false_easting\",0],"                                      
[21] "    PARAMETER[\"false_northing\",0],"                                     
[22] "    UNIT[\"metre\",1,"                                                    
[23] "        AUTHORITY[\"EPSG\",\"9001\"]]]"                                   
[24] "Origin = (-2256000.000000000000000,2275000.000000000000000)"              
[25] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"                
[26] "Metadata:"                                                                
[27] "  AREA_OR_POINT=Area"                                                     
[28] "Image Structure Metadata:"                                                
[29] "  COMPRESSION=LZW"                                                        
[30] "  INTERLEAVE=BAND"                                                        
[31] "Corner Coordinates:"                                                      
[32] "Upper Left  (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[33] "Lower Left  (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[34] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[35] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[36] "Center      (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[37] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"                        
[38] "  Min=0.077 Max=11.509 "                                                  
[39] "  Minimum=0.077, Maximum=11.509, Mean=2.202, StdDev=1.389"                
[40] "  NoData Value=-1.6999999999999999e+308"                                  
[41] "  Metadata:"                                                              
[42] "    STATISTICS_MAXIMUM=11.508599723736"                                   
[43] "    STATISTICS_MEAN=2.2021866233701"                                      
[44] "    STATISTICS_MINIMUM=0.077388949363546"                                 
[45] "    STATISTICS_STDDEV=1.3888576145887"  

summing the final raster values (after the back transform) gives several different numbers:

ras1<- raster("preds2_NO3_TD_Pubs.tif")
ras2<- raster("preds3_NO3_TD_Pubs.tif")
ras3<- raster("preds4_NO3_TD_Pubs.tif")

sum(getValues(ras1),na.rm=TRUE)
[1] 103750
sum(getValues(ras2),na.rm=TRUE)
[1] 106248.8
sum(getValues(ras3),na.rm=TRUE)
[1] 106246.7 

For some reason I am no longer getting the output with a maximum of 12.445, but that had been a value I got in previous runs:

gdalUtils::gdalinfo("preds_NO3_TD_Pubs.tif")

 [1] "Driver: GTiff/GeoTIFF"                                                    
 [2] "Files: preds_NO3_TD_Pubs.tif"                                             
 [3] "       preds_NO3_TD_Pubs.tif.aux.xml"                                     
 [4] "Size is 250, 718"                                                         
 [5] "Coordinate System is:"                                                    
 [6] "PROJCS[\"unnamed\","                                                      
 [7] "    GEOGCS[\"NAD83\","                                                    
 [8] "        DATUM[\"North_American_Datum_1983\","                             
 [9] "            SPHEROID[\"GRS 1980\",6378137,298.2572221010042,"             
[10] "                AUTHORITY[\"EPSG\",\"7019\"]],"                           
[11] "            TOWGS84[0,0,0,0,0,0,0],"                                      
[12] "            AUTHORITY[\"EPSG\",\"6269\"]],"                               
[13] "        PRIMEM[\"Greenwich\",0],"                                         
[14] "        UNIT[\"degree\",0.0174532925199433],"                             
[15] "        AUTHORITY[\"EPSG\",\"4269\"]],"                                   
[16] "    PROJECTION[\"Albers_Conic_Equal_Area\"],"                             
[17] "    PARAMETER[\"standard_parallel_1\",29.5],"                             
[18] "    PARAMETER[\"standard_parallel_2\",45.5],"                             
[19] "    PARAMETER[\"latitude_of_center\",23],"                                
[20] "    PARAMETER[\"longitude_of_center\",-96],"                              
[21] "    PARAMETER[\"false_easting\",0],"                                      
[22] "    PARAMETER[\"false_northing\",0],"                                     
[23] "    UNIT[\"metre\",1,"                                                    
[24] "        AUTHORITY[\"EPSG\",\"9001\"]]]"                                   
[25] "Origin = (-2256000.000000000000000,2275000.000000000000000)"              
[26] "Pixel Size = (1000.000000000000000,-1000.000000000000000)"                
[27] "Metadata:"                                                                
[28] "  AREA_OR_POINT=Area"                                                     
[29] "Image Structure Metadata:"                                                
[30] "  COMPRESSION=LZW"                                                        
[31] "  INTERLEAVE=BAND"                                                        
[32] "Corner Coordinates:"                                                      
[33] "Upper Left  (-2256000.000, 2275000.000) (123d14'22.95\"W, 40d33'27.36\"N)"
[34] "Lower Left  (-2256000.000, 1557000.000) (121d 0'52.48\"W, 34d23'17.23\"N)"
[35] "Upper Right (-2006000.000, 2275000.000) (120d21'32.84\"W, 41d 9'22.70\"N)"
[36] "Lower Right (-2006000.000, 1557000.000) (118d20'58.37\"W, 34d56'21.98\"N)"
[37] "Center      (-2131000.000, 1916000.000) (120d42' 6.60\"W, 37d46'25.81\"N)"
[38] "Band 1 Block=250x4 Type=Float64, ColorInterp=Gray"                        
[39] "  Min=0.077 Max=12.445 "                                                  
[40] "  Minimum=0.077, Maximum=12.445, Mean=2.253, StdDev=1.420"                
[41] "  NoData Value=-1.6999999999999999e+308"                                  
[42] "  Metadata:"                                                              
[43] "    STATISTICS_MAXIMUM=12.445399168904"                                   
[44] "    STATISTICS_MEAN=2.2528155395026"                                      
[45] "    STATISTICS_MINIMUM=0.077388949363546"                                 
[46] "    STATISTICS_STDDEV=1.4199528164987"  

and here is the value sum:

sum(getValues(raster("preds_NO3_TD_Pubs.tif")),na.rm=TRUE)
[1] 108689.3

标签: r r-raster gbm
0条回答
登录 后发表回答