I trained a model on data from 500 devices to predict their performance. Then I applied the trained model to a test data set of another 500 devices and got pretty good prediction results. Now my executives want me to prove that this model will work well on one million devices, not just on 500. Obviously we don't have data for one million devices. And if the model is not reliable, they want me to determine how much training data would be required to make reliable predictions on one million devices. How should I deal with executives who don't have a background in statistical analysis and modeling? Any suggestions? Thanks
Answer 1:
I have suggested to @cel that he write up his comment as an answer, including the variance and bias calculations. In any case, it could be added that:
"Do not be quick to assume Execs are essentially incapable in terms of technical or mathematical concepts"
While there may be Dilbert managers out there somewhere, I have seen few of them myself. More often, managers get to their positions through hard work. They may be rusty, but the abilities are likely still there.
In this case, whether or not they have a "background in statistical analysis and modeling", they are applying common sense.
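For reference, the variance and bias calculations mentioned above are presumably along the lines of the standard bias-variance decomposition of the expected squared prediction error; for a model $\hat f$ fit on a random training set $D$, and a new device with response $y = f(x) + \varepsilon$ where $\operatorname{Var}(\varepsilon) = \sigma^2$:

$$\mathbb{E}_{D,\varepsilon}\Big[\big(y - \hat f(x; D)\big)^2\Big] = \underbrace{\big(f(x) - \mathbb{E}_D[\hat f(x; D)]\big)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}_D\big(\hat f(x; D)\big)}_{\text{variance}} + \sigma^2$$

With only 500 training devices, the variance term is what the executives are implicitly questioning when they ask whether the model will hold up on one million devices.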
The first thing you might do is to provide the proper context and terminology. @cel has mentioned some of it: provide concrete values for each of the following (a minimal code sketch follows the list):
- assumptions
  - what assumptions you are making about the data
  - what basis there is for extrapolating from the limited data
  - why the extrapolated results should be trusted to apply to the roughly 99.9% of devices you have never seen
- data distribution
  - basic descriptive statistics
  - your take on the a priori distribution of the data, and why you chose it
- modeling
  - which models/approaches were considered and why
  - which model you actually chose and why
  - how you arrived at the hyperparameters
  - how you trained the model
- results
  - statistical measures of fit and error rate
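To make the "data distribution" and "results" items concrete, here is a minimal sketch (not the poster's actual pipeline) of the kind of evidence that tends to land with non-statisticians: descriptive statistics, a cross-validated error reported with its spread, and a learning curve that speaks directly to "how much training data is enough". The synthetic data and the `RandomForestRegressor` choice are placeholders for illustration only.

```python
# Minimal sketch (illustrative only): descriptive statistics, a cross-validated
# error estimate with its spread, and a learning curve over training-set size.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, learning_curve

# Stand-in for the 500-device training set; replace with the real features/target.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Basic descriptive statistics to report alongside the model.
print("feature means:", np.round(X.mean(axis=0), 2))
print("feature stds: ", np.round(X.std(axis=0), 2))

# Placeholder model; justify whichever model was actually used.
model = RandomForestRegressor(n_estimators=200, random_state=0)

# Cross-validated error reported as a value plus a spread, not a single number.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print(f"CV MAE: {-scores.mean():.2f} +/- {scores.std():.2f}")

# Learning curve: if validation error has flattened by the largest training size,
# that is evidence additional devices would change the model little; if it is
# still falling, it indicates roughly how much more data would help.
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5),
    scoring="neg_mean_absolute_error",
)
for n, fold_scores in zip(train_sizes, val_scores):
    print(f"n_train={n:4d}  validation MAE={-fold_scores.mean():.2f} +/- {fold_scores.std():.2f}")
```

The learning-curve output, translated into plain language ("error stopped improving after about N devices" versus "error was still dropping, so more data would likely help"), is usually the most persuasive way to answer the executives' question about how much training data is required.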