How to make a great R reproducible example

Posted 2020-01-22 07:28

When discussing performance with colleagues, teaching, sending a bug report or searching for guidance on mailing lists and here on Stack Overflow, a reproducible example is often asked for and always helpful.

What are your tips for creating an excellent example? How do you paste data structures from R in a text format? What other information should you include?

Are there other tricks in addition to using dput(), dump() or structure()? When should you include library() or require() statements? Which reserved words should one avoid, in addition to c, df, data, etc.?

How does one make a great reproducible example?

Tags: r r-faq
23 answers
成全新的幸福
#2 · 2020-01-22 07:48

If your data contain one or more factor variables and you want to share them with dput(head(mydata)), consider adding droplevels() so that factor levels absent from the reduced data set are left out of the dput output. This keeps the example minimal:

dput(droplevels(head(mydata)))
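To see what droplevels() changes, here is a small sketch that uses the built-in iris data as a stand-in for mydata (an assumption; the answer does not name a dataset). The names small and trimmed are chosen here for illustration only.

```r
# iris stands in for `mydata`; its Species factor has three levels.
small <- head(iris)

# Without droplevels(), the unused levels survive in the subset.
levels(small$Species)
nlevels(small$Species)      # 3

# With droplevels(), only the level present in these six rows remains.
trimmed <- droplevels(small)
levels(trimmed$Species)
nlevels(trimmed$Species)    # 1

# The dput() output no longer lists levels that never occur in the rows.
dput(trimmed)
```

The effect grows with the number of levels: a factor with hundreds of levels can otherwise dominate the pasted output even when only a few rows are shared.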
ゆ 、 Hurt°
#3 · 2020-01-22 07:48

I wonder if an http://old.r-fiddle.org/ link could be a very neat way of sharing a problem. Each fiddle receives a unique ID, and one could even think about embedding it in SO.

欢心
#4 · 2020-01-22 07:49

It's a good idea to use functions from the testthat package to show what you expect to occur. Other people can then alter your code until it runs without error. This eases the burden on those who would like to help you, because they don't have to decode your textual description. For example,

library(testthat)
# code defining x and y
if (y >= 10) {
    expect_equal(x, 1.23)
} else {
    expect_equal(x, 3.21)
}

is clearer than "I think x would come out to be 1.23 for y equal to or exceeding 10, and 3.21 otherwise, but I got neither result". Even in this silly example, I think the code is clearer than the words. Using testthat lets your helper focus on the code, which saves time, and it gives them a way to know they have solved your problem before they post an answer.

太酷不给撩
#5 · 2020-01-22 07:50

You can do this using reprex.

As mt1022 noted, "... good package for producing minimal, reproducible example is "reprex" from tidyverse".

According to Tidyverse:

The goal of "reprex" is to package your problematic code in such a way that other people can run it and feel your pain.

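An example is given on the tidyverse website.

library(reprex)
# Copy the next two lines to the clipboard...
y <- 1:4
mean(y)
# ...then call reprex(), which by default renders the clipboard contents.
reprex()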

I think this is the simplest way to create a reproducible example.

神经病院院长
#6 · 2020-01-22 07:54

Since R 2.14 (I guess) you can feed a text representation of your data directly to read.table:

 df <- read.table(header=TRUE, 
  text="Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
") 
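Before posting, it can help to confirm that the pasted text parses as intended. The sketch below reuses the first two rows of the table above; df_check is a name chosen here for illustration, and stringsAsFactors = TRUE is set explicitly because the default differs between R versions before and after 4.0.0.

```r
# Hypothetical name `df_check`; the data are the first two rows shown above.
df_check <- read.table(header = TRUE, stringsAsFactors = TRUE,
  text = "Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
")

# The header has one field fewer than the data rows, so read.table()
# treats the first column (1, 2) as row names rather than as data.
str(df_check)
dim(df_check)   # 2 rows, 5 columns
```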
爱情/是我丢掉的垃圾
#7 · 2020-01-22 07:55

Here are some of my suggestions:

  • Try to use the default R datasets
  • If you have your own dataset, include it with dput so that others can help you more easily
  • Do not use install.packages() unless it is really necessary; people will understand if you just use require or library
  • Try to be concise:

    • Have some dataset
    • Try to describe the output you need as simply as possible
    • Do it yourself before you ask the question
  • It is easy to upload an image, so upload plots if you have any
  • Also include any errors you may have encountered

All these are part of a reproducible example.
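As a rough illustration of how several of these suggestions fit together, the sketch below uses the built-in mtcars data. The choice of dataset, the names dat and rebuilt, and the per-cylinder mean used as the "desired output" are all assumptions made for this example.

```r
# Prefer a built-in dataset when it reproduces the problem.
dat <- head(mtcars[, c("mpg", "cyl")], 4)

# For your own data, paste the dput() output so others can recreate it.
dput(dat)

# Check that the deparsed text really rebuilds the same object
# before pasting it into a question.
rebuilt <- eval(parse(text = paste(deparse(dat), collapse = "\n")))
all.equal(dat, rebuilt)

# State the output you need as simply as possible,
# here the mean mpg for each value of cyl.
aggregate(mpg ~ cyl, data = dat, FUN = mean)
```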
