Segfault in R using the reshape2 package and dcast

Posted 2019-07-21 00:59

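RStudio was crashing when I tried to reshape a particular data frame using dcast (from the reshape2 package). I discovered that the crash was actually happening in R itself, so I ran my casting code in R.app and got the type of error that gives this site its name: Error: segfault from C stack overflow. With the help of Google and SO, I learned that this is a memory access error.

Okay, I got that far, but I don't know where to go from here. I can't provide a truly reproducible example, because my data frame has about 558,000 rows and the problem doesn't occur with small toy examples. For example, even if I take, say, a 50,000-row subset of the data, dcast works fine. Could particular rows of the data be causing the problem? If so, can anyone suggest what function(s) to use to look for the kinds of things that might cause the error I'm getting?

Below is a subset of the data frame I'm casting from (with fake values for some variables), followed by the casting call I'm using. I've also included the dput output for this small snippet below, in case it helps to play around with it. The real data set has about 700 values of prog, 15 values of prog1, and 5 values of fa.type.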

  id        term   yr    nslds acad.lev    prog            prog1 fa.type amount
1  1   Fall 2009 2010 Graduate Graduate  loan 1      Other Loans    Loan   5000
2  1 Spring 2010 2010 Graduate Graduate  loan 1      Other Loans    Loan   5000
3  2   Fall 2009 2010 Graduate Graduate  loan 2    Stafford Loan    Loan   8781
4  2 Spring 2010 2010 Graduate Graduate  loan 2    Stafford Loan    Loan   8781
5  3   Fall 2007 2008 Graduate Graduate  loan 3    Stafford Loan    Loan   4250
6  3   Fall 2007 2008 Graduate Graduate grant 1 University Grant   Grant   1707

fa.wide = dcast(id + term + yr + nslds + acad.lev ~ prog1 + fa.type , data=fa, value.var="amount", fun.aggregate=sum)

fa = structure(list(id = c(1, 1, 2, 2, 3, 3), term = structure(c(7L, 
8L, 7L, 8L, 1L, 1L), .Label = c("Fall 2007", "Spring 2008", "Summer 2008", 
"Fall 2008", "Spring 2009", "Summer 2009", "Fall 2009", "Spring 2010", 
"Summer 2010", "Fall 2010", "Spring 2011", "Summer 2011", "Fall 2011", 
"Spring 2012", "Summer 2012", "Fall 2012", "Spring 2013"), class = c("ordered", 
"factor")), yr = c(2010L, 2010L, 2010L, 2010L, 2008L, 2008L), 
    nslds = structure(c(7L, 7L, 7L, 7L, 7L, 7L), .Label = c("1st Year, Never Attended", 
    "1st Year, Previously Attended", "2nd Year", "3rd Year", 
    "4th Year", "5th Year+", "Graduate"), class = c("ordered", 
    "factor")), acad.lev = structure(c(6L, 6L, 6L, 6L, 6L, 6L
    ), .Label = c("Freshman", "Sophomore", "Junior", "Senior", 
    "PB Undergrad", "Graduate"), class = c("ordered", "factor"
    )), prog = c("loan 1", "loan 1", "loan 2", "loan 2", "loan 3", 
    "grant 1"), prog1 = c("Other Loans", "Other Loans", "Stafford Loan", 
    "Stafford Loan", "Stafford Loan", "University Grant"), fa.type = structure(c(3L, 
    3L, 3L, 3L, 3L, 2L), .Label = c("Athletic", "Grant", "Loan", 
    "Scholarship", "Waiver", "Work/Study"), class = "factor"), 
    amount = c(5000, 5000, 8781, 8781, 4250, 1707)), .Names = c("id", 
"term", "yr", "nslds", "acad.lev", "prog", "prog1", "fa.type", 
"amount"), row.names = c(NA, 6L), class = "data.frame")

Answer 1:

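This isn't an answer, but a simple (nonsensical) reproducible example that won't fit in a comment. You can reproduce the error with this simple example (on my MacBook Pro).

library(reshape2)
n <- 1448
df <- data.frame(Student = rep(1:n, each = 2), Grade = sample(100, n * 2, replace = TRUE))
# Casting Student against itself asks for an n x n wide result
df2 <- dcast(df, Student ~ Student, value.var = "Grade", fun.aggregate = sum)
# Error: segfault from C stack overflow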

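The error appears at the boundary n = 1448, i.e. it does not occur for n = 1447 and below. The error seems to come from split_indices in split-numeric.c in the plyr package. It may have to do with the number of grouping levels being assigned to an (unsigned?) integer value, so that a memory access error occurs once the number of groups goes over 32767, but TBH I am clutching at straws now.

Here is my sessionInfo(), in case anyone is unable to reproduce this error: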
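
To get a feel for how many groups a given cast asks for, one can count the distinct row keys and column keys and multiply them. Below is a minimal base-R sketch. count_cast_cells is a hypothetical helper, not part of reshape2 or plyr, and it assumes that the number of groups equals rows times columns of the wide result, which I have not checked against the reshape2 source.

```r
# Hypothetical helper (not part of reshape2 or plyr): estimate the size of
# the wide table a cast would produce.
# Assumption: groups = distinct row keys * distinct column keys.
count_cast_cells <- function(data, row_vars, col_vars) {
  n_rows <- nrow(unique(data[row_vars]))   # distinct left-hand-side combinations
  n_cols <- nrow(unique(data[col_vars]))   # distinct right-hand-side combinations
  # as.numeric() avoids integer overflow when the product is large
  c(rows = n_rows, cols = n_cols, cells = as.numeric(n_rows) * n_cols)
}

# Same toy data as in the example above
n <- 1448
df <- data.frame(Student = rep(1:n, each = 2),
                 Grade = sample(100, n * 2, replace = TRUE))

cells <- count_cast_cells(df, "Student", "Student")
print(cells)
```

For the data frame in the question, the same call would take the id variables of the formula as row_vars and c("prog1", "fa.type") as col_vars.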

R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape2_1.2.2

loaded via a namespace (and not attached):
[1] plyr_1.8      stringr_0.6.2

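Interestingly, if I run the df2 <- command again after getting the first error, R crashes completely and I get an error report generated by the OS. I have included the relevant parts of the crash log here: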

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_PROTECTION_FAILURE at 0x00007fff5f3ff120

VM Regions Near 0x7fff5f3ff120:
    JS JIT generated code  00004d431a401000-00004d431a402000 [    4K] ---/rwx SM=NUL  
--> STACK GUARD            00007fff5bc00000-00007fff5f400000 [ 56.0M] ---/rwx SM=NUL  stack guard for thread 0
    Stack                  00007fff5f400000-00007fff5fc00000 [ 8192K] rw-/rwx SM=COW  thread 0

Application Specific Information:
objc[57147]: garbage collection is OFF

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_c.dylib               0x00007fff897c4632 small_free_scan_madvise_free + 41
1   libsystem_c.dylib               0x00007fff897c5f06 szone_free_definite_size + 4186
2   libsystem_c.dylib               0x00007fff897fe789 free + 194
3   libR.dylib                      0x0000000100222dbf R_gc_internal + 7327 (memory.c:952)
4   libR.dylib                      0x0000000100224919 Rf_allocVector + 841 (memory.c:2356)
5   plyr.so                         0x000000010144bd2c split_indices + 204 (split-numeric.c:23)
6   libR.dylib                      0x00000001001b4cc7 do_dotcall + 16311 (dotcode.c:593)
7   libR.dylib                      0x00000001001e4448 Rf_eval + 1672 (eval.c:494)
8   libR.dylib                      0x00000001001e5edd do_begin + 141 (eval.c:1415)
9   libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
10  libR.dylib                      0x00000001001e93b1 Rf_applyClosure + 849 (eval.c:861)
11  libR.dylib                      0x00000001001e41b2 Rf_eval + 1010 (eval.c:512)
12  libR.dylib                      0x00000001001e74e5 do_set + 709 (eval.c:1717)
13  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
14  libR.dylib                      0x00000001001e5edd do_begin + 141 (eval.c:1415)
15  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
16  libR.dylib                      0x00000001001e93b1 Rf_applyClosure + 849 (eval.c:861)
17  libR.dylib                      0x00000001001e41b2 Rf_eval + 1010 (eval.c:512)
18  libR.dylib                      0x00000001001e74e5 do_set + 709 (eval.c:1717)
19  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
20  libR.dylib                      0x00000001001e5edd do_begin + 141 (eval.c:1415)
21  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
22  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
23  libR.dylib                      0x00000001001e5edd do_begin + 141 (eval.c:1415)
24  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
25  libR.dylib                      0x00000001001e93b1 Rf_applyClosure + 849 (eval.c:861)
26  libR.dylib                      0x00000001001e41b2 Rf_eval + 1010 (eval.c:512)
27  libR.dylib                      0x00000001001e74e5 do_set + 709 (eval.c:1717)
28  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
29  libR.dylib                      0x00000001001e5edd do_begin + 141 (eval.c:1415)
30  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
31  libR.dylib                      0x00000001001e93b1 Rf_applyClosure + 849 (eval.c:861)
32  libR.dylib                      0x00000001001e41b2 Rf_eval + 1010 (eval.c:512)
33  libR.dylib                      0x00000001001e74e5 do_set + 709 (eval.c:1717)
34  libR.dylib                      0x00000001001e429c Rf_eval + 1244 (eval.c:468)
35  libR.dylib                      0x000000010021c761 R_ReplDLLdo1 + 481 (main.c:362)
36  org.R-project.R                 0x0000000100022c24 run_REngineRmainloop + 196
37  org.R-project.R                 0x00000001000159b7 -[REngine runREPL] + 119
38  org.R-project.R                 0x0000000100001f24 main + 852
39  org.R-project.R                 0x0000000100001914 start + 52
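
As a rough sanity check on the stack-overflow reading of the crash log above: the log shows an 8192K stack for thread 0, and the example casts n Students against n Students. If one assumes (this is my assumption, not something confirmed here) that the C code keeps one 4-byte integer per cell of the n x n result on the C stack, the arithmetic lands very close to the observed boundary:

```r
# Back-of-envelope check, NOT a diagnosis.
# Assumption: one 4-byte integer per cell of the n x n result lives on the C stack.
stack_bytes <- 8192 * 1024                 # "Stack ... [ 8192K]" from the crash log

bytes_needed <- function(n) 4 * n^2        # 4 bytes per cell, n^2 cells

for (n in c(1447, 1448, 1449)) {
  cat(sprintf("n = %d: %.0f bytes needed, %.0f bytes of stack left\n",
              n, bytes_needed(n), stack_bytes - bytes_needed(n)))
}
```

Under that assumption n = 1448 would leave less than 2 KB of the 8192K stack for everything else, which is consistent with the boundary I observed, but it does not prove that this is the cause.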


Answer 2:

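I had the same problem when pivoting a long table to a wide one using dcast in the reshape2 package. I found the solution in this post: "plyr split_indices function crashes on long vectors". Specifically, you can download split-numeric.c and loop-apply.c from this page: https://github.com/hadley/plyr/tree/master/src . Uninstall the plyr package from the R console, and finally reinstall the package locally: install.packages("/path/to/source", repos = NULL, type = "source").

This solved my problem; I hope it helps.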



Source: segfault in R using reshape2 package and dcast