Merge overlapping ranges into unique groups, in da

Posted 2019-01-11 20:18

Question:

I have a data frame with n rows and 3 columns:

df
  start   end group
1   178  5025     1
2   400  5025     1
3   983  5535     2
4  1932  6918     2
5 33653 38197     3

I would like to make a new column df$group2 that re-classifies groups that overlap to be the same. For example, df$group[df$group==1] starts at 178 and ends at 5025. This overlaps with df$group[df$group==2], which starts at 983 and ends at 6918. I would like to make a new column that now classifies group 1 and 2 as group 1 (and subsequently, group 3 as group 2).

Result:

df
  start   end group group2
1   178  5025     1      1
2   400  5025     1      1
3   983  5535     2      1
4  1932  6918     2      1
5 33653 38197     3      2

Thanks in advance for any help.

Answer 1:

You'll need the IRanges package:

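library(IRanges)
ir <- IRanges(df$start, df$end)
# reduce() merges overlapping ranges; findOverlaps() then maps each
# original range to the merged range that contains it
df$group2 <- subjectHits(findOverlaps(ir, reduce(ir)))
df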

#  start   end group group2
# 1   178  5025     1      1
# 2   400  5025     1      1
# 3   983  5535     2      1
# 4  1932  6918     2      1
# 5 33653 38197     3      2

To install IRanges, type these lines in R:

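# biocLite() has been retired; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("IRanges")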

To learn more (manual, etc.), see the IRanges package page on Bioconductor.
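
If installing a Bioconductor package is not an option, the same grouping can probably be reproduced in base R. The following is only a minimal sketch, not part of the original answer: `merge_groups` is a hypothetical helper name, and it assumes two ranges belong together when one starts at or before the largest end seen so far. IRanges' `reduce()` may treat ranges that merely sit next to each other differently, so check the edge cases that matter for your data.

```r
# Example data from the question
df <- data.frame(
  start = c(178, 400, 983, 1932, 33653),
  end   = c(5025, 5025, 5535, 6918, 38197),
  group = c(1, 1, 2, 2, 3)
)

# Hypothetical helper: assign one id per cluster of overlapping ranges
merge_groups <- function(start, end) {
  ord <- order(start, end)          # process ranges from left to right
  s <- start[ord]
  e <- end[ord]
  # Largest end among all earlier ranges (-Inf for the first range)
  prev_max_end <- c(-Inf, cummax(e)[-length(e)])
  # A new group begins whenever a range starts past everything before it
  id_sorted <- cumsum(s > prev_max_end)
  # Put the ids back in the original row order
  id <- integer(length(start))
  id[ord] <- id_sorted
  id
}

df$group2 <- merge_groups(df$start, df$end)
df
```

Because the helper sorts internally and then restores the original order, the rows of `df` do not need to be sorted by `start` beforehand.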