R overlap multiple GRanges with findOverlaps()

Posted 2019-05-12 17:21

I have three tables with differing genomic intervals. Here is an example:

> a
   chr interval.start interval.end names
1 chr1              5           10     a
2 chr1              6           10     b
3 chr2              7           10     c
4 chr3              8           10     d

> b
   chr interval.start interval.end names
1 chr1              6           15     e
2 chr1              7           15     f
3 chr1              8           15     g

> c
   chr interval.start interval.end names
1 chr1              7           12     h
2 chr1              8           12     i
3 chr5              9           12     j
4 chr10             10          12     k
5 chr20             11          12     l

I am trying to find the intervals common to all three tables after converting them to GRanges objects. Essentially I want to do something like intersect(c, intersect(a, b)). However, because these are genomic coordinates, I have to do this with GRanges objects from the GenomicRanges package, which I am not familiar with.

I can do findOverlaps(gr, gr1) or findOverlaps(gr1, gr2), but is there an easy way to overlap multiple GRanges at once like findOverlaps(gr, gr1, gr2)?

Any help would be appreciated. If this question was asked elsewhere, I apologize in advance.

Thanks

2 answers
手持菜刀,她持情操
#2 · 2019-05-12 17:58

You can subset one of them with subsetByOverlaps in one pairwise comparison, then compare that subset to the third set.

# R is case-sensitive, so the variable name must match on both lines
sub1 <- subsetByOverlaps(gr, gr1)
sub2 <- subsetByOverlaps(sub1, gr2)

Or directly

Reduce(subsetByOverlaps, list(gr, gr1, gr2))

This returns the ranges of the first GRanges object (gr) that overlap at least one range in each of the other two objects.

Depending on the type of overlap you want and which has the largest ranges, you should consider which to use as the query and which the subject.
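
To illustrate why the choice of query and subject matters, here is a minimal sketch using tables a and b from the question. It assumes the GenomicRanges package is installed, and building the objects with GRanges() and IRanges() is just one convenient way to do the conversion, not necessarily how the asker built theirs.

```r
library(GenomicRanges)

# Tables a and b from the question, with the "names" column used as range names
gr <- GRanges(seqnames = c("chr1", "chr1", "chr2", "chr3"),
              ranges = IRanges(start = c(5, 6, 7, 8), end = c(10, 10, 10, 10)))
names(gr) <- c("a", "b", "c", "d")

gr1 <- GRanges(seqnames = c("chr1", "chr1", "chr1"),
               ranges = IRanges(start = c(6, 7, 8), end = c(15, 15, 15)))
names(gr1) <- c("e", "f", "g")

# The result always consists of ranges taken from the query (first argument)
from_gr  <- subsetByOverlaps(gr, gr1)   # ranges of gr that hit anything in gr1
from_gr1 <- subsetByOverlaps(gr1, gr)   # ranges of gr1 that hit anything in gr

# A stricter overlap type: keep query ranges lying entirely inside a subject range
inside <- subsetByOverlaps(gr, gr1, type = "within")

names(from_gr)   # "a" "b"
names(from_gr1)  # "e" "f" "g"
names(inside)    # "b" (5-10 starts before every range in gr1, so "a" is dropped)
```

Swapping the arguments changes which object's ranges come back, so put the set whose coordinates you want to keep first.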

Juvenile、少年°
#3 · 2019-05-12 18:15

The following returns the exact intersection of all the ranges:

Reduce(intersect, list(gr, gr1, gr2))

In:

Reduce(subsetByOverlaps, list(gr, gr1, gr2))

subsetByOverlaps takes the first GRanges object as the query (the first argument, here gr) and returns the ranges of the query that overlap at least one range in the subject; with Reduce, that result is then filtered against gr2 in the same way. The coordinates returned are the original ranges from gr, not the overlapping portions. So to find the common intervals (regions of intersection), intersect is the appropriate function.
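
The difference can be seen on the three tables from the question. This is a sketch that assumes GenomicRanges is installed; the helper to_gr is a hypothetical name introduced here for illustration and is not part of the package.

```r
library(GenomicRanges)

# Hypothetical helper: build a GRanges from chromosome, start, end and names vectors
to_gr <- function(chr, start, end, nm) {
  g <- GRanges(seqnames = chr, ranges = IRanges(start = start, end = end))
  names(g) <- nm
  g
}

gr  <- to_gr(c("chr1", "chr1", "chr2", "chr3"),
             c(5, 6, 7, 8), c(10, 10, 10, 10), c("a", "b", "c", "d"))
gr1 <- to_gr(c("chr1", "chr1", "chr1"),
             c(6, 7, 8), c(15, 15, 15), c("e", "f", "g"))
gr2 <- to_gr(c("chr1", "chr1", "chr5", "chr10", "chr20"),
             c(7, 8, 9, 10, 11), c(12, 12, 12, 12, 12),
             c("h", "i", "j", "k", "l"))

# The objects cover different chromosomes, which can trigger a seqlevels
# warning when they are combined; it is suppressed here for brevity.
by_overlap <- suppressWarnings(Reduce(subsetByOverlaps, list(gr, gr1, gr2)))
common     <- suppressWarnings(Reduce(intersect, list(gr, gr1, gr2)))

by_overlap  # original ranges a (chr1:5-10) and b (chr1:6-10) from gr
common      # a single range, chr1:7-10, shared by all three tables
```

Note that intersect works on the merged ranges of each object, so the result carries no per-range names or metadata from the inputs.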
