R-Tree Implementation Java

2019-02-12 00:05发布

问题:

I was searching the last few days for a stable implementation of the R-Tree with support of unlimited dimensions (20 or so would be enough). I only found this http://sourceforge.net/projects/jsi/ but they only support 2 dimensions.

Another Option would be a multidimensional implementation of an interval-tree.

Maybe I'm completly wrong with the idea of using an R-Tree or Intervall-Tree for my Problem so i state the Problem in short, that you can send me your thoughts about this.

The Problem I need to solve is some kind of nearest-neighbour search. I have a set of Antennas and rooms and for each antenna an interval of Integers. E.g. antenna 1, min -92, max -85. In fact it could be represented as room -> set of antennas -> interval for antenna. The idea was that each room spans a box in the R-Tree over the dimension of the antennas and in each dimension by the interval.

If I get a query with N-Antennas and values for each antenna I then could just represent the Information as a query point in the room and retrieve the rooms "nearest" to the point.

Hope you got an Idea of the problem and my idea.

回答1:

I'm not entirely clear on what your exact problem is, but an R-Tree or interval tree would not work well in 20 dimensions. That's not a huge number of dimensions, but it is large enough for the curse of dimensionality to begin showing up.

To see what I mean, consider just trying to look at all of the neighbors of a box, including ones off of corners and edges. With 20 dimensions, you'll have 320 - 1 or 3,486,784,400 neighboring boxes. (You get that by realizing that along each axis a neighbor can be -1 unit, 0 unit, or +1 unit, but (0,0,0) is not a neighbor because it represents the original box.)

I'm sorry, but you either need to accept brute force searching, or else analyze your problem better and come up with a cleverer solution.



回答2:

Be aware that R-Trees can degrade badly when you have discrete data. The first thing you really need to find out is an appropriate data representation, then test if your queries work on a subset of the data.

R-Trees will only make your queries faster. If they don't work in the first place, it will not help. You should test your approach without using R-Trees first. Unless you hit a large amount of data (say, 100.000 objects), a linear scan in-memory can easily outperform an R-Tree, in particular when you need some adapter layer because it is not well-intergrated with your code.

The obvious approach here is to just use bounding rectangles, and linearly scan over them. If they work, you can then store the MBRs in an R-Tree to get some performance improvements. But if it doesn't work with a linear scan, it won't work with an R-Tree either (it will not work faster.)



回答3:

I have found this R*-Tree implementation in Java which seems to offer many features:

https://github.com/davidmoten/rtree

You might want to check it out!



回答4:

Another good implementation in Java is ELKI: https://elki-project.github.io/.



回答5:

You can use PostgreSQL’s Generalized Search Tree indexing facility.

GiST Quick demo