I have my data in a propriety format, None of the ones supported by Apache drill. Are there any tutorial on how to write my own storage plugin to handle such data.
相关问题
- Delete Messages from a Topic in Apache Kafka
- Jackson Deserialization not calling deserialize on
- How to maintain order of key-value in DataFrame sa
- StackExchange API - Deserialize Date in JSON Respo
- Difference between Types.INTEGER and Types.NULL in
This is something that really should be in the docs but currently is not. The interface isn't too complicated, but it can be a bit much to look at one of the existing plugins and understand everything that is going on.
There are 2 major components to writing a storage plugin, exposing information to the query planner and schema management system and then actually implementing the translation from the datasource API to the drill record representation.
The Kudu plugin was added recently and is a reasonable model for a storage system with a lot of the elements Drill can take advantage of. One thing I would note is that if your storage system is not distributed and you just plan on making all remote reads you don't have to do as much work around affinities/work lists/assignments in the group scan. If I have some time soon I'll try to write up a doc on the different parts of the interface and maybe write a tutorial about one of the existing plugins.
https://github.com/apache/drill/tree/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu