I am working on a solution where I will have a Hadoop cluster with Hive running and I want to send jobs and hive queries from a .NET application to be processed and get notified when they are done. I can't find any solutions for interfacing with Hadoop other than directly from a Java app, is there an API I can access that I am just not finding?
相关问题
- Sorting 3 numbers without branching [closed]
- Graphics.DrawImage() - Throws out of memory except
- Why am I getting UnauthorizedAccessException on th
- 求获取指定qq 资料的方法
- How to know full paths to DLL's from .csproj f
With Hadoop: there is no straight way to connect from C# because Hadoop communication tier is working with java only and is not cross platform. It is probably possible but in very non-trivial ways. I know there is a patch to add Protocol Buffers support for Hadoop but at the moment of writing (Aug 2011) is is not released yet.
With Hive situation is better because Hive has Thrift interface which supports C#. You can download Hive Thrift interfaces and generate C# client on your own but beware that it requires some hacking of generated code. Instead I would recommend you downloading dll from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or use Nuget package manager, search for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib Disclaimer: I'm the author.
Thrift API is also another way for other language to access hdfs and hive
Use Hbase.Net library from https://hbasenet.codeplex.com/
Then you can connect to hbase/hive as shown below:
FYI, we are using hortonworks sandbox and it connects fine.
In above example, 10.20.14.179 is host and 9090 is port.
Also, below might help from https://community.hortonworks.com/questions/25101/is-there-a-way-to-connect-to-hbase-using-c.html
There is no native C# HBase client. however, there are several options for interacting with HBase from C#.
C# HBase Thrift client - Thrift allows for defining service endpoints and data models in a common format and using code generators to create language specific bindings. HBase provides a Thirft server and definitions. There are many examples online for creating a C# HBase Thrift Client.
Marlin - Marlin is a C# client for interacting with Stargate (HBase REST API) that ultimately became hbase-sdk-for-net. I have not personally tested this against HBase 1.x+, but considering it uses Stargate, I expect it should work. If you are planning to use Stargate and implement your own client, which I would recommend over Thrift, make sure to use protobufs to avoid the JSON serialization overhead. Using a HTTP based approach also makes it much easier to load balance requests over multiple gateways.
Phoenix Query Server - Phoenix is a SQL skin on HBase. Phoenix Query Server is a REST API for submitting SQL queries to Phoenix. Here is some example code, however, I have not yet tested it.
Simba HBase ODBC Driver - Using ODBC to connect to HBase. I've heard positive feedback on this approach, especially from tools like Tableau. This is not open source and requires purchasing a license.
It is possible to access Hive utilizing C# by making use of Microsoft's ODBC connector. Download the Nuget package for "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx
The trick lies in building the connection string to connect with it. The best way I came up with was to download the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), install it, then use the Server Explorer inside Visual Studio to add a new connection, then build the connection string for me. To do this, I used the following steps:
Either utilize this data source or copy the connection string that it's built for you and use it within your application.
Apparently it is possible to connect to Hadoop with non-Java solutions - see Do I have to write my application in Java?
There is Hortonworks ODBC driver. I havn't used it personally, but it shall let you work with hive as with any other ODBC datasource. You can use OdbcConnection class to connect to Hive once ODBC driver is installed.
As noted in other answers - you can use Thrift api. For that you need to generate C# classes from interface definition files, which you can download from Hive source repository. This approach works for me.
You can use IKVM, to convert hadoop client java libraries into .Net assemblies which you can use from C#. I havn't used IKVM with Hive client, but I've IKVMed some other hadoop client library and surprisingly it worked.
EDIT: