How to connect to Hadoop/Hive from .NET

2019-03-15 06:35发布

I am working on a solution where I will have a Hadoop cluster with Hive running and I want to send jobs and hive queries from a .NET application to be processed and get notified when they are done. I can't find any solutions for interfacing with Hadoop other than directly from a Java app, is there an API I can access that I am just not finding?

标签: c# hadoop hive
7条回答
Fickle 薄情
2楼-- · 2019-03-15 06:57

With Hadoop: there is no straight way to connect from C# because Hadoop communication tier is working with java only and is not cross platform. It is probably possible but in very non-trivial ways. I know there is a patch to add Protocol Buffers support for Hadoop but at the moment of writing (Aug 2011) is is not released yet.

With Hive situation is better because Hive has Thrift interface which supports C#. You can download Hive Thrift interfaces and generate C# client on your own but beware that it requires some hacking of generated code. Instead I would recommend you downloading dll from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or use Nuget package manager, search for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib Disclaimer: I'm the author.

查看更多
冷血范
3楼-- · 2019-03-15 06:58

Thrift API is also another way for other language to access hdfs and hive

查看更多
爷、活的狠高调
4楼-- · 2019-03-15 07:03

Use Hbase.Net library from https://hbasenet.codeplex.com/

Then you can connect to hbase/hive as shown below:

        Client c = new Client("10.20.14.179", 9090, 1000000);

        var cli = c.TotalClients;

        var tableList = c.GetTableNames();

FYI, we are using hortonworks sandbox and it connects fine.

In above example, 10.20.14.179 is host and 9090 is port.

Also, below might help from https://community.hortonworks.com/questions/25101/is-there-a-way-to-connect-to-hbase-using-c.html

There is no native C# HBase client. however, there are several options for interacting with HBase from C#.

  1. C# HBase Thrift client - Thrift allows for defining service endpoints and data models in a common format and using code generators to create language specific bindings. HBase provides a Thirft server and definitions. There are many examples online for creating a C# HBase Thrift Client.

  2. Marlin - Marlin is a C# client for interacting with Stargate (HBase REST API) that ultimately became hbase-sdk-for-net. I have not personally tested this against HBase 1.x+, but considering it uses Stargate, I expect it should work. If you are planning to use Stargate and implement your own client, which I would recommend over Thrift, make sure to use protobufs to avoid the JSON serialization overhead. Using a HTTP based approach also makes it much easier to load balance requests over multiple gateways.

  3. Phoenix Query Server - Phoenix is a SQL skin on HBase. Phoenix Query Server is a REST API for submitting SQL queries to Phoenix. Here is some example code, however, I have not yet tested it.

  4. Simba HBase ODBC Driver - Using ODBC to connect to HBase. I've heard positive feedback on this approach, especially from tools like Tableau. This is not open source and requires purchasing a license.

查看更多
唯我独甜
5楼-- · 2019-03-15 07:04

It is possible to access Hive utilizing C# by making use of Microsoft's ODBC connector. Download the Nuget package for "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx

The trick lies in building the connection string to connect with it. The best way I came up with was to download the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), install it, then use the Server Explorer inside Visual Studio to add a new connection, then build the connection string for me. To do this, I used the following steps:

  • Change the data source to "Microsoft ODBC Data Source" and ensure you're using the ".NET Framework Data Provider for ODBC" as the data provider.

Change Data Source Dialog Window

  • Under the "Data source specification" portion, check the "Use connection string" then click the "Build" button.

Add Connection Dialog Window

  • Under the "Machine Data Source" tab, select the "Sample Microsoft Hive DSN" data source name, then click the "OK" button.

Select Data Source Dialog Window

  • A window titled "Microsoft Hive ODBC Driver Connection Dialog" will open. Enter an optional description, then type in the path to your Hive server, the port you will be using, and what database it should connect to. Indicate the Hive Server Type, and specify an authentication mechanism to use, then fill out the appropriate fields.

Microsoft Hive ODBC Driver Connection Dialog Window

  • Finally, click the "Test" button in the bottom to ensure that you're able to successfully connect. If successful, click the "OK" button, then you'll be back in the "Modify Connection" window. Enter the login information for your Hive service here.

Either utilize this data source or copy the connection string that it's built for you and use it within your application.

查看更多
Lonely孤独者°
6楼-- · 2019-03-15 07:13

Apparently it is possible to connect to Hadoop with non-Java solutions - see Do I have to write my application in Java?

查看更多
对你真心纯属浪费
7楼-- · 2019-03-15 07:14
  1. There is Hortonworks ODBC driver. I havn't used it personally, but it shall let you work with hive as with any other ODBC datasource. You can use OdbcConnection class to connect to Hive once ODBC driver is installed.

  2. As noted in other answers - you can use Thrift api. For that you need to generate C# classes from interface definition files, which you can download from Hive source repository. This approach works for me.

  3. You can use IKVM, to convert hadoop client java libraries into .Net assemblies which you can use from C#. I havn't used IKVM with Hive client, but I've IKVMed some other hadoop client library and surprisingly it worked.

EDIT:

  1. There's also Apache templeton, which allows submitting Hive jobs (Pig and MR also) using Rest interface. The problem with it is that it spawns another map task to submit Hive job, which makes it slower.
查看更多
登录 后发表回答