Why hive is not supporting Stored procedure? If its not supporting then how we will handle Sp in Hive? have any alternate solution? (Because we have a already a data base is there in mssql) What about HBASE? Is it support SP?
问题:
回答1:
First of all, Hadoop or Hive is NOT
an alternative to your SQL DB. You must never consider either of these 2 to be used as a replacement of your RDBMS.
Hive was developed just to provide warehousing capabilities on top of an existing Hadoop cluster keeping in mind the large base of SQL users, both expert database designers and administrators, as well as casual users who use SQL to extract information from their data warehouses. Although it provides you a SQL like interface, it is not a SQL DB. Hive is most suited for data warehouse applications, where relatively static data is analyzed, fast response times are not required, and when the data is not changing rapidly. Simply put for offline batch processing
kind of stuff.
There is nothing like stored procedures in HBase as well. But they have something called as Coprocessor
which resembles stored procedures in RDBMS. To find more on Coprocessor you can go here.
And as @zsxwing has said Sqoop is just a data migration tool, nothing more. Once you switch to the NoSQL world you need to be flexible and you need to abide by the NoSQL rules.
If you could elaborate your use case a bit, maybe we can help you better.
In response to your comment :
Yes Facebook uses Hadoop and Hive and other related tool extensively. Infact Hive was developed at Facebook. But These are not the only things. Wherever they have OLTP and full transactional need, they still depend on RDBMS. One example is their Timeline
feature, which uses MySQL. They have a gigantic(and awesome) pipeline which consists of a lot of things and not just Hadoop and Hive. See the picture below.
回答2:
Hive and Hbase are not support stored procedure. However, Hive plans to support Sp (HIVE-3087) in the future. HBase has no plan about supporting Sp since it only focuses on being a Storage and more like NoSQL.
Hive UDF could implement some function of stored procedure, though it's not enough.
回答3:
Hive does not have stored procedures
Hive indeed does not have any stored procedures as explained in existing answers. However, here are 2 mitigating factors:
Hive has views
Of course it is not a proper substitute for stored procedures, but with smart use of views you can perhaps remove the need for some of your procedures.
You can call hive from another program
The last time I ran into the problem that hive does not have stored procedures, I realized that the thing I wanted to do (loop over all columns) was something that I could also do in another program. As such I followed the following workflow:
- Run a query to get the relevant (meta) data: Python calls hive to get column names
- Use the information to build the query: Python takes in all column names and builds the correspondng select statements
- Run the resulting query: Python does a system call with
hive -e
- Optionally, go to 2 if needed
With views and external calls, I have so far been able to work around the lack of stored procedures.
回答4:
Please refer to HPL/SQL, I am looking for same solution but not try yet.
I believe the data warehouse application need stored procedure support, but prefer set-based than row-based procedure.
In my personal experience, procedural support is needed when leverage server-side program template in structured data warehouse application. It makes data warehouse application more easy to porting between SQL/NoSQL, like Netezza, MSSQL, Oracle, DB2, and BigInsight.
回答5:
Have a look at open-source project PL/HQL at http://www.plhql.org. It allows you to run existing SQL Server, Oracle, Teradata, MySQL etc. stored procedures in Hive.