Difference between Pig and Hive? Why have both? [c

My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS (PDF link).

I understand that-

Pig's language Pig Latin is a shift from(suits the way programmers think) SQL like declarative style of programming and Hive's query language closely resembles SQL.
Pig sits on top of Hadoop and in principle can also sit on top of Dryad. I might be wrong but Hive is closely coupled to Hadoop.
Both Pig Latin and Hive commands compiles to Map and Reduce jobs.

My question - What is the goal of having both when one (say Pig) could serve the purpose. Is it just because Pig is evangelized by Yahoo! and Hive by Facebook?

标签： hadoop hive apache-pig

19条回答

等我变得足够好

2楼-- · 2019-01-09 21:02

Check out this post from Alan Gates, Pig architect at Yahoo!, that compares when would use a SQL like Hive rather than Pig. He makes a very convincing case as to the usefulness of a procedural language like Pig (vs. declarative SQL) and its utility to dataflow designers.

0人赞添加讨论(0) 举报

我只想做你的唯一

3楼-- · 2019-01-09 21:02

Read the difference between PIG and HIVE in this link.

http://www.aptibook.com/Articles/Pig-and-hive-advantages-disadvantages-features

All the aspects are given. If you are in the confusion which to choose then you must see that web page.

0人赞添加讨论(0) 举报

Fickle 薄情

4楼-- · 2019-01-09 21:03

Pig is useful for ETL kind of workloads generally speaking. For example set of transformations you need to do to your data every day.

Hive shines when you need to run adhoc queries or just want to explore data. It sometimes can act as interface to your visualisation Layer ( Tableau/Qlikview).

Both are essential and serve different purpose.

0人赞添加讨论(0) 举报

迷人小祖宗

5楼-- · 2019-01-09 21:05

What HIVE can do which is not possible in PIG?

Partitioning can be done using HIVE but not in PIG, it is a way of bypassing the output.

What PIG can do which is not possible in HIVE?

Positional referencing - Even when you dont have field names, we can reference using the position like $0 - for first field, $1 for second and so on.

And another fundamental difference is, PIG doesn't need a schema to write the values but HIVE does need a schema.

You can connect from any external application to HIVE using JDBC and others but not with PIG.

Note: Both runs on top of HDFS (hadoop distributed file system) and the statements are converted to Map Reduce programs.

0人赞添加讨论(0) 举报

女痞

6楼-- · 2019-01-09 21:06

Have a look at Pig Vs Hive Comparison in a nut shell from a "dezyre" article

Hive is better than PIG in: Partitions, Server, Web interface & JDBC/ODBC support.

Some differences:

Hive is best for structured Data & PIG is best for semi structured data
Hive is used for reporting & PIG for programming
Hive is used as a declarative SQL & PIG as a procedural language
Hive supports partitions & PIG does not
Hive can start an optional thrift based server & PIG cannot
Hive defines tables beforehand (schema) + stores schema information in a database & PIG doesn't have a dedicated metadata of database
Hive does not support Avro but PIG does. EDIT: Hive supports Avro, specify the serde as org.apache.hadoop.hive.serde2.avro
Pig also supports additional COGROUP feature for performing outer joins but hive does not. But both Hive & PIG can join, order & sort dynamically.

0人赞添加讨论(0) 举报

The star\"

7楼-- · 2019-01-09 21:07

You can achieve similar results with pig/hive queries. The main difference lies within approach to understanding/writing/creating queries.

Pig tends to create a flow of data: small steps where in each you do some processing
Hive gives you SQL-like language to operate on your data, so transformation from RDBMS is much easier (Pig can be easier for someone who had not earlier experience with SQL)

It is also worth noting, that for Hive you can nice interface to work with this data (Beeswax for HUE, or Hive web interface), and it also gives you metastore for information about your data (schema, etc) which is useful as a central information about your data.

I use both Hive and Pig, for different queries (I use that one where I can write query faster/easier, I do it this way mostly ad-hoc queries) - they can use the same data as an input. But currently I'm doing much of my work through Beeswax.

0人赞添加讨论(0) 举报

1 2 3 4 下一页

Difference between Pig and Hive? Why have both? [c

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间