Difference between Pig and Hive? Why have both?

My background: I am four weeks into the Hadoop world. I have dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM, and I have read Google's papers on MapReduce and GFS.

I understand that:

Pig's language, Pig Latin, is a shift away from the SQL-like declarative style of programming (it suits the way programmers think), whereas Hive's query language closely resembles SQL.
Pig sits on top of Hadoop and in principle can also sit on top of Dryad. I might be wrong, but Hive is closely coupled to Hadoop.
Both Pig Latin and Hive commands compile to Map and Reduce jobs.

My question: what is the goal of having both when one (say Pig) could serve the purpose? Is it just because Pig is evangelized by Yahoo! and Hive by Facebook?

Tags: hadoop hive apache-pig

19 answers

放我归山

#2 · 2019-01-09 21:16

Pig eats anything! Meaning it can consume unstructured data.

Hive requires a schema.
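
To make the contrast a bit more concrete, here is a rough Python analogy (it is not Pig or Hive code). The sample lines, the `SCHEMA` list and both loader functions are made up for illustration, and rejecting rows that do not fit is just a choice made for this sketch, not a claim about how Hive treats such rows.

```python
# Python analogy only; this is not Pig or Hive code.
RAW_LINES = [
    "alice\t34\tengineer",
    "bob\t?\t",            # messy row: non-numeric age, empty job
    "carol\t29",           # short row: job column missing
]

def load_without_schema(lines):
    """Schema-less load: split each line and keep whatever fields appear."""
    return [tuple(line.split("\t")) for line in lines]

# Hypothetical declared schema: (column name, type converter).
SCHEMA = [("name", str), ("age", int), ("job", str)]

def load_with_schema(lines, schema):
    """Schema-bound load: a row is kept only if it fits the declared columns."""
    rows, rejected = [], []
    for line in lines:
        fields = line.split("\t")
        try:
            if len(fields) != len(schema):
                raise ValueError("wrong number of columns")
            rows.append(
                {name: cast(value) for (name, cast), value in zip(schema, fields)}
            )
        except ValueError:
            # Assumption of this sketch: non-conforming rows are set aside.
            rejected.append(line)
    return rows, rejected

loose = load_without_schema(RAW_LINES)
strict, rejected = load_with_schema(RAW_LINES, SCHEMA)
print(len(loose), "rows loaded without a schema")
print(len(strict), "rows fit the schema;", len(rejected), "set aside")
```

Without a schema every line gets through and fields are addressed by position; with one, the shape of the data has to be agreed on before anything is read.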


Luminary・发光体

#3 · 2019-01-09 21:18

From the link: http://www.aptibook.com/discuss-technical?uid=tech-hive4&question=What-kind-of-datawarehouse-application-is-suitable-for-Hive?

Hive is not a full database. The design constraints and limitations of Hadoop and HDFS impose limits on what Hive can do.

Hive is most suited for data warehouse applications, where

1) Relatively static data is analyzed,

2) Fast response times are not required, and

3) The data is not changing rapidly.

Hive doesn’t provide crucial features required for OLTP (Online Transaction Processing). It’s closer to being an OLAP (Online Analytical Processing) tool. So, Hive is best suited for data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.


ら.Afraid

#4 · 2019-01-09 21:21

1) Pig Latin is a data-flow style language and is more suitable for software engineers, while SQL is more suitable for analysts who are used to SQL. For a complex task in Hive you have to create temporary tables manually to store intermediate data, which is not necessary in Pig.
2) Pig Latin is suitable for complicated data structures (like a small graph). Pig has a data structure called DataBag, which is a collection of Tuples. Sometimes you need to calculate metrics that involve multiple tuples (there is a hidden link between the tuples, which is why I would call it a graph). In that case it is very easy to write a UDF that calculates the metrics across multiple tuples. Of course it could be done in Hive, but it is not as convenient as it is in Pig.
3) Writing a UDF in Pig is much easier than in Hive, in my opinion.
4) Pig has no metadata support (or rather it is optional; in the future it may integrate HCatalog). Hive stores its tables' metadata in a database.
5) You can debug a Pig script in a local environment, but that would be hard to do for Hive. The reason is the metadata point above: you need to set up the Hive metadata in your local environment, which is very time consuming.
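
The bag-of-tuples point can be sketched in plain Python (an analogy only, not Pig code and not the Pig UDF API). The event data, `group_into_bags` and `max_gap_seconds` are hypothetical names invented for this example; the idea is just that the function needs all tuples of one group at once.

```python
from collections import defaultdict

# Hypothetical click events: (user, timestamp_seconds, page).
EVENTS = [
    ("u1", 100, "home"),
    ("u1", 160, "search"),
    ("u2", 50, "home"),
    ("u1", 400, "cart"),
    ("u2", 55, "cart"),
]

def group_into_bags(tuples, key_index):
    """Group tuples by one field; each group is a 'bag' (a list of tuples)."""
    bags = defaultdict(list)
    for t in tuples:
        bags[t[key_index]].append(t)
    return dict(bags)

def max_gap_seconds(bag):
    """UDF-style function that looks at every tuple in one bag.

    Returns the longest pause between consecutive events in the bag.
    """
    times = sorted(t[1] for t in bag)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return max(gaps, default=0)

bags = group_into_bags(EVENTS, key_index=0)
result = {user: max_gap_seconds(bag) for user, bag in bags.items()}
print(result)
```

The metric depends on how the tuples inside a group relate to each other, which is the kind of "hidden link" described above.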


孤傲高冷的网名

#5 · 2019-01-09 21:22

Here are some additional links on when to use Pig or Hive.

http://aws.amazon.com/elasticmapreduce/faqs/#hive-8

http://www.larsgeorge.com/2009/10/hive-vs-pig.html


SAY GOODBYE

#6 · 2019-01-09 21:23

Hive vs. Pig:

Hive is a SQL interface that serves SQL-savvy users as well as tools such as Tableau or MicroStrategy, or any other tool or language that has a SQL interface.

Pig is more like an ETL pipeline, with step-by-step commands such as declaring variables, looping, iterating, conditional statements, etc.

I prefer writing Pig scripts over HiveQL when I want to write complex step-by-step logic. When I am comfortable writing a single SQL statement to pull the data I want, I use Hive. For Hive you need to define the table before querying it (as you do in an RDBMS).

The purposes of the two are different, but under the hood both do the same thing: they convert to MapReduce programs. Also, the Apache open source community keeps adding more and more features to both of these projects.
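
As a rough illustration of "step-by-step logic" versus "a single SQL statement", here is a small Python sketch. It uses plain Python for the step-by-step side and the standard-library `sqlite3` module as a stand-in for a SQL engine; it is not Pig Latin or HiveQL, and the `ORDERS` data and the threshold of 8 are made up for the example.

```python
import sqlite3

# Hypothetical order records: (customer, amount).
ORDERS = [("ann", 30), ("bob", 5), ("ann", 25), ("cat", 70), ("bob", 8)]

# Step-by-step style: each step names an intermediate result.
big_enough = [(c, a) for c, a in ORDERS if a >= 8]            # step 1: filter
totals = {}
for customer, amount in big_enough:                            # step 2: group and sum
    totals[customer] = totals.get(customer, 0) + amount
flow_result = sorted(totals.items(), key=lambda kv: -kv[1])    # step 3: order by total

# Declarative style: define the table first, then state what is wanted in one query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
sql_result = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "WHERE amount >= 8 GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

print(flow_result)
print(sql_result)
```

Both halves produce the same rows. The difference is in how the work is written down: as named intermediate steps, or as one query against a table that had to be defined beforehand.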


The star"

#7 · 2019-01-09 21:24

When we use Hadoop, it means we are trying to process huge amounts of data. The end goal of the data processing is to generate content or reports from it.

So it internally consists of two main activities: 1) loading and processing the data, and 2) generating content and using it for reporting, etc.

Loading / data processing -> Pig is helpful here. It serves as an ETL tool (we can perform ETL operations using Pig scripts). Once the result is processed, we can use Hive to generate reports based on the processed result.

Hive: It is built on top of HDFS for warehouse processing. We can easily generate ad hoc reports using Hive from the processed content generated by Pig.
