Sqoop import: what is the maximum number of mappers that can be executed?

Posted 2020-02-13 04:45

What is the maximum number of mappers that can be executed in a Sqoop import? Also, while importing using Sqoop, is there any case where a reducer runs?

Tags: sqoop
3 Answers
爱情/是我丢掉的垃圾
Answer 2 · 2020-02-13 05:25

Max number of mappers

It can be any number, but it should be set based on the data volume, available resources, and the desired degree of parallelism. More mappers does not necessarily mean better performance.
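For instance, the mapper count is set with the `-m` / `--num-mappers` argument on the import command. The connection URL, username, and table below are hypothetical placeholders, not values from the question:

```shell
# Import with 8 parallel mappers (hypothetical JDBC URL, user, and table).
sqoop import \
    --connect jdbc:mysql://db.example.com:3306/sales \
    --username sqoopuser -P \
    --table orders \
    --num-mappers 8
```

Each mapper opens its own connection to the database, so the chosen value should stay within what the database server can comfortably serve.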

Is there any case where a reducer runs?

Yes, there are special circumstances in which a Sqoop job may have reducers.

One such condition is documented here.

sqoop export \
    -Dmapred.reduce.tasks=2 \
    -Dpgbulkload.bin="/usr/local/bin/pg_bulkload" \
    -Dpgbulkload.input.field.delim=$'\t' \
    -Dpgbulkload.check.constraints="YES" \
    -Dpgbulkload.parse.errors="INFINITE" \
    -Dpgbulkload.duplicate.errors="INFINITE" \
    --connect jdbc:postgresql://pgsql.example.net:5432/sqooptest \
    --connection-manager org.apache.sqoop.manager.PGBulkloadManager \
    --table test --username sqooptest --export-dir=/test -m 2

mapred.reduce.tasks - Number of reduce tasks for staging. The default value is 1. Each task does its staging in a single transaction.

Juvenile、少年°
Answer 3 · 2020-02-13 05:35

1. What is the maximum number of mappers that can be executed in a Sqoop import?

Increasing the number of mappers leads to a higher number of concurrent data-transfer tasks, which can result in faster job completion.

It won’t always lead to faster job completion. While increasing the number of mappers, there is a point at which you will fully saturate your database. Increasing the number of mappers beyond this point won’t lead to faster job completion; in fact, it will have the opposite effect as your database server spends more time doing context switching rather than serving data.

The optimal number of mappers depends on many variables:

1. Database type.

2. Hardware used for your database server.

3. Impact on other requests that your database needs to serve.

Start with a small number of mappers and increase gradually to find the optimal degree of parallelism for your environment and use case.

2. Also, while importing using Sqoop, is there any case where a reducer runs?

Reducers are needed for aggregation. The number of reducers for a Sqoop import is 0, since it is merely a map-only job that dumps data into HDFS; we are not aggregating anything.

叼着烟拽天下
Answer 4 · 2020-02-13 05:38

Sqoop jobs use 4 map tasks by default. This can be modified by passing either the -m or --num-mappers argument to the job. There is no maximum limit on the number of mappers set by Sqoop, but the total number of concurrent connections to the database is a factor to consider. Read more about Controlling Parallelism in Sqoop here.

If the table does not have a primary key defined and the --split-by argument is not provided to the Sqoop command, the number of mappers must be explicitly set to 1.
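When a split column is available, Sqoop queries its minimum and maximum values and slices that range into one interval per mapper, each becoming a WHERE clause for one map task. A rough Python sketch of this default integer split logic (illustrative only, not Sqoop's actual implementation):

```python
def integer_splits(lo, hi, num_mappers):
    """Slice the inclusive [lo, hi] id range into one non-overlapping
    interval per mapper, mimicking Sqoop's default integer splitter."""
    size = (hi - lo + 1) / num_mappers  # may be fractional; boundaries get rounded
    splits = []
    start = lo
    for i in range(num_mappers):
        # The last mapper always absorbs the remainder up to hi.
        end = hi if i == num_mappers - 1 else lo + round(size * (i + 1)) - 1
        splits.append((start, end))
        start = end + 1
    return splits

# Four mappers over ids 1..100 -> four ranges of 25 rows each.
print(integer_splits(1, 100, 4))
```

This is why a missing primary key (and no --split-by) forces `-m 1`: without a column to compute `lo` and `hi` from, there is no range to divide among mappers.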

Sqoop import jobs do not have any reduce tasks.

查看更多