Sqoop import: what is the maximum number of mappers that can be executed?

Posted 2020-02-13 04:45

What is the maximum number of mappers that can be executed in a Sqoop import? Also, while importing using Sqoop, is there any case where a reducer runs?

Tags: sqoop
3 Answers
爱情/是我丢掉的垃圾
Answer 2 · 2020-02-13 05:25

Max number of mappers

It can be any number, but it should be set based on the data volume, available resources, and the desired degree of parallelism. More mappers does not necessarily mean better performance.
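For instance, the mapper count is set with the `-m` / `--num-mappers` argument on the import command. The connection URL, username, and table below are hypothetical placeholders, not values from the question:

```shell
# Import with 8 parallel mappers (hypothetical JDBC URL, user, and table).
sqoop import \
    --connect jdbc:mysql://db.example.com:3306/sales \
    --username sqoopuser -P \
    --table orders \
    --num-mappers 8
```

Each mapper opens its own connection to the database, so the chosen value should stay within what the database server can comfortably serve.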

Is there any case where a reducer runs?

Yes, there are special circumstances in which a Sqoop job may have reducers.

One such condition is documented here.

sqoop export \
    -Dmapred.reduce.tasks=2 \
    -Dpgbulkload.bin="/usr/local/bin/pg_bulkload" \
    -Dpgbulkload.input.field.delim=$'\t' \
    -Dpgbulkload.check.constraints="YES" \
    -Dpgbulkload.parse.errors="INFINITE" \
    -Dpgbulkload.duplicate.errors="INFINITE" \
    --connect jdbc:postgresql://pgsql.example.net:5432/sqooptest \
    --connection-manager org.apache.sqoop.manager.PGBulkloadManager \
    --table test --username sqooptest --export-dir=/test -m 2

mapred.reduce.tasks - Number of reduce tasks for staging. The default value is 1. Each task does its staging in a single transaction.

Juvenile、少年°
Answer 3 · 2020-02-13 05:35

1. What is the maximum number of mappers that can be executed in a Sqoop import?

Increasing the number of mappers leads to a higher number of concurrent data-transfer tasks, which can result in faster job completion.

It won’t always lead to faster job completion. While increasing the number of mappers, there is a point at which you will fully saturate your database. Increasing the number of mappers beyond this point won’t lead to faster job completion; in fact, it will have the opposite effect as your database server spends more time doing context switching rather than serving data.

The optimal number of mappers depends on many variables:

1. Database type.

2. Hardware used for your database server.

3. Impact on other requests that your database needs to serve.

Start with a small number of mappers and increase gradually to find the optimal degree of parallelism for your environment and use case.

2. Also, while importing using Sqoop, is there any case where a reducer runs?

Reducers are needed for aggregation. The number of reducers for a Sqoop import is 0, since it is merely a map-only job that dumps data into HDFS; we are not aggregating anything.

叼着烟拽天下
Answer 4 · 2020-02-13 05:38

Sqoop jobs use 4 map tasks by default. This can be modified by passing either the -m or --num-mappers argument to the job. There is no maximum limit on the number of mappers set by Sqoop, but the total number of concurrent connections to the database is a factor to consider. Read more about Controlling Parallelism in Sqoop here.

If the table does not have a primary key defined and the --split-by argument is not provided to the Sqoop command, the number of mappers must be explicitly set to 1.
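When a split column is available, Sqoop queries its minimum and maximum values and slices that range into one interval per mapper, each becoming a WHERE clause for one map task. A rough Python sketch of this default integer split logic (illustrative only, not Sqoop's actual implementation):

```python
def integer_splits(lo, hi, num_mappers):
    """Slice the inclusive [lo, hi] id range into one non-overlapping
    interval per mapper, mimicking Sqoop's default integer splitter."""
    size = (hi - lo + 1) / num_mappers  # may be fractional; boundaries get rounded
    splits = []
    start = lo
    for i in range(num_mappers):
        # The last mapper always absorbs the remainder up to hi.
        end = hi if i == num_mappers - 1 else lo + round(size * (i + 1)) - 1
        splits.append((start, end))
        start = end + 1
    return splits

# Four mappers over ids 1..100 -> four ranges of 25 rows each.
print(integer_splits(1, 100, 4))
```

This is why a missing primary key (and no --split-by) forces `-m 1`: without a column to compute `lo` and `hi` from, there is no range to divide among mappers.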

Sqoop import jobs do not have any reduce tasks.

查看更多