How to change sqoop metastore?

2019-01-23 21:15发布

问题:

I am using sqoop 1.4.2 version. I am trying to change the sqoop metastore from default hsqldb to mysql.

I have configured following properties in sqoop-site.xml file.

    <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>false</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:mysql://ip:3206/sqoop?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>userName</value>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value>password</value>
  </property>
</configuration>

When I try to create a sqoop jobs with meta-connect url it fails to connect to configured mysql db.

sqoop job --create --meta-connect {mysql_jdbc_url} sqoop job defination

it is throwing following exception.

    14/06/06 15:04:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.0.6.1-101
14/06/06 15:04:55 WARN hsqldb.HsqldbJobStorage: Could not interpret as a number: null
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: Can not interpret metadata schema
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: The metadata schema version is null
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: The highest version supported is 0
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: To use this version of Sqoop, you must downgrade your metadata schema.
14/06/06 15:04:55 ERROR tool.JobTool: I/O error performing job operation: java.io.IOException: Invalid metadata version.
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:202)
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.open(HsqldbJobStorage.java:161)
        at org.apache.sqoop.tool.JobTool.run(JobTool.java:274)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:240)

Does sqoop 1.4.2 supports metastore other than hsql db?

Please suggest.

回答1:

The answer is Yes, in my case I am using PostgreSQL. I ran into this recently and I am using Version 1.4.4. I am not sure if what I did is the recommended way, but it works. Here are the steps I followed

  1. In sqoop-site.xml I configured it with, the connect string to my database, username and password.

  2. Created the following object in the database, as Sqoop was failing at it.

    CREATE TABLE SQOOP_ROOT ( version INT, propname VARCHAR(128) NOT NULL, propval VARCHAR(256), CONSTRAINT SQOOP_ROOT_unq UNIQUE (version, propname) );

  3. Inserted the following row (This seems to be the reason your script is failing)

    INSERT INTO SQOOP_ROOT VALUES( NULL, 'sqoop.hsqldb.job.storage.version', '0' );

I think the correct way might be is to download the source, and extend org.apache.sqoop.metastore.JobStorage with you DB implementation.



回答2:

Sqoop metastore does not support any other database other hsqldb. Number 2 points of notes on the link. cloudera



回答3:

Public service announcement: Sqoop Metastore on other DBs may fail

We have been able to get PostgreSQL and MySQL working as targets for the Sqoop Metastore on Sqoop 1, replacing the HyperSQL database. There's a little setup and seeding of the database needed, but from then on, it seemed fine.

However, we are seeing cases when we are running many sqoop jobs, updating the metastore concurrently -- sqoop 1.4.6 has no code to trap and handle cases where metastore updates for incremental updates fail due to concurrency issues. In particular, Sqoop _will complete it's import successfully but not update the metastore with the most recently imported values. This will cause the next incremental run will import duplicate data. Sqoop will return a non-zero return code, but data in either Hadoop or the metastore need to be synced afterward in order for data to be correct.

We're not sure there is a solution, but this is an expansion of @SandeerKumar's answer. This may be an issue with HyperSQL as well, but it would be much less likely because HSQL is in memory, so faster.