How Sqoop converts SQL datatypes into Hive types

Published 2019-05-21 06:25

Question:

I need a complete list of SQL Server datatypes as Sqoop converts them to Hive types.

For example, SQL Server's timestamp is converted to a Hive string, SQL Server's varchar is converted to a Hive string, and so on.

Where can I find this complete mapping?

Answer 1:

For the Java mapping

SELECT * FROM <TABLE-NAME> WHERE 1=0

This query is used. It fetches no records, only the table's metadata.

Useful source code:

  protected String getColNamesQuery(String tableName) {
    // adding where clause to prevent loading a big table
    return "SELECT t.* FROM " + escapeTableName(tableName) + " AS t WHERE 1=0";
  }

Sqoop then uses ResultSetMetaData to determine the column datatypes.

Useful source code for the generic JDBC manager:

  int cols = results.getMetaData().getColumnCount();
  ArrayList<String> columns = new ArrayList<String>();
  ResultSetMetaData metadata = results.getMetaData();
  for (int i = 1; i < cols + 1; i++) {
    String colName = metadata.getColumnLabel(i);
    if (colName == null || colName.equals("")) {
      colName = metadata.getColumnName(i);
      if (null == colName) {
        colName = "_RESULT_" + i;
      }
    }
    columns.add(colName);
    LOG.debug("Found column " + colName);
  }

Any mapping supplied with --map-column-java overrides this default mapping.
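To make the Java side concrete, below is a simplified sketch in the spirit of Sqoop's ConnManager.toJavaType(int). The class name and the exact set of cases here are illustrative, not a verbatim copy; the real method in the Sqoop source covers more types.

```java
import java.sql.Types;

// Simplified sketch of a java.sql.Types-to-Java-type mapping, modeled
// loosely on Sqoop's ConnManager.toJavaType(int). Illustrative only.
public class JavaTypeSketch {
  public static String toJavaType(int sqlType) {
    switch (sqlType) {
      case Types.INTEGER:
      case Types.SMALLINT:
      case Types.TINYINT:
        return "Integer";
      case Types.BIGINT:
        return "Long";
      case Types.VARCHAR:
      case Types.CHAR:
      case Types.LONGVARCHAR:
        return "String";
      case Types.NUMERIC:
      case Types.DECIMAL:
        return "java.math.BigDecimal";
      case Types.DOUBLE:
      case Types.FLOAT:
        return "Double";
      case Types.REAL:
        return "Float";
      case Types.BIT:
      case Types.BOOLEAN:
        return "Boolean";
      case Types.TIMESTAMP:
        return "java.sql.Timestamp";
      case Types.DATE:
        return "java.sql.Date";
      case Types.TIME:
        return "java.sql.Time";
      default:
        // Unmapped types must be handled explicitly via --map-column-java.
        return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(toJavaType(Types.VARCHAR));   // String
    System.out.println(toJavaType(Types.TIMESTAMP)); // java.sql.Timestamp
  }
}
```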


For the Hive mapping

The following mapping is used:

  public static String toHiveType(int sqlType) {
    switch (sqlType) {
      case Types.INTEGER:
      case Types.SMALLINT:
        return "INT";
      case Types.VARCHAR:
      case Types.CHAR:
      case Types.LONGVARCHAR:
      case Types.NVARCHAR:
      case Types.NCHAR:
      case Types.LONGNVARCHAR:
      case Types.DATE:
      case Types.TIME:
      case Types.TIMESTAMP:
      case Types.CLOB:
        return "STRING";
      case Types.NUMERIC:
      case Types.DECIMAL:
      case Types.FLOAT:
      case Types.DOUBLE:
      case Types.REAL:
        return "DOUBLE";
      case Types.BIT:
      case Types.BOOLEAN:
        return "BOOLEAN";
      case Types.TINYINT:
        return "TINYINT";
      case Types.BIGINT:
        return "BIGINT";
      default:
        // TODO(aaron): Support BINARY, VARBINARY, LONGVARBINARY, DISTINCT,
        // BLOB, ARRAY, STRUCT, REF, JAVA_OBJECT.
        return null;
    }
  }

See the Sqoop source code for this method.

Any mapping supplied with --map-column-hive overrides this default mapping.



Answer 2:

Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representations. However, the default mapping might not be suitable for everyone and can be overridden:

sqoop import ... --map-column-java id=String,value=Integer

See Sqoop user guide: https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_controlling_type_mapping

Some other configuration properties may also affect the type mapping, such as sqoop.bigdecimal.format.string.
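As I understand it, sqoop.bigdecimal.format.string chooses between the two java.math.BigDecimal formatting styles when writing records; the sketch below shows why the choice matters for values held in scientific notation (the BigDecimal behavior itself is standard Java, the property's exact semantics should be confirmed against the Sqoop docs).

```java
import java.math.BigDecimal;

// Illustration of the two BigDecimal formatting styles that
// sqoop.bigdecimal.format.string reportedly chooses between.
public class BigDecimalFormat {
  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("1E+3");
    System.out.println(value.toString());      // 1E+3 (scientific notation kept)
    System.out.println(value.toPlainString()); // 1000 (plain digits)
  }
}
```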

You will probably need to study the source code for the complete list of preconfigured mappings. But specifying mappings explicitly with --map-column-java or --map-column-hive gives you more control, and you can generate those mappings from the target tables' metadata.
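Generating the mappings from metadata can be sketched as follows. This is a hypothetical helper (class and method names are my own, and the hiveType rules are a deliberately reduced subset of the defaults shown in Answer 1): it turns (column name, java.sql.Types code) pairs into a ready-made --map-column-hive argument.

```java
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: build a --map-column-hive argument string from
// (column name, java.sql.Types code) pairs taken from table metadata.
public class HiveMappingBuilder {
  // Reduced subset of the default rules; everything else falls back to STRING.
  static String hiveType(int sqlType) {
    switch (sqlType) {
      case Types.INTEGER:
      case Types.SMALLINT:
        return "INT";
      case Types.BIGINT:
        return "BIGINT";
      case Types.NUMERIC:
      case Types.DECIMAL:
        return "DOUBLE";
      default:
        return "STRING";
    }
  }

  public static String mapColumnHiveArg(Map<String, Integer> columns) {
    StringJoiner joiner = new StringJoiner(",");
    for (Map.Entry<String, Integer> e : columns.entrySet()) {
      joiner.add(e.getKey() + "=" + hiveType(e.getValue()));
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // Column metadata as you might collect it from ResultSetMetaData.
    Map<String, Integer> cols = new LinkedHashMap<>();
    cols.put("id", Types.BIGINT);
    cols.put("name", Types.VARCHAR);
    cols.put("price", Types.DECIMAL);
    // Pass the result to: sqoop import ... --map-column-hive <result>
    System.out.println(mapColumnHiveArg(cols)); // id=BIGINT,name=STRING,price=DOUBLE
  }
}
```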