Error exporting data from Google Cloud Bigtable

Posted 2019-07-11 01:09

Question:

While following the Google docs, I get the stack trace below on the final export command (run from the master instance with the appropriate environment variables set).

${HADOOP_HOME}/bin/hadoop jar ${HADOOP_BIGTABLE_JAR} export-table -libjars ${HADOOP_BIGTABLE_JAR} <table-name> <gs://bucket>

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-install/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-install/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-02-08 23:39:39,068 INFO  [main] mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
2016-02-08 23:39:39,213 INFO  [main] gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.4-hadoop2
java.lang.IllegalAccessError: tried to access field sun.security.ssl.Handshaker.localSupportedSignAlgs from class sun.security.ssl.ClientHandshaker
    at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:278)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:913)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:849)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1035)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1344)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
    at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getBucket(GoogleCloudStorageImpl.java:1599)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1554)
    at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage.getItemInfo(CacheSupplementedGoogleCloudStorage.java:547)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1042)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.exists(GoogleCloudStorageFileSystem.java:383)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configureBuckets(GoogleHadoopFileSystemBase.java:1650)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.configureBuckets(GoogleHadoopFileSystem.java:71)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1598)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:783)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:746)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:352)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.hbase.util.DynamicClassLoader.<init>(DynamicClassLoader.java:104)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.<clinit>(ProtobufUtil.java:241)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:509)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:207)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:168)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:291)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:92)
    at org.apache.hadoop.hbase.mapreduce.IdentityTableMapper.initJob(IdentityTableMapper.java:51)
    at org.apache.hadoop.hbase.mapreduce.Export.createSubmittableJob(Export.java:75)
    at org.apache.hadoop.hbase.mapreduce.Export.main(Export.java:187)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
    at com.google.cloud.bigtable.mapreduce.Driver.main(Driver.java:35)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Here's my environment variable setup, in case it's helpful:

export HBASE_HOME=/home/hadoop/hbase-install
export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath`
export HADOOP_HOME=/home/hadoop/hadoop-install

export HADOOP_CLIENT_OPTS="-Xbootclasspath/p:${HBASE_HOME}/lib/bigtable/alpn-boot-7.1.3.v20150130.jar"
export HADOOP_BIGTABLE_JAR=${HBASE_HOME}/lib/bigtable/bigtable-hbase-mapreduce-0.2.2-shaded.jar
export HADOOP_HBASE_JAR=${HBASE_HOME}/lib/hbase-server-1.1.2.jar

Also, when I run hbase shell and then list, it just hangs and never returns the list of tables. This is what happens:

~$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-install/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-install/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-02-09 00:02:01,334 INFO  [main] grpc.BigtableSession: Opening connection for projectId mystical-height-89421, zoneId us-central1-b, clusterId twitter-data, on data host bigtable.googleapis.com, table admin host bigtabletableadmin.googleapis.com.
2016-02-09 00:02:01,358 INFO  [BigtableSession-startup-0] grpc.BigtableSession: gRPC is using the JDK provider (alpn-boot jar)
2016-02-09 00:02:01,648 INFO  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015

hbase(main):001:0> list
TABLE

I've tried:

  • Double checking ALPN and ENV variables are appropriately set
  • Double checking hbase-site.xml and hbase-env.sh to make sure nothing looks wrong.
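
For reference, the environment check can be scripted roughly as follows. This is only a sketch: check_jar_vars is a hypothetical helper name, and it confirms only that each named variable points at a file that exists, not that the jar is the right version.

```shell
#!/bin/sh
# Hypothetical helper: for each variable NAME given as an argument,
# report whether the path stored in that variable is an existing file.
check_jar_vars() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (empty if unset).
        eval "path=\${$name:-}"
        if [ -f "$path" ]; then
            echo "ok      $name=$path"
        else
            echo "missing $name=$path"
            status=1
        fi
    done
    return $status
}

# Variable names taken from the setup above.
check_jar_vars HADOOP_BIGTABLE_JAR HADOOP_HBASE_JAR \
    || echo "at least one jar variable does not point at a file"
```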

I also tried connecting to my cluster from another gcloud instance (as I was previously able to do following these directions), but I can't get that to work now either; it hangs as well.

user@gcloud-instance:hbase-1.1.2$ bin/hbase shell
2016-02-09 00:07:03,506 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-02-09 00:07:03,913 INFO  [main] grpc.BigtableSession: Opening connection for projectId <project>, zoneId us-central1-b, clusterId <cluster>, on data host bigtable.googleapis.com, table admin host bigtabletableadmin.googleapis.com.
2016-02-09 00:07:04,039 INFO  [BigtableSession-startup-0] grpc.BigtableSession: gRPC is using the JDK provider (alpn-boot jar)
2016-02-09 00:07:05,138 INFO  [Credentials-Refresh-0] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015

hbase(main):001:0> list
TABLE
Feb 09, 2016 12:07:08 AM com.google.bigtable.repackaged.io.grpc.internal.TransportSet$1 run
INFO: Created transport com.google.bigtable.repackaged.io.grpc.netty.NettyClientTransport@7b480442(bigtabletableadmin.googleapis.com/64.233.183.219:443) for bigtabletableadmin.googleapis.com/64.233.183.219:443

Any ideas about what I'm doing wrong? It looks like an access issue. How do I fix it?

Thanks!

Answer 1:

  1. You can spin up a Dataproc cluster with Bigtable enabled by following these instructions.

  2. SSH to the master with ./cluster.sh ssh

  3. Run hbase shell to verify that everything is in order.

  4. hadoop jar ${HADOOP_BIGTABLE_JAR} export-table -libjars ${HADOOP_BIGTABLE_JAR} <table-name> gs://<bucket>/some-folder

  5. Run gsutil ls gs://<bucket>/some-folder/** and check whether _SUCCESS exists. If so, the remaining files are your data.

  6. exit from your cluster master

  7. ./cluster.sh delete to get rid of the cluster, if you no longer require it.
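
The check in step 5 can be scripted along these lines. This is a minimal sketch: has_success_marker is a hypothetical helper, and the listing in the demo is made up so the snippet runs without a cluster. On a real master you would pipe in the output of gsutil ls instead.

```shell
#!/bin/sh
# Hypothetical helper: reads an object listing on stdin (one URL per line)
# and succeeds only if some object is named exactly _SUCCESS.
has_success_marker() {
    grep -q '/_SUCCESS$'
}

# Made-up listing for illustration. On the cluster, replace this with:
#   gsutil ls "gs://<bucket>/some-folder/**" | has_success_marker
sample_listing='gs://example-bucket/some-folder/_SUCCESS
gs://example-bucket/some-folder/part-m-00000'

if printf '%s\n' "$sample_listing" | has_success_marker; then
    echo "export finished: _SUCCESS marker found"
else
    echo "export incomplete: no _SUCCESS marker"
fi
```

Anchoring the pattern on the trailing /_SUCCESS keeps a data file whose name merely contains that string from being mistaken for the marker.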

You ran into a problem with the weekly Java runtime update, which has since been corrected.
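
If something similar shows up again, one thing worth checking is whether the alpn-boot jar on the boot classpath still matches the installed Java runtime. This is an assumption based on the trace failing inside sun.security.ssl, the package that alpn-boot overrides, not something confirmed for this cluster. The sketch below prints the two side by side; bootclasspath_jar is a hypothetical helper, and the option string is copied from the question with HBASE_HOME expanded.

```shell
#!/bin/sh
# Hypothetical helper: print the jar path from any -Xbootclasspath/p:
# option found in the space-separated JVM option string passed as $1.
bootclasspath_jar() {
    for opt in $1; do
        case "$opt" in
            -Xbootclasspath/p:*)
                printf '%s\n' "${opt#-Xbootclasspath/p:}"
                ;;
        esac
    done
}

# HADOOP_CLIENT_OPTS value from the question, with HBASE_HOME expanded.
opts="-Xbootclasspath/p:/home/hadoop/hbase-install/lib/bigtable/alpn-boot-7.1.3.v20150130.jar"

echo "alpn-boot jar: $(bootclasspath_jar "$opts")"

# Show the running JVM version next to it, if java is on the PATH.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
fi
```

Comparing the two by eye against the alpn-boot compatibility table is still a manual step; the script only gathers the values.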