Error when querying avro-backed hive table: java.l

2019-08-07 03:11发布

问题:

I am trying to create a hive table on azure HDInsight from an avro file exported from raw google analytics data in BigQuery.

It seems to work. I can created the table, and there are no errors when I run DESCRIBE. But when I try to select results, even if I select only two non-nested columns, I get a an error: "java.lang.IllegalArgumentException".

Here's how I created the table:

DROP TABLE IF EXISTS ga_sessions_20150106;
CREATE EXTERNAL TABLE IF NOT EXISTS ga_sessions_20150106
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/upload/ga_sessions'
TBLPROPERTIES ('avro.schema.url'='/upload/ga_sessions.avsc');
describe ga_sessions_20150106;

Here's the avro schema:

{"type":"record","name":"root","fields":[{"name":"visitorId","type":["long","null"]},{"name":"visitNumber","type":["long","null"]},{"name":"visitId","type":["long","null"]},{"name":"visitStartTime","type":["long","null"]},{"name":"date","type":["string","null"]},{"name":"totals","type":[{"type":"record","name":"totals","fields":[{"name":"visits","type":["long","null"]},{"name":"hits","type":["long","null"]},{"name":"pageviews","type":["long","null"]},{"name":"timeOnSite","type":["long","null"]},{"name":"bounces","type":["long","null"]},{"name":"transactions","type":["long","null"]},{"name":"transactionRevenue","type":["long","null"]},{"name":"newVisits","type":["long","null"]},{"name":"screenviews","type":["long","null"]},{"name":"uniqueScreenviews","type":["long","null"]},{"name":"timeOnScreen","type":["long","null"]},{"name":"totalTransactionRevenue","type":["long","null"]}]},"null"]},{"name":"trafficSource","type":[{"type":"record","name":"trafficSource","fields":[{"name":"referralPath","type":["string","null"]},{"name":"campaign","type":["string","null"]},{"name":"source","type":["string","null"]},{"name":"medium","type":["string","null"]},{"name":"keyword","type":["string","null"]},{"name":"adContent","type":["string","null"]},{"name":"adwordsClickInfo","type":[{"type":"record","name":"adwordsClickInfo","fields":[{"name":"campaignId","type":["long","null"]},{"name":"adGroupId","type":["long","null"]},{"name":"creativeId","type":["long","null"]},{"name":"criteriaId","type":["long","null"]},{"name":"page","type":["long","null"]},{"name":"slot","type":["string","null"]},{"name":"criteriaParameters","type":["string","null"]},{"name":"gclId","type":["string","null"]},{"name":"customerId","type":["long","null"]},{"name":"adNetworkType","type":["string","null"]},{"name":"targetingCriteria","type":[{"type":"record","name":"targetingCriteria","fields":[{"name":"boomUserlistId","type":["long","null"]}]},"null"]}]},"null"]}]},"null"]},{"name":"device","type":[{"type":"record","name":"device","fields":[{"name":"browser","type":["string","null"]},{"name":"browserVersion","type":["string","null"]},{"name":"operatingSystem","type":["string","null"]},{"name":"operatingSystemVersion","type":["string","null"]},{"name":"isMobile","type":["boolean","null"]},{"name":"mobileDeviceBranding","type":["string","null"]},{"name":"flashVersion","type":["string","null"]},{"name":"javaEnabled","type":["boolean","null"]},{"name":"language","type":["string","null"]},{"name":"screenColors","type":["string","null"]},{"name":"screenResolution","type":["string","null"]},{"name":"deviceCategory","type":["string","null"]}]},"null"]},{"name":"geoNetwork","type":[{"type":"record","name":"geoNetwork","fields":[{"name":"continent","type":["string","null"]},{"name":"subContinent","type":["string","null"]},{"name":"country","type":["string","null"]},{"name":"region","type":["string","null"]},{"name":"metro","type":["string","null"]}]},"null"]},{"name":"customDimensions","type":{"type":"array","items":{"type":"record","name":"customDimensions","fields":[{"name":"index","type":["long","null"]},{"name":"value","type":["string","null"]}]}}},{"name":"hits","type":{"type":"array","items":{"type":"record","name":"hits","fields":[{"name":"hitNumber","type":["long","null"]},{"name":"time","type":["long","null"]},{"name":"hour","type":["long","null"]},{"name":"minute","type":["long","null"]},{"name":"isSecure","type":["boolean","null"]},{"name":"isInteraction","type":["boolean","null"]},{"name":"isEntrance","type":["boolean","null"]},{"name":"isExit","type":["boolean","null"]},{"name":"referer","type":["string","null"]},{"name":"page","type":[{"type":"record","name":"page","fields":[{"name":"pagePath","type":["string","null"]},{"name":"hostname","type":["string","null"]},{"name":"pageTitle","type":["string","null"]},{"name":"searchKeyword","type":["string","null"]},{"name":"searchCategory","type":["string","null"]}]},"null"]},{"name":"transaction","type":[{"type":"record","name":"transaction","fields":[{"name":"transactionId","type":["string","null"]},{"name":"transactionRevenue","type":["long","null"]},{"name":"transactionTax","type":["long","null"]},{"name":"transactionShipping","type":["long","null"]},{"name":"affiliation","type":["string","null"]},{"name":"currencyCode","type":["string","null"]},{"name":"localTransactionRevenue","type":["long","null"]},{"name":"localTransactionTax","type":["long","null"]},{"name":"localTransactionShipping","type":["long","null"]},{"name":"transactionCoupon","type":["string","null"]}]},"null"]},{"name":"item","type":[{"type":"record","name":"item","fields":[{"name":"transactionId","type":["string","null"]},{"name":"productName","type":["string","null"]},{"name":"productCategory","type":["string","null"]},{"name":"productSku","type":["string","null"]},{"name":"itemQuantity","type":["long","null"]},{"name":"itemRevenue","type":["long","null"]},{"name":"currencyCode","type":["string","null"]},{"name":"localItemRevenue","type":["long","null"]}]},"null"]},{"name":"contentInfo","type":[{"type":"record","name":"contentInfo","fields":[{"name":"contentDescription","type":["string","null"]}]},"null"]},{"name":"appInfo","type":[{"type":"record","name":"appInfo","fields":[{"name":"name","type":["string","null"]},{"name":"version","type":["string","null"]},{"name":"id","type":["string","null"]},{"name":"installerId","type":["string","null"]},{"name":"appInstallerId","type":["string","null"]},{"name":"appName","type":["string","null"]},{"name":"appVersion","type":["string","null"]},{"name":"appId","type":["string","null"]},{"name":"screenName","type":["string","null"]},{"name":"landingScreenName","type":["string","null"]},{"name":"exitScreenName","type":["string","null"]},{"name":"screenDepth","type":["string","null"]}]},"null"]},{"name":"exceptionInfo","type":[{"type":"record","name":"exceptionInfo","fields":[{"name":"description","type":["string","null"]},{"name":"isFatal","type":["boolean","null"]}]},"null"]},{"name":"eventInfo","type":[{"type":"record","name":"eventInfo","fields":[{"name":"eventCategory","type":["string","null"]},{"name":"eventAction","type":["string","null"]},{"name":"eventLabel","type":["string","null"]},{"name":"eventValue","type":["long","null"]}]},"null"]},{"name":"product","type":{"type":"array","items":{"type":"record","name":"product","fields":[{"name":"productSKU","type":["string","null"]},{"name":"v2ProductName","type":["string","null"]},{"name":"v2ProductCategory","type":["string","null"]},{"name":"productVariant","type":["string","null"]},{"name":"productBrand","type":["string","null"]},{"name":"productRevenue","type":["long","null"]},{"name":"localProductRevenue","type":["long","null"]},{"name":"productPrice","type":["long","null"]},{"name":"localProductPrice","type":["long","null"]},{"name":"productQuantity","type":["long","null"]},{"name":"productRefundAmount","type":["long","null"]},{"name":"localProductRefundAmount","type":["long","null"]},{"name":"isImpression","type":["boolean","null"]},{"name":"customDimensions","type":{"type":"array","items":"customDimensions"}},{"name":"customMetrics","type":{"type":"array","items":{"type":"record","name":"customMetrics","fields":[{"name":"index","type":["long","null"]},{"name":"value","type":["long","null"]}]}}}]}}},{"name":"promotion","type":{"type":"array","items":{"type":"record","name":"promotion","fields":[{"name":"promoId","type":["string","null"]},{"name":"promoName","type":["string","null"]},{"name":"promoCreative","type":["string","null"]},{"name":"promoPosition","type":["string","null"]}]}}},{"name":"promotionActionInfo","type":[{"type":"record","name":"promotionActionInfo","fields":[{"name":"promoIsView","type":["boolean","null"]},{"name":"promoIsClick","type":["boolean","null"]}]},"null"]},{"name":"refund","type":[{"type":"record","name":"refund","fields":[{"name":"refundAmount","type":["long","null"]},{"name":"localRefundAmount","type":["long","null"]}]},"null"]},{"name":"eCommerceAction","type":[{"type":"record","name":"eCommerceAction","fields":[{"name":"action_type","type":["string","null"]},{"name":"step","type":["long","null"]},{"name":"option","type":["string","null"]}]},"null"]},{"name":"experiment","type":{"type":"array","items":{"type":"record","name":"experiment","fields":[{"name":"experimentId","type":["string","null"]},{"name":"combination","type":["string","null"]}]}}},{"name":"customVariables","type":{"type":"array","items":{"type":"record","name":"customVariables","fields":[{"name":"index","type":["long","null"]},{"name":"customVarName","type":["string","null"]},{"name":"customVarValue","type":["string","null"]}]}}},{"name":"customDimensions","type":{"type":"array","items":"customDimensions"}},{"name":"customMetrics","type":{"type":"array","items":"customMetrics"}},{"name":"type","type":["string","null"]},{"name":"social","type":[{"type":"record","name":"social","fields":[{"name":"socialInteractionNetwork","type":["string","null"]},{"name":"socialInteractionAction","type":["string","null"]}]},"null"]}]}}},{"name":"fullVisitorId","type":["string","null"]},{"name":"userId","type":["string","null"]}]}

Here's what comes back with DESCRIBE:

visitorid               bigint                  from deserializer   
visitnumber             bigint                  from deserializer   
visitid                 bigint                  from deserializer   
visitstarttime          bigint                  from deserializer   
date                    string                  from deserializer   
totals                  struct<visits:bigint,hits:bigint,pageviews:bigint,timeonsite:bigint,bounces:bigint,transactions:bigint,transactionrevenue:bigint,newvisits:bigint,screenviews:bigint,uniquescreenviews:bigint,timeonscreen:bigint,totaltransactionrevenue:bigint>   from deserializer   
trafficsource           struct<referralpath:string,campaign:string,source:string,medium:string,keyword:string,adcontent:string,adwordsclickinfo:struct<campaignid:bigint,adgroupid:bigint,creativeid:bigint,criteriaid:bigint,page:bigint,slot:string,criteriaparameters:string,gclid:string,customerid:bigint,adnetworktype:string,targetingcriteria:struct<boomuserlistid:bigint>>>   from deserializer   
device                  struct<browser:string,browserversion:string,operatingsystem:string,operatingsystemversion:string,ismobile:boolean,mobiledevicebranding:string,flashversion:string,javaenabled:boolean,language:string,screencolors:string,screenresolution:string,devicecategory:string>    from deserializer   
geonetwork              struct<continent:string,subcontinent:string,country:string,region:string,metro:string>  from deserializer   
customdimensions        array<struct<index:bigint,value:string>>    from deserializer   
hits                    array<struct<hitnumber:bigint,time:bigint,hour:bigint,minute:bigint,issecure:boolean,isinteraction:boolean,isentrance:boolean,isexit:boolean,referer:string,page:struct<pagepath:string,hostname:string,pagetitle:string,searchkeyword:string,searchcategory:string>,transaction:struct<transactionid:string,transactionrevenue:bigint,transactiontax:bigint,transactionshipping:bigint,affiliation:string,currencycode:string,localtransactionrevenue:bigint,localtransactiontax:bigint,localtransactionshipping:bigint,transactioncoupon:string>,item:struct<transactionid:string,productname:string,productcategory:string,productsku:string,itemquantity:bigint,itemrevenue:bigint,currencycode:string,localitemrevenue:bigint>,contentinfo:struct<contentdescription:string>,appinfo:struct<name:string,version:string,id:string,installerid:string,appinstallerid:string,appname:string,appversion:string,appid:string,screenname:string,landingscreenname:string,exitscreenname:string,screendepth:string>,exceptioninfo:struct<description:string,isfatal:boolean>,eventinfo:struct<eventcategory:string,eventaction:string,eventlabel:string,eventvalue:bigint>,product:array<struct<productsku:string,v2productname:string,v2productcategory:string,productvariant:string,productbrand:string,productrevenue:bigint,localproductrevenue:bigint,productprice:bigint,localproductprice:bigint,productquantity:bigint,productrefundamount:bigint,localproductrefundamount:bigint,isimpression:boolean,customdimensions:array<struct<index:bigint,value:string>>,custommetrics:array<struct<index:bigint,value:bigint>>>>,promotion:array<struct<promoid:string,promoname:string,promocreative:string,promoposition:string>>,promotionactioninfo:struct<promoisview:boolean,promoisclick:boolean>,refund:struct<refundamount:bigint,localrefundamount:bigint>,ecommerceaction:struct<action_type:string,step:bigint,option:string>,experiment:array<struct<experimentid:string,combination:string>>,customvariables:array<struct<index:bigint,customvarname:string,customvarvalue:string>>,customdimensions:array<struct<index:bigint,value:string>>,custommetrics:array<struct<index:bigint,value:bigint>>,type:string,social:struct<socialinteractionnetwork:string,socialinteractionaction:string>>>   from deserializer   
fullvisitorid           string                  from deserializer   
userid                  string                  from deserializer   

Error (i can post more of the log if desired. It doesn't have more details after "15 more", but you can see what's happening prior.):

Caused by: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at org.apache.avro.io.BinaryDecoder.readBytes(BinaryDecoder.java:288)
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:112)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
    at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:498)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:588)
    ... 15 more

回答1:

OK -- this issue is resolved.

The issue is that when I downloaded the file from google cloud storage using python client, I wrote it to file in text mode (the default) when I needed to use binary mode.

I re-downloaded it, re-uploaded it, and it worked.