I downloaded and installed the Cloudera 4.4 VM to play with Hadoop. I already use a cluster on a platform at work, so I know a little about how Hadoop works. I think my problem comes from my misunderstanding of Linux users and groups.
With Hive:
I created a Hive table from the shell, and that works. The table is in /user/hive/warehouse/test, which belongs to user cloudera, group cloudera.
I have some data files (.txt) in HDFS under /user/cloudera (user: cloudera, group: hive) that I load into my Hive table with:
LOAD DATA INPATH '/user/cloudera/*.txt' INTO TABLE test;
This is what I get:
hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;
Loading data to table default.test
chgrp: changing ownership of '/user/hive/warehouse/test/_log24310.txt': User does not belong to hive
chgrp: changing ownership of '/user/hive/warehouse/test/_log24311.txt': User does not belong to hive
Table default.test stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 10161843, raw_data_size: 0]
Time taken: 2.472 seconds
I had never seen this kind of error message before, but the files are moved anyway. If I try a SELECT *, there is no result.
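A possible lead, which I have not confirmed: the files moved into the table directory are named _log24310.txt and _log24311.txt, and Hadoop's FileInputFormat skips files whose names start with "_" or "." by default, treating them as hidden. If Hive reads the table directory through that filter, the files would be present but never read. The Python sketch below models only that naming rule; is_visible_to_input_format is a hypothetical helper, and nothing here talks to HDFS.

```python
# Sketch of the default hidden-file rule in Hadoop's FileInputFormat
# (assumption: Hive lists the table directory through this filter).
# Names starting with "_" or "." are treated as hidden and skipped.
import posixpath


def is_visible_to_input_format(path: str) -> bool:
    """Return True if the file name would pass the default hidden-file filter."""
    name = posixpath.basename(path)
    return not (name.startswith("_") or name.startswith("."))


warehouse_files = [
    "/user/hive/warehouse/test/_log24310.txt",
    "/user/hive/warehouse/test/_log24311.txt",
]

for f in warehouse_files:
    # Both names start with "_", so both are reported as not visible.
    print(f, is_visible_to_input_format(f))
```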
With HBase:
I also have some difficulties with HBase. I can create a table, but when I use ImportTsv:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=|' testhbase -Dimporttsv.skip.bad.lines=false
I get this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE)
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:
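One thing worth checking: ImportTsv's usage line has the form importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>, so every -D option goes before the table name and the input directory comes last. Hadoop's generic option parser stops at the first non-option argument, so in the command above the -Dimporttsv.skip.bad.lines=false after testhbase may be taken as the input path. The Python sketch below only assembles the arguments in that order; build_importtsv_cmd is a hypothetical helper, and the input directory and the column mapping HBASE_ROW_KEY,cf:value are placeholders, not values from my setup.

```python
# Hypothetical helper: builds an ImportTsv command line with every -D option
# placed before the positional arguments <tablename> <inputdir>.
import shlex
from typing import Dict, List


def build_importtsv_cmd(table: str, input_dir: str, options: Dict[str, str]) -> List[str]:
    cmd = ["hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv"]
    # Generic -D options first: option parsing stops at the first non-option argument.
    for key, value in options.items():
        cmd.append(f"-D{key}={value}")
    # Positional arguments last: table name, then HDFS input directory.
    cmd += [table, input_dir]
    return cmd


cmd = build_importtsv_cmd(
    "testhbase",
    "/user/cloudera/jeuDeTest",  # placeholder input directory
    {
        "importtsv.separator": "|",
        "importtsv.columns": "HBASE_ROW_KEY,cf:value",  # placeholder column mapping
        "importtsv.skip.bad.lines": "false",
    },
)

# shlex.join (Python 3.8+) quotes the "|" so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Apart from the placeholder column mapping, the only change from my command is the argument order.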
I think these problems are due to permissions, but I don't know how to get the rights I need to run these requests, or what the best way to do that is. (On the platform at work I am root, so I don't run into any of these difficulties, but I don't understand how it works.)
Thank you for reading.
I tried adding my cloudera user to the hive group. I no longer get the error during the load, but a SELECT still returns no results.
hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;
Loading data to table default.test
Table default.test stats: [num_partitions: 0, num_files: 10, num_rows: 0, total_size: 10161843, raw_data_size: 0]
Time taken: 0.486 seconds
hive> select * from test limit 20;
Time taken: 0.303 seconds