VM cloudera - user cloudera and permissions?

Posted 2019-07-04 18:54

Question:

I downloaded and installed the Cloudera VM 4.4 to experiment with Hadoop. I already use a cluster on a platform at work, so I know a little about how Hadoop works. I therefore think my problem comes from my misunderstanding of Linux users and groups.


With Hive:

I tried to create a Hive table from the shell, and it works. I have a table in /user/hive/warehouse/test which belongs to user cloudera of group cloudera.

I have some data files (.txt) in HDFS under /user/cloudera (user: cloudera, group: hive) that I load into my Hive table with:

LOAD DATA INPATH '/user/cloudera/*.txt' INTO TABLE test;

This is the output I got:

hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;
Loading data to table default.test
chgrp: changing ownership of '/user/hive/warehouse/test/_log24310.txt': User does not belong to hive
chgrp: changing ownership of '/user/hive/warehouse/test/_log24311.txt': User does not belong to hive
Table default.test stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 10161843, raw_data_size: 0]
OK
Time taken: 2.472 seconds

I had never seen this kind of error message before, but the files are moved anyway. If I then run a SELECT *, no rows are returned.
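The chgrp message says the user running the load does not belong to group hive, so a first step is to compare the user's groups with the ownership of the table directory. The following is a minimal sketch, not a definitive diagnosis: the helper names has_group and check_membership are hypothetical, and the user, group, and path are the ones quoted in the question.

```shell
#!/bin/sh
# Sketch: check whether a local user is a member of a given group.
# has_group and check_membership are illustrative helper names.

# has_group LIST NAME: succeeds if NAME appears as a whole word in the
# space-separated LIST (the format printed by `id -Gn`).
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# check_membership USER GROUP: prints whether USER belongs to GROUP
# according to the local Linux account database.
check_membership() {
    if has_group "$(id -Gn "$1")" "$2"; then
        echo "$1 is in group $2"
    else
        echo "$1 is NOT in group $2"
    fi
}

# On the VM one might then run, for example:
#   check_membership cloudera hive
#   hdfs groups cloudera                     # groups as resolved by HDFS
#   hdfs dfs -ls /user/hive/warehouse/test   # owner and group of the loaded files
```

If the local check and the HDFS view disagree, that difference is probably the first thing to look into.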


With HBase:

I also have some difficulties with HBase. I can create a table, but when I use ImportTsv:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv 
-Dimporttsv.columns=HBASE_ROW_KEY,cf:nl,ch:nt,cf:ti,cf:ip,cf:cr,cf:am,cf:op,cf:mr,cf:ct 
'-Dimporttsv.separator=|' testhbase -Dimporttsv.skip.bad.lines=false  
/user/cloudera/jeuDeTest/*.txt

I get this error:

ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) 
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: 
hdfs://localhost.localdomain:8020/user/cloudera/jeuDeTest/_logGeneral_C_24310_SO.txt
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:     
hdfs://localhost.localdomain:8020/user/cloudera/jeuDeTest/_logGeneral_C_24310_SO.txt
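Since LOAD DATA INPATH moves the source files into the table directory (as noted above, "the files are moved"), one thing worth ruling out is that the input files are simply no longer under /user/cloudera/jeuDeTest when ImportTsv runs. This is only a sketch under that assumption; the helper names input_exists and check_input are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that an HDFS path (or glob) still matches something
# before using it as the input of a MapReduce job such as ImportTsv.

# input_exists PATH: succeeds if `hdfs dfs -ls` finds at least one match.
input_exists() {
    hdfs dfs -ls "$1" >/dev/null 2>&1
}

# check_input PATH: reports whether PATH is present, and returns
# a non-zero status when it is not.
check_input() {
    if input_exists "$1"; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Example on the VM (the quotes keep the local shell from expanding the glob):
#   check_input '/user/cloudera/jeuDeTest/*.txt'
#   check_input '/user/hive/warehouse/test'
```

If the first path is reported missing and the second one found, the files were moved by the earlier Hive load rather than blocked by permissions.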

I think these problems are due to permissions, but I don't know how to get the rights needed to run these requests, or what the best way to do that is. (On the platform I have at work, I am root and I don't have any of these difficulties, but I don't understand how it works.)

Thank you for reading.

Angelik


I tried adding my cloudera user to the hive group. I no longer get the error during the load, but a SELECT still returns no results.

hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;                     
Loading data to table default.test
Table default.test stats: [num_partitions: 0, num_files: 10, num_rows: 0, total_size: 10161843,   raw_data_size: 0]
OK
Time taken: 0.486 seconds
hive> select * from test limit 20;
OK
Time taken: 0.303 seconds

Answer 1:

I had the same permissions issue: chgrp: changing ownership of '/user/hive/warehouse/test/_log24310.txt': User does not belong to hive.

  1. Added the existing user cloudera to the existing group hive with the command: usermod -a -G hive cloudera
  2. Restarted the system.
  3. Ran the LOAD command and then a select * from table_name: no data was displayed.
  4. Ran select count(*) from table_name, which started a MapReduce job.
  5. Ran select * from table_name again, and this time the results were returned correctly.
  6. Opened an Impala shell with the impala-shell command.
  7. Ran select * from table_name: no results were returned.
  8. Ran invalidate metadata in the impala-shell.
  9. Ran refresh table_name.
  10. Ran show tables.
  11. Ran select * from table_name, and the results are now displayed in both the impala-shell and the Hive shell.
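The steps above can be sketched as a small script. This is an illustration under assumptions, not the answerer's exact procedure: the helper names run, fix_group, and refresh_table are hypothetical, table_name is passed in as a placeholder, the LIMIT 20 is borrowed from the question, and the reboot (step 2) and LOAD (step 3) are left to be done by hand. By default the script only prints the commands (DRY_RUN=1) so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the answer's steps. With DRY_RUN=1 (the default here)
# each command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# fix_group USER: step 1, add an existing user to the existing group hive.
# Requires root; the answer then restarts the system (step 2).
fix_group() {
    run sudo usermod -a -G hive "$1"
}

# refresh_table TABLE: steps 4-5 in Hive, then steps 8, 9 and 11 in Impala.
refresh_table() {
    # count(*) starts a MapReduce job; the answer saw rows only after it.
    run hive -e "SELECT COUNT(*) FROM $1; SELECT * FROM $1 LIMIT 20;"
    # Make Impala reload its view of the metastore and of the table files.
    run impala-shell -q "INVALIDATE METADATA"
    run impala-shell -q "REFRESH $1"
    run impala-shell -q "SELECT * FROM $1 LIMIT 20"
}

# Example: fix_group cloudera, reboot, rerun the LOAD, then refresh_table test
```

Running it with DRY_RUN=0 executes the commands for real, which assumes hive, impala-shell, and sudo are available on the VM.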