I have this environment:
- A Hadoop cluster (1 master, 4 slaves) running several applications: Ambari, Hue, Hive, Sqoop, HDFS ...
- A production server (separate from the Hadoop cluster) running a MySQL database.
My goal is:
- Speed up the queries that currently run slowly on this MySQL server.
What I did:
- I imported the MySQL data into HDFS using Sqoop.
My questions:
- Can I run SELECT queries with Hive directly against the data that is already in HDFS?
- Or do I have to load the data into Hive first and then run the queries?
- When new data is inserted into the MySQL database, what is the best way to pick it up, import it into HDFS, and make it available in Hive again? (Possibly in real time.)
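
To make the last question more concrete, here is a rough sketch of what I imagine, not something I have running. It only builds and prints a Sqoop incremental import command and a Hive external table definition pointing at the same HDFS directory. The host, database, table, columns, user, and paths are all hypothetical placeholders, and I am assuming the table has an auto-increment `id` column and that Sqoop writes its default comma-delimited text files.

```python
import shlex

# Hypothetical placeholders: none of these names come from my real setup.
MYSQL_URL = "jdbc:mysql://mysql-prod.example.com/shop"
TABLE = "orders"
TARGET_DIR = "/data/staging/orders"


def sqoop_incremental_import(last_value):
    """Build a Sqoop command that appends only rows with id > last_value."""
    return [
        "sqoop", "import",
        "--connect", MYSQL_URL,
        "--username", "etl_user",                        # hypothetical user
        "--password-file", "/user/etl/.mysql.password",  # hypothetical path
        "--table", TABLE,
        "--target-dir", TARGET_DIR,
        "--incremental", "append",
        "--check-column", "id",                          # assumed auto-increment key
        "--last-value", str(last_value),
    ]


def hive_external_table_ddl():
    """Build Hive DDL that reads the Sqoop output files in place."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {TABLE} (\n"
        "  id BIGINT,\n"            # hypothetical columns
        "  customer_id BIGINT,\n"
        "  total DOUBLE\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{TARGET_DIR}'"
    )


if __name__ == "__main__":
    # Print the command and DDL instead of running them.
    print(" ".join(shlex.quote(arg) for arg in sqoop_incremental_import(0)))
    print(hive_external_table_ddl())
```

My understanding is that with an external table Hive would see newly appended files on the next query without a separate load step, but I am not sure whether this is the right approach, or whether it can get anywhere close to real time.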
Thank you in advance