Alright, I set up a cluster on virtual machines, so I wanted to share everything I did. My cluster has one manager node (running only Cloudera Manager), one namenode, and two datanodes. This layout made adding a new node to the cluster easy and trouble-free. I also wrote up a simple set of instructions. It may be a little summarized, but it works. Most of the commands are taken from various sites, so I tried to keep them as simple as I could understand them. I am adding this answer here because my setup also covers adding a new host to the cluster.
Note: I am very new to the Linux environment. I did my best, and I welcome corrections to my comments on usage or to my explanations.
==================================================================================
These instructions were carried out on CentOS 6.2 x64 (the non-live desktop version). If you use the server version, you may need to configure the network yourself.
Use the same version on all machines as far as possible. Some say the IP ranges of the machines matter, but I used different ranges (one machine on 192.168.12.13 and another on 192.168.13.144) and it caused no problems.
I used Oracle VirtualBox as the virtual machine environment, on Windows 7 Enterprise.
Suggestion: once you have one common CentOS installation, clone it so you can recover if a configuration goes wrong. Always keep a backup clone.
Download these files manually first:
Cloudera Manager (you can download the community edition). We need this for the master node, but that does not mean the master node is part of the cluster. I ran the manager on a machine that has no namenode or job tracker, just the manager application.
Oracle JDK. You can download the proper one from the Oracle web site. Either download it from a browser, or copy the link and use wget. It is your choice.
Be sure to uninstall OpenJDK:
yum remove java-1.6.0-openjdk
Install the Oracle JDK manually.
Note that the wget line may need to change. You can also download the file from a browser.
wget http://download.oracle.com/otn-pub/java/jdk/6u27-b07/jdk-6u27-linux-x64-rpm.bin
chmod u+x jdk-6u27-linux-x64-rpm.bin
./jdk-6u27-linux-x64-rpm.bin
Make the system and browsers use the new Java:
/usr/sbin/alternatives --install /usr/bin/java java /usr/java/default/bin/java 20000
/usr/sbin/alternatives --install /usr/lib64/mozilla/plugins/libjavaplugin.so libjavaplugin.so.x86_64 /usr/java/default/jre/lib/amd64/libnpjp2.so 20000
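To confirm which Java is now active, the banner printed by `java -version` can be inspected. This is only a minimal sketch: it assumes the usual banners ("OpenJDK" for OpenJDK builds, "Java(TM) SE" for Oracle builds), and the function name `java_vendor` is made up for illustration.

```shell
#!/bin/sh
# java_vendor: classify the text of `java -version` read from stdin.
# Prints "openjdk", "oracle" or "unknown".
java_vendor() {
    banner=$(cat)
    case $banner in
        *OpenJDK*)       echo openjdk ;;
        *"Java(TM) SE"*) echo oracle ;;
        *)               echo unknown ;;
    esac
}

# `java -version` writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | java_vendor
fi
```

If this still prints "openjdk" after the alternatives commands, the old package is probably still installed or still has the higher priority.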
Add your user to sudoers
nano /etc/sudoers
Find the line "root ALL=(ALL) ALL" and add this line below it:
username ALL=(ALL) ALL
//This line means that the user "username" can run commands on ALL hosts,
//acting as ALL (any) users, and run ALL (any) command.
Install the SSH server
sudo yum install openssh-server
Check the SSH server status to be sure it is running
/sbin/service sshd status
Start the sshd service if it is not running
/sbin/service sshd start
Or you can simply test SSH with
ssh localhost
After a successful test you can exit
exit
These instructions are also described on the Cloudera web site.
If you check /var/log/cloudera-scm-agent/cloudera-scm-agent.log or the .out files and see persistence- or Hibernate-related exceptions/errors, the problem is with the PostgreSQL database. Probably the database is not set up yet. All we need to do is set it up.
Note: PostgreSQL is only needed on the manager (master) node. The slaves do not need it.
Make sure the PostgreSQL instance is installed by checking the service status
/etc/init.d/postgresql status
Note: the instructions below need repo configuration! If you do not know how to do that, skip to the script file usage.
Install the embedded PostgreSQL database package on the Cloudera Manager Server host:
sudo yum install cloudera-manager-server-db
Prepare the embedded PostgreSQL database for use with the Cloudera Manager Server by running this command
sudo /sbin/service cloudera-scm-server-db initdb
Start the embedded PostgreSQL database by running this command:
sudo /sbin/service cloudera-scm-server-db start
Script file usage: the instructions below set up the database manually with the script file
/usr/share/cmf/schema/scm_prepare_database.sh database-type [options] database-name username password
Required parameters and descriptions:
database-type : To connect to a MySQL database, specify mysql as the database type, or specify postgresql to connect to an external PostgreSQL database.
database-name : The name of the Cloudera Manager Server database you want to create.
username : The username for the Cloudera Manager Server database you want to create.
password : The password for the Cloudera Manager Server database you want to create. If you don't specify the password on the command line, the script will prompt you to enter it.
You can check this page for details : https://ccp.cloudera.com/display/ENT/Installation+Path+B+-+Installation+Using+Your+Own+Method#InstallationPathB-InstallationUsingYourOwnMethod-Step5%3AConfigureaDatabasefortheClouderaManagerServer
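As a worked example of the usage line above, here is a small dry-run sketch that only prints the command it would run. The database name `scm_db` and user `scm_user` are placeholders I made up, not defaults, and `prepare_cmd` is a hypothetical helper. The password is left off so that the script prompts for it, as described above.

```shell
#!/bin/sh
# Path taken from the usage line above.
PREPARE_SCRIPT=/usr/share/cmf/schema/scm_prepare_database.sh

# prepare_cmd TYPE NAME USER: print the command line without running it.
# The password is left off on purpose so the script prompts for it.
prepare_cmd() {
    printf '%s %s %s %s\n' "$PREPARE_SCRIPT" "$1" "$2" "$3"
}

# scm_db and scm_user are placeholder values, not defaults.
prepare_cmd postgresql scm_db scm_user
```

Once the printed line looks right, run it by hand on the manager node.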
Start PostgreSQL if it is not running (you can check the status and restart it to be sure)
/etc/init.d/postgresql start
If there is a routing/firewall restriction on Linux, the heartbeat of the agent will not reach the master node (manager), so we need to get these obstacles out of the way. In this case SELinux and iptables are the ones that can cause problems. Cloudera says to disable iptables completely, but if you are experienced with iptables configuration you can add rules like this.
Open the iptables file and add a rule for access to port 7180
nano /etc/sysconfig/iptables
adding this line:
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 7180 -j ACCEPT
Or simply (the Cloudera way) disable iptables completely. Be sure it is the same on all nodes
sudo /etc/init.d/iptables stop
check iptables status with status parameter
/etc/init.d/iptables status
Note: iptables is activated again every time the machine restarts, so you may need a way to stop it automatically.
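If you go the rule route, the position of the line in the file matters: iptables evaluates rules in order, so an ACCEPT line placed after a catch-all REJECT line never matches. Here is a rough sketch that prints a rules file with the port rule inserted in a safe place. The helper name `add_port_rule` and the sample rules are my own illustration, and the chain name is a parameter because it differs between setups.

```shell
#!/bin/sh
# add_port_rule FILE CHAIN PORT
# Print FILE with an ACCEPT rule for PORT placed before the first REJECT
# rule (or before COMMIT if there is none).
add_port_rule() {
    rule="-A $2 -m state --state NEW -m tcp -p tcp --dport $3 -j ACCEPT"
    awk -v rule="$rule" '
        !done && (/-j REJECT/ || /^COMMIT/) { print rule; done = 1 }
        { print }
    ' "$1"
}

# Demo on a scratch file; the sample rules are illustrative only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
*filter
:INPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
EOF
add_port_rule "$sample" INPUT 7180
rm -f "$sample"

# To keep iptables off across reboots on CentOS 6 instead:
#   sudo chkconfig iptables off
```

The function only prints the result, so you can review it before copying it over /etc/sysconfig/iptables and restarting the iptables service.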
Any problem caused by iptables or SELinux will show up in the log file "cloudera-scm-agent.log". You may see some "deprecated" warnings about Python code; just ignore them. The errors/exceptions are generally "no route to host" or something similar.
Disable SELinux. You may need to do this before many of the operations above, especially before you try to install Cloudera Manager, because Linux will warn you about SELinux.
sudo nano /etc/selinux/config
(set SELINUX=disabled)
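The same edit can be made without opening an editor. A minimal sketch, shown on a scratch file so it is safe to try; the function name `selinux_disabled_config` is made up for illustration.

```shell
#!/bin/sh
# selinux_disabled_config FILE: print FILE with SELINUX= set to disabled.
# The pattern is anchored so the SELINUXTYPE= line is left alone.
selinux_disabled_config() {
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Demo on a scratch file that mimics /etc/selinux/config.
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"
selinux_disabled_config "$cfg"
rm -f "$cfg"

# On the real machine (needs root; takes full effect after a reboot):
#   sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
#   sudo setenforce 0   # permissive until the reboot
#   getenforce          # show the current mode
```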
Set a unique hostname for each machine. On each machine, edit this file and give the machine its name. We will use this name in the hosts file.
sudo nano /etc/sysconfig/network
Update the hosts file with the IP addresses and hostnames of all nodes. Do this on all nodes; you can simply copy the file to the other nodes, since all hosts files will be the same.
sudo nano /etc/hosts
example :
127.0.0.1 localhost
192.168.1.2 masternode
192.168.1.3 namenode
192.168.1.4 datanode1
192.168.1.5 datanode2
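Since every node needs the same entries, a quick check for missing names can save some debugging later. This is a rough sketch, not part of any Cloudera tooling; `missing_hosts` is a hypothetical helper and the node names are the ones from the example above.

```shell
#!/bin/sh
# missing_hosts FILE NAME...: print every NAME that has no entry in FILE.
# Comment lines are skipped; names are matched as whole fields after the IP.
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
}

# Node names from the example above, checked against the real file.
missing_hosts /etc/hosts masternode namenode datanode1 datanode2
```

No output means every name is present. Run it on each node after copying the file.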
Check the Cloudera Manager status and start it again if needed
sudo /sbin/service cloudera-scm-server start
Be sure the internet connection is good enough on all nodes, because the manager will connect to them and start a series of downloads on each one. If the manager runs into any problem it rolls everything back, which forces you to start everything over. Trust me, this part takes a lot of time!
If you use virtual machines as nodes (which is what I did), you may choose bridged network mode. That gives all nodes internet connectivity, but it has one downside: if you restart your physical machine, the nodes may lose their IP addresses and get new ones automatically, which forces you to update the hosts file on each node. If you use NAT or something else like an internal network, you can give your nodes static IP addresses so no reconfiguration is needed, but then you must provide an internet gateway IP to every machine, because the agents need internet access to download files, not just the manager. Of course, once you finish setting up your cluster, the agent (slave) nodes no longer need internet access.
Run ifconfig when you start a virtual machine to see whether it is getting an IP address from the network. If not, the virtual machine configuration in your VM application must be changed. If you are working on a physical machine that has both cable and wireless connectivity, you will have more than one network adapter to choose from. Be sure to choose the right one; the wrong one will not give you an IP address.
Be sure to use the Oracle JDK.
Check the Cloudera SCM status from time to time.
sudo /sbin/service cloudera-scm-server status
Check that 7180 and the other Cloudera Manager related ports are being listened on. You can use "nmap" or "netstat --listen".
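The netstat check can be wrapped in a small filter. This sketch assumes the usual `netstat -ltn` column layout, where the local address is the fourth field; `port_listening` is a name I made up.

```shell
#!/bin/sh
# port_listening PORT: read `netstat -ltn` output on stdin and succeed
# if some socket in LISTEN state is bound to PORT.
port_listening() {
    awk -v p="$1" '
        /LISTEN/ && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }
    '
}

# 7180 is the manager web interface port mentioned above.
if command -v netstat >/dev/null 2>&1; then
    if netstat -ltn | port_listening 7180; then
        echo "port 7180 is listening"
    else
        echo "port 7180 is NOT listening"
    fi
fi
```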
If you are unable to install Cloudera Manager on the master node (probably a SELinux, PostgreSQL, or download problem; by the way, make sure the download is not interrupted), you may need to clean up and start over.
This line removes the Cloudera-related files and lets you start again.
sudo rm -Rf /usr/share/{cmf,hue} /var/lib/cloudera* /var/cache/yum/cloudera*
You can restart cloudera-scm-agent on the slave nodes if you change anything, and to be sure the processes are working correctly. But you should clear the log files first so you can see whether the new configuration works properly. Log files are important for seeing what is going wrong or right.
cd /var/log/cloudera-scm-agent && sudo rm -f ./*
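An alternative to deleting the logs is to empty them in place. On Linux, a process that still has a deleted log file open keeps writing to it where you cannot see it, so truncating is a bit safer if the agent is still running. A minimal sketch, demonstrated in a scratch directory; `truncate_logs` is a hypothetical helper name.

```shell
#!/bin/sh
# truncate_logs DIR: empty every regular file directly inside DIR.
# Truncating keeps the files in place, so a running agent that still has
# a log open keeps writing to the same file.
truncate_logs() {
    for f in "$1"/*; do
        [ -f "$f" ] && : > "$f"
    done
    return 0
}

# Demo in a scratch directory instead of /var/log/cloudera-scm-agent.
dir=$(mktemp -d)
echo "old entry" > "$dir/cloudera-scm-agent.log"
truncate_logs "$dir"
ls -l "$dir"
rm -rf "$dir"
```

On a real node the function would need to run as root against /var/log/cloudera-scm-agent.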
The next steps add hosts from the Cloudera Manager web interface:
On the manager machine I used "localhost:7180" to connect to the manager GUI. In the hosts section you will see the option to add a new host to the cluster. Just type the name of the node in the textbox and press the "Find Hosts" button. The host names are already defined in the /etc/hosts file, if you remember, so you can use either the IP or the hostname in the textbox. If they are set correctly, the manager finds the matching hosts and shows them in the list above. If they are not managed yet (meaning nothing is installed on them yet), the "currently managed" column shows "no"; otherwise it shows "yes".
After that you can continue to install the Cloudera agent and the Hadoop files on the chosen hosts. If you have already installed them (if the hosts are managed), you can begin to add services to them. Just go to the "Services" page and continue from there. If you set up the hosts correctly and see that they are managed, adding a service is very easy and trouble-free (at least it was for me).
Please send any comments about my answer. It is kind of long, maybe unnecessarily so, but I tried to include every detail.
To solve this error, I did three things:
1) vim /etc/cloudera-scm-agent/config.ini
Originally it was
# Hostname of Cloudera SCM Server
server_host=localhost
Changed hostname to:
server_host=manager
Also make sure 'manager' is added to the /etc/hosts file.
2) Installed Java in the /usr/local/java/jdk1.7xxx directory
In ~/.bash_profile
I included the following:
export JAVA_HOME=/usr/local/java/jdk1.7xxx
export PATH=$PATH:$JAVA_HOME/bin
A soft link can also be used for this purpose: Cloudera probably takes the Java path as '/usr/java', so I created a symbolic link in the /usr directory.
3) When it still did not work, I installed the MySQL connector using the following:
yum install mysql-connector-java
Then restart the server and the agents. It worked for me after that.
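For step 1, a quick way to double check the agent configuration is to read the server_host value back and look it up in the hosts file. This is only a sketch on scratch files; `agent_server_host` is a hypothetical helper, and the IP address in the demo is illustrative.

```shell
#!/bin/sh
# agent_server_host FILE: print the server_host value from a config.ini.
agent_server_host() {
    sed -n 's/^server_host[[:space:]]*=[[:space:]]*//p' "$1"
}

# Demo on scratch copies; on a real node pass
# /etc/cloudera-scm-agent/config.ini and /etc/hosts instead.
conf=$(mktemp)
hosts=$(mktemp)
printf '# Hostname of Cloudera SCM Server\nserver_host=manager\n' > "$conf"
printf '127.0.0.1 localhost\n192.168.1.2 manager\n' > "$hosts"

server=$(agent_server_host "$conf")
if grep -qw -- "$server" "$hosts"; then
    echo "$server is listed in the hosts file"
else
    echo "$server is missing from the hosts file"
fi
rm -f "$conf" "$hosts"
```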