We are using Hortonworks HDP 2.1 with Ambari 1.6.1.
After a crash in our underlying hardware, we restarted our cluster some days ago. We got everything back up again; however, Ambari shows that two services are still down: the YARN Resource Manager and the MapReduce History Server. Both services are running, which we verified by checking the running processes on the server and by checking the functionality they provide. The Nagios health checks are also OK. Still, Ambari shows the services as stopped. Trying to start them from Ambari does not work (Address already in use, which is to be expected because they are already running). If the process is killed before starting, it does get started, but the operation is still displayed as failed and Ambari continues to show the service as stopped.
Has anyone else seen a similar problem before? I could not find any information about similar cases anywhere.
I have experienced similar issues in the past, and they were due to permissions on a PID file. Take a look at the service descriptor files for YARN to see which files Ambari checks to decide whether the service is running. Typically it reads a PID file and checks whether the process listed in it is running. I would find the location of the PID file it checks, then stop the service, delete the PID files, and use Ambari to restart the services. This should recreate the PID files with the correct user/group and permissions and ultimately fix the issue you are seeing.
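To make the check described above concrete, here is a minimal Python sketch of how one might inspect a PID file by hand before deleting it. It is not Ambari's own code. The function names `inspect_pid_file` and `pid_is_running` are made up for illustration, and the PID file path is something you have to look up on your own cluster. The sketch reports the file's owner and mode, the PID it contains, and whether a process with that PID is alive.

```python
import os
import pwd
import stat


def pid_is_running(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def inspect_pid_file(path):
    """Hypothetical helper: describe a PID file and the process it points to."""
    info = {"path": path, "exists": os.path.exists(path)}
    if not info["exists"]:
        return info

    st = os.stat(path)
    try:
        info["owner"] = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id.
        info["owner"] = str(st.st_uid)
    info["mode"] = oct(stat.S_IMODE(st.st_mode))

    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Unreadable (e.g. wrong permissions) or not a number.
        info["pid"] = None
        info["running"] = False
        return info

    info["pid"] = pid
    info["running"] = pid_is_running(pid)
    return info
```

Calling `inspect_pid_file()` with the path of the Resource Manager's PID file, run as the same user the Ambari agent runs as, should show whether the file is readable, who owns it, and whether the PID in it matches the process that is actually running. A stale PID or an unexpected owner would point to the permissions problem described above.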