GKE: can't disable Transparent Huge Pages… permissio

Posted 2020-04-07 05:16

Question:

I am trying to run a redis image in gke. It works except I get the dreaded "Transparent Huge Pages" warning:

WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

Redis is currently too slow to be useful, so I tried turning off THP:

sheena@gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ cat  /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
sheena@gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ echo never >  /sys/kernel/mm/transparent_hugepage/enabled 
-bash: /sys/kernel/mm/transparent_hugepage/enabled: Permission denied
sheena@gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ sudo echo never >  /sys/kernel/mm/transparent_hugepage/enabled 
-bash: /sys/kernel/mm/transparent_hugepage/enabled: Permission denied

These permission errors are disconcerting. Redis wants THP off so it can work properly.

I did a little digging and found that Google uses a special OS image that makes /sys/ a read-only path. There's an alternative image that's based on Debian 7. It got me all excited, but in the end I have exactly the same problem.

So how do I stop Redis from being affected by THP on Google Container Engine?

It's not like I'm doing something unique here. Running databases in containers is pretty normal. And it's pretty normal for a database to malfunction when THP is enabled. So... what am I missing here?

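Answer 1:

Your command is slightly incorrect: with sudo echo never > FILE, echo runs as root, but the redirection (>) is performed by your own shell as the unprivileged user before sudo even starts, so it can't write to /sys/.

The following command works on both container-vm (Debian-based) and gci (Chromium OS-based), because the whole command line, including the redirection, runs in a root shell: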

sudo sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
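Another common way around the redirection problem is to pipe into sudo tee, so that the process opening the file is the one started by sudo. The sketch below wraps that in a small helper. The function name set_thp_mode and its optional second argument are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical helper (not from the original answer): write a THP mode
# to the sysfs file, or to any path passed as the second argument.
set_thp_mode() {
  mode="$1"
  file="${2:-/sys/kernel/mm/transparent_hugepage/enabled}"
  if [ -w "$file" ]; then
    # Already writable (for example, when running as root): write directly.
    printf '%s\n' "$mode" > "$file"
  else
    # tee is started by sudo, so the file is opened with root privileges.
    printf '%s\n' "$mode" | sudo tee "$file" > /dev/null
  fi
}
```

On a node this would be called as set_thp_mode never. Like the sh -c form, the change only lasts until the next reboot.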

Persisting this setting on container-vm

Add this kernel command-line parameter to /etc/default/grub, then run sudo update-grub and sudo reboot:

GRUB_CMDLINE_LINUX="... transparent_hugepage=never"
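After the reboot, one way to check that the parameter actually reached the kernel is to look for it in /proc/cmdline. This is a minimal sketch; the function name thp_cmdline_value and the optional file argument are hypothetical and exist only to make the check easy to reuse.

```shell
# Hypothetical helper: print the value of transparent_hugepage= from a
# kernel command line file (defaults to /proc/cmdline).
# Prints nothing if the parameter is not present.
thp_cmdline_value() {
  file="${1:-/proc/cmdline}"
  # Split the command line into one parameter per line, then strip the key.
  tr ' ' '\n' < "$file" | sed -n 's/^transparent_hugepage=//p'
}
```

If thp_cmdline_value prints never, the GRUB change was picked up. Empty output suggests update-grub was not run or the node booted with a different configuration.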

Persisting this setting on gci

First, using the Cloud Console, copy the instance template that is in use by the node pool.

Second, under metadata change the value for userdata:

#cloud-config

write_files:
  - path: /etc/systemd/system/hugepage.service
    permissions: 0644
    owner: root
    content: |
      [Unit]
      Description=Disable THP

      [Service]
      Type=oneshot
      ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"

      [Install]
      WantedBy=kubernetes.target
...
runcmd:
 - ...
 - systemctl enable hugepage.service
 - systemctl start kubernetes.target

Third, change the instance template to the newly created one:

gcloud compute instance-groups managed set-instance-template \
  gke-YOURCLUSTER-YOURPOOL-grp \
  --template=YOURNEWTEMPLATENAME \
  --zone=...

Fourth, recreate the instance(s):

gcloud compute instance-groups managed recreate-instances \
   gke-YOURCLUSTER-YOURPOOL-grp \
   --zone=... \
   --instances=...

The instances will lose all data and come up with THP disabled. All new instances in this node pool will have THP disabled as well.
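To confirm the result on a recreated node, read the sysfs file back: the active mode is the one shown in square brackets. The sketch below extracts it. The function name thp_active_mode and the optional file argument are assumptions for illustration.

```shell
# Hypothetical helper: print the active THP mode, i.e. the word shown in
# [brackets] in the sysfs file (or in any file passed as the first argument).
thp_active_mode() {
  file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}
```

On the node from the question, whose file read always [madvise] never, this would print madvise. After the steps above it should print never.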