Is there a good way to detect a stale NFS mount?

Posted 2019-09-02 04:56

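I have a procedure that I want to start only if several tests complete successfully.

One of the tests I need is that all of my NFS mounts are alive and well.

Can I do better than this brute-force approach?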

mount | sed -n "s/^.* on \(.*\) type nfs .*$/\1/p" |
while read -r mount_point ; do
  timeout 10 ls "$mount_point" > /dev/null 2>&1 || echo "stale $mount_point"
done

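Here timeout is a utility that runs the command in the background and kills it after a given time if no SIGCHLD was caught before the time limit, returning success or failure in the obvious way.

In plain English: parse the output of mount and check every NFS mount point, bounded by a timeout. Optionally (not in the code above), break on the first stale mount.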
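
For readers who prefer a script over a shell pipeline, the same loop can be sketched in Python. This is only an illustration under a few assumptions: the function names (nfs_mount_points, responds, first_stale) are made up for this example, the regular expression also accepts "nfs4" entries (the sed pattern above only matches "type nfs "), and `ls -d` stands in for the plain `ls`. It also includes the optional "break on the first stale mount" step.

```python
import re
import subprocess

# Matches `mount` lines such as "srv:/export on /mnt/data type nfs4 (rw,...)".
MOUNT_LINE = re.compile(r"^.* on (.*) type nfs\d? ")


def nfs_mount_points(mount_output):
    """Return the mount points of all NFS entries in `mount` output."""
    points = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line)
        if match:
            points.append(match.group(1))
    return points


def responds(path, timeout=10.0):
    """True if `ls -d path` exits with status 0 within `timeout` seconds."""
    proc = subprocess.Popen(["ls", "-d", path],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Do not wait after kill(): a process blocked on a dead NFS server
        # may sit in uninterruptible sleep and never be reaped.
        proc.kill()
        return False


def first_stale(mount_points, check=responds):
    """Return the first mount point that fails `check`, or None."""
    for path in mount_points:
        if not check(path):
            return path
    return None
```

A caller would feed it the real output, for example `first_stale(nfs_mount_points(subprocess.run(["mount"], capture_output=True, text=True).stdout))`, and treat any non-None result as a failed test.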

Answer 1:

You could write a C program and check for ESTALE.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iso646.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
    struct stat st;
    int ret;
    ret = stat("/mnt/some_stale", &st);
    if(ret == -1 and errno == ESTALE){
        printf("/mnt/some_stale is stale\n");
        return EXIT_SUCCESS;
    } else {
        return EXIT_FAILURE;
    }
}
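
The same ESTALE check can be written in a few lines of Python, which may be handier inside a larger test script. This is a minimal sketch; is_stale is a name chosen for this example. Like the C program, it blocks for as long as stat() blocks, so it does not help when the server is simply not answering.

```python
import errno
import os


def is_stale(path):
    """True only if stat() on `path` fails with ESTALE."""
    try:
        os.stat(path)
    except OSError as exc:
        # Other errors (ENOENT, EACCES, ...) are not treated as stale.
        return exc.errno == errno.ESTALE
    return False
```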

Answer 2:

A colleague of mine ran into your script. This doesn't avoid the "brute force" approach, but if I may, in Bash:

while read _ _ mount _; do 
  read -t1 < <(stat -t "$mount") || echo "$mount timeout"; 
done < <(mount -t nfs)

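mount can list NFS mounts directly. read -t (a shell builtin) can time out a command. stat -t (terse output) still hangs the way ls does, but ls produces unnecessary output, risks false positives on huge or slow directory listings, and needs access permissions, which would also trigger a false positive if it doesn't have them.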

while read _ _ mount _; do 
  read -t1 < <(stat -t "$mount") || lsof -b 2>/dev/null|grep "$mount"; 
done < <(mount -t nfs)

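We use it with lsof -b (non-blocking, so it won't hang too) to determine the source of the hangs.

Thanks for the pointer!

test -d (a shell builtin) would work instead of stat (a standard external command) as well, but read -t returns success only if it doesn't time out and reads a line of input. Since test -d doesn't write to stdout, a (( $? > 128 )) exit-status check would be necessary - not worth the readability hit, IMO.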

Answer 3:

It took me some time, but here is what I found that works in Python:

import signal
import subprocess


class Alarm(Exception):
    pass


def alarm_handler(signum, frame):
    raise Alarm


pathToNFSMount = '/mnt/server1/'  # or you can implement some function
                                  # to find all the mounts...

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(3)  # 3 seconds
try:
    # Popen (unlike call) returns an object that has communicate()
    proc = subprocess.Popen(['stat', pathToNFSMount],
                            stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    stdoutdata, stderrdata = proc.communicate()
    signal.alarm(0)  # reset the alarm
except Alarm:
    print("Oops, taking too long!")

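Remarks:

Credit goes to the answer here.
You could also use an alternative scheme:
os.fork() and os.stat()

Check whether the forked child has finished; if it has timed out, you can kill it. You will need to work with time.time() and so on.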
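
The os.fork() and os.stat() alternative mentioned in the remarks might look roughly like this. It is a sketch rather than a tested recipe for every NFS client: check_path and its return strings are invented for this example, and the poll interval is an arbitrary choice.

```python
import os
import signal
import time


def check_path(path, timeout=3.0, poll=0.05):
    """Run os.stat(path) in a forked child; return 'ok', 'error' or 'timeout'."""
    pid = os.fork()
    if pid == 0:
        # Child: report the stat() result through the exit status only.
        code = 1
        try:
            os.stat(path)
            code = 0
        except OSError:
            code = 1
        finally:
            os._exit(code)

    # Parent: poll the child until it exits or the deadline passes.
    deadline = time.time() + timeout
    while time.time() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
                return "ok"
            return "error"
        time.sleep(poll)

    # Deadline passed: kill the child and try once to reap it without blocking.
    os.kill(pid, signal.SIGKILL)
    try:
        os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        pass
    return "timeout"
```

Calling `check_path('/mnt/server1/')` would then replace the SIGALRM handler above; a 'timeout' result is the signal that the mount is hanging.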

Answer 4:

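In addition to the previous answers, which hang under some circumstances, this snippet checks all suitable mounts, kills with signal KILL, and has been tested with CIFS too: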

grep -v tracefs /proc/mounts | cut -d' ' -f2 | \
  while read -r m; do \
    timeout --signal=KILL 1 ls -d "$m" > /dev/null || echo "$m"; \
  done
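
One detail the shell loop does not handle: /proc/mounts writes spaces and a few other characters in mount points as octal escapes such as \040, and the loop passes them on undecoded. The sketch below shows one way to read the file's second field with those escapes decoded. It makes a different choice from the snippet above by keeping only network filesystem types instead of everything except tracefs; the names network_mount_points and NETWORK_FS, and the set of types, are assumptions for this example.

```python
import re

# Filesystem types to check; adjust to taste.
NETWORK_FS = {"nfs", "nfs4", "cifs"}


def _unescape(field):
    # Decode three-digit octal escapes such as \040 (space).
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def network_mount_points(proc_mounts_text, fstypes=NETWORK_FS):
    """Return decoded mount points whose type (third field) is in `fstypes`."""
    points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in fstypes:
            points.append(_unescape(fields[1]))
    return points
```

The input would normally come from `open('/proc/mounts').read()`, and each returned path can then be handed to a timeout-bounded check.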

Answer 5:

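Writing a C program that checks for ESTALE is a good option if you don't mind waiting for the command to finish when the file system is stale. If you want to implement a "timeout" option, the best way I have found to do it (in a C program) is to fork a child process that tries to open the file. You then check whether the child process has finished successfully reading a file in the file system within an allotted amount of time.

Here is a small proof-of-concept C program that does this: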

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/wait.h>


void readFile();
void waitForChild(int pid);


int main(int argc, char *argv[])
{
  int pid;

  pid = fork();

  if(pid == 0) {
    // Child process.
    readFile();
  }
  else if(pid > 0) {
    // Parent process.
    waitForChild(pid);
  }
  else {
    // Error
    perror("Fork");
    exit(1);
  }

  return 0;
}

void waitForChild(int child_pid)
{
  int timeout = 2; // 2 seconds timeout.
  int status;
  int pid;

  while(timeout != 0) {
    pid = waitpid(child_pid, &status, WNOHANG);
    if(pid == 0) {
      // Still waiting for a child.
      sleep(1);
      timeout--;
    }
    else if(pid == -1) {
      // Error
      perror("waitpid()");
      exit(1);
    }
    else {
      // The child exited.
      if(WIFEXITED(status)) {
        // Child was able to call exit().
        if(WEXITSTATUS(status) == 0) {
          printf("File read successfully!\n");
          return;
        }
      }
      printf("File NOT read successfully.\n");
      return;
    }
  }

  // The child did not finish and the timeout was hit.
  kill(child_pid, 9);
  printf("Timeout reading the file!\n");
}

void readFile()
{
  int fd;

  fd = open("/path/to/a/file", O_RDWR);
  if(fd == -1) {
    // Error
    perror("open()");
    exit(1);
  }
  else {
    close(fd);
    exit(0);
  }
}

Answer 6:

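I wrote https://github.com/acdha/mountstatus, which uses an approach similar to what UndeadKernel mentioned and which I've found to be the most robust: it is a daemon that periodically scans all mounted filesystems by forking a child process that attempts to list the top-level directory, and sends it SIGKILL if it fails to respond within a certain timeout, with both successes and failures recorded to syslog. That avoids problems with certain client implementations (e.g. older Linux) that never trigger timeouts for certain classes of error, with NFS servers that are partially responsive but won't answer actual calls such as listdir, and so on.

I don't publish them, but the included Makefile uses fpm to build rpm and deb packages with an Upstart script.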
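
To make the description concrete, here is a rough sketch of that scan loop in Python. It is not taken from mountstatus; the function names, the 5-second timeout and the 60-second interval are all assumptions for illustration. Each probe lists the top-level directory in a child process, kills the child if it overruns, and reports the outcome through syslog.

```python
import multiprocessing
import os
import syslog
import time

# Use fork explicitly so the child can run os.listdir without pickling.
_fork = multiprocessing.get_context("fork")


def top_level_listable(path, timeout=5.0):
    """True if a child process can list `path` within `timeout` seconds."""
    child = _fork.Process(target=os.listdir, args=(path,), daemon=True)
    child.start()
    child.join(timeout)
    if child.is_alive():
        child.kill()  # sends SIGKILL; the child may still linger if blocked
        return False
    return child.exitcode == 0


def scan_once(mount_points, probe=top_level_listable, log=syslog.syslog):
    """Probe every mount point, log each result, return the failures."""
    failed = []
    for path in mount_points:
        if probe(path):
            log(syslog.LOG_INFO, "mount ok: %s" % path)
        else:
            log(syslog.LOG_ERR, "mount not responding: %s" % path)
            failed.append(path)
    return failed


def run_forever(mount_points, interval=60):
    """Daemon-style loop: rescan all mount points every `interval` seconds."""
    while True:
        scan_once(mount_points)
        time.sleep(interval)
```

Passing `probe` and `log` as parameters keeps the loop easy to test without a real NFS mount or a syslog daemon.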

Answer 7:

Another way, using a shell script. It works well for me:

#!/bin/bash
# Purpose:
# Detect Stale File handle and remove it
# Script created: July 29, 2015 by Birgit Ducarroz
# Last modification: --
#

# Detect stale file handles and write the affected paths into a file
df 2>&1 | grep 'Stale file handle' | awk '{print $2}' > NFS_stales.txt
# Remove : ‘ and ’ characters from the output
sed -r -i 's/://' NFS_stales.txt && sed -r -i 's/‘//' NFS_stales.txt && sed -r -i 's/’//' NFS_stales.txt

# Not used: replace space by a new line
# stales=`cat NFS_stales.txt && sed -r -i ':a;N;$!ba;s/ /\n /g' NFS_stales.txt`

# read NFS_stales.txt output file line by line then unmount stale by stale.
#    IFS='' (or IFS=) prevents leading/trailing whitespace from being trimmed.
#    -r prevents backslash escapes from being interpreted.
#    || [[ -n $line ]] prevents the last line from being ignored if it doesn't end with a \n (since read returns a non-zero exit code when it encounters EOF).

while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Unmounting due to NFS Stale file handle: $line"
    umount -fl "$line"
done < "NFS_stales.txt"
#EOF
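
The awk and sed steps in the script take the second whitespace-separated field of each df error line and strip the colon and curly quotes, which breaks on paths containing spaces. A hedged Python sketch of the same extraction is shown below. It assumes the error lines look like "df: ‘/mnt/x’: Stale file handle" (the quoting may vary with locale and coreutils version, and the message text is locale-dependent, just as in the script). stale_paths is a name invented for this example.

```python
import re

# One df error line, e.g.  df: ‘/mnt/x’: Stale file handle
STALE_LINE = re.compile(r"^df: (.+): Stale file handle$")


def stale_paths(df_output):
    """Extract the paths df reported as stale, with surrounding quotes removed."""
    paths = []
    for line in df_output.splitlines():
        match = STALE_LINE.match(line.strip())
        if match:
            paths.append(match.group(1).strip("‘’'\"`"))
    return paths
```

The input would be the combined stdout and stderr of df, matching the `df 2>&1` in the script; each returned path could then be passed to `umount -fl`.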

Source: Is there a good way to detect a stale NFS mount