Monitoring for changes in file(s) in real time

Posted 2019-01-13 13:50

I have a program that monitors certain files for changes. As soon as a file is updated, it gets processed. So far I've come up with this general approach to handling "real time analysis" in R. I was hoping you guys have other approaches, so that we can discuss their advantages and disadvantages.

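monitor <- TRUE
path <- "path/to/file" # placeholder: the file being monitored
sleep.time <- 5        # seconds between checks (placeholder value)
start.state <- file.info(path)$mtime # modification time of the file when initiating

while(monitor) {
  change.state <- file.info(path)$mtime
  if(start.state < change.state) {
    #process
    start.state <- change.state # remember the new time so each update is processed only once
  } else {
    print("Nothing new.")
  }
  Sys.sleep(sleep.time)
}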

5 answers
啃猪蹄的小仙女
Reply #2 · 2019-01-13 14:24

Similar to the suggestion to use a system API, this can also be done with qtbase (https://r-forge.r-project.org/R/?group_id=454), which gives you a cross-platform way of doing it from within R:

dir_to_watch <- "/tmp"

library(qtbase)
fsw <- Qt$QFileSystemWatcher()
fsw$addPath(dir_to_watch)

id <- qconnect(fsw, "directoryChanged", function(path) {
  message(sprintf("directory %s has changed", path))
})

cat("abc", file="/tmp/deleteme.txt")
Explosion°爆炸
Reply #3 · 2019-01-13 14:29

If your system provides an API for monitoring filesystem changes, then you should use that. I believe Macs come with this. Not sure about other platforms though.

Edit: A quick Google search gave me:

Linux - http://wiki.linuxquestions.org/wiki/FAM

Win32 - http://msdn.microsoft.com/en-us/library/aa364417(VS.85).aspx

Obviously, these APIs remove the need for any polling on your side. On the other hand, they may not always be available.

Java has this: http://jnotify.sourceforge.net/ and http://java.sun.com/developer/technicalArticles/javase/nio/#6

狗以群分
Reply #4 · 2019-01-13 14:37

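I have a hack in mind: you can set up a cron job or Scheduled Task to run an R script every n minutes (or whatever; note that cron itself cannot fire more often than once a minute). The R script checks the file hash and, if the hash doesn't match the one from the previous run, runs the analysis. You can use the digest::digest function for the hashing, just check out the manual.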
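A minimal sketch of the hash check such a scheduled script could do. To keep it free of extra packages it uses tools::md5sum (shipped with R) rather than digest::digest; digest::digest(path, file = TRUE) should slot in the same way. The helper name file_changed and the state_file argument are made up for this example, and the sketch treats the very first run (no stored hash yet) as a change.

```r
# Hypothetical helper: TRUE when the MD5 of `path` differs from the hash
# saved in `state_file` by the previous run. Always saves the current hash.
file_changed <- function(path, state_file) {
  new_hash <- unname(tools::md5sum(path))

  # Read the hash left behind by the previous run, if there is one
  old_hash <- if (file.exists(state_file)) {
    readLines(state_file, n = 1, warn = FALSE)
  } else {
    character(0)
  }

  # Persist the current hash for the next scheduled run
  writeLines(new_hash, state_file)

  # No previous hash (first run) counts as a change in this sketch
  length(old_hash) == 0 || !identical(old_hash, new_hash)
}
```

In the scheduled script you would then wrap the analysis in something like if (file_changed("data.csv", "data.csv.md5")) { ... }, where both file names are placeholders.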

看我几分像从前
Reply #5 · 2019-01-13 14:38

You could use the tclTaskSchedule function in the tcltk2 package to set up a function that checks for updates and runs your code. This would then be run on a regular basis (you set the timing) but would still allow you to use your R session.

趁早两清
Reply #6 · 2019-01-13 14:44

If you have lots of files that you want to monitor, then R may be too slow for this purpose. Go to your C:\ or / directory and see how long it takes to run file.info(dir(recursive = TRUE)). A DOS batch or Bash script may be quicker.

Otherwise, the code looks fine.
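To get a feel for that cost, and to extend the mtime check from one file to a whole directory, something along these lines might help. It is only a sketch: snapshot and changed_files are hypothetical helper names, and deleted files are ignored.

```r
# Hypothetical helper: named vector of modification times for every file
# under dir_path (names are the full file paths).
snapshot <- function(dir_path) {
  files <- list.files(dir_path, recursive = TRUE, full.names = TRUE)
  setNames(file.info(files)$mtime, files)
}

# Hypothetical helper: paths that are new in `new`, or whose mtime is later
# than in `old`. Files that disappeared are not reported.
changed_files <- function(old, new) {
  common <- intersect(names(old), names(new))
  modified <- common[new[common] > old[common]]
  added <- setdiff(names(new), names(old))
  c(added, modified)
}

# Time one scan of the current directory to judge whether polling is affordable
scan_time <- system.time(before <- snapshot("."))
print(scan_time[["elapsed"]])
```

Inside the polling loop you would take a fresh snapshot on each pass, process whatever changed_files(before, after) returns, and then keep the fresh snapshot as the new baseline.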
