Asynchronous file I/O in Tcl

Posted 2019-08-14 07:56

I have a C function, wrapped in Tcl, that opens a file, reads the contents, performs an operation, and returns a value. Unfortunately, when I call the function to open a large file, it blocks the event loop. The OS is Linux.

I'd like to make the calls asynchronous. How do I do so?

(I can pass the work to another Tcl thread, but that's not exactly what I want).

2 answers
走好不送
Reply #2 · 2019-08-14 08:43

Tcl does support asynchronous I/O on its channels (hence including files) using an event-style (callback) approach.

The idea is to open the channel, set it to non-blocking mode, and register a script as a callback for the so-called readable event. In that script, call read on the channel once, process the data read, and then test whether that read hit the EOF condition; if it did, close the file.

Basically, it looks like this:

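set data ""
set done false

# Called by the event loop whenever the channel has data ready to read.
proc read_chunk fd {
  global data
  # Non-blocking read: returns whatever is available right now.
  append data [read $fd]
  if {[eof $fd]} {
    close $fd
    # Setting ::done makes the vwait below return.
    set ::done true
  }
}

set fd [open file]
chan configure $fd -blocking no
chan event $fd readable [list read_chunk $fd]
# Run the event loop until read_chunk sets ::done.
vwait ::done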

(Two points: a) In Tcl 8.4 and earlier you'll have to use fconfigure instead of chan configure and fileevent instead of chan event, because the chan command was only added in 8.5; b) If you're using Tk you don't need vwait, as Tk already runs the Tcl event loop.)

Note one caveat though: if the file you're reading is on a fast, locally attached medium (a rotating disk, an SSD, etc.), its data is almost always available. The Tcl event loop will then be saturated with readable events on your file, and the overall user experience will likely be worse than if you'd read it in one gulp: the Tk UI uses idle-priority callbacks for many of its tasks, and they won't get a chance to run until your file is read. In the end you'll have a sluggish or frozen UI anyway, and the file will be read more slowly (in wall-clock terms) than in a single gulp. There are two possible solutions:

  • Do use a separate thread.
  • Employ a hack which gives a chance for the idle-priority events to run — in your callback script for the readable event schedule execution of another callback script with the idle priority:

    chan event $fd readable [list after idle [list read_chunk $fd]]
    

    Obviously, this actually doubles the number of events piped through the Tcl event loop in response to the chunks of the file's data becoming "available" but in exchange it brings the priority of processing your file's data down to that of UI events.

You might also be tempted to just call update in your readable callback to force the event loop to process the UI events. Please don't.

There's yet another approach available since Tcl 8.6: coroutines. The chief idea is that instead of using events you interleave reading the file in reasonably small chunks with some other processing. Both tasks should be implemented as coroutines that periodically yield to each other, which gives you cooperative multitasking. The Tcl wiki has more info on this.
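As a rough illustration of that interleaving, here is a minimal sketch for Tcl 8.6. The names chunk_reader and other_work, the 4096-byte chunk size, the temporary demo file, and the tiny round-robin loop are all assumptions made for this example; a real program would replace other_work with its own processing.

```tcl
# Demo input: a temporary 10000-byte file (created only for this sketch).
set out [file tempfile path]
puts -nonewline $out [string repeat x 10000]
close $out

set ::work_done 0

# Task 1: read the file one chunk at a time, yielding after each chunk.
proc chunk_reader {path chunkSize} {
    set fd [open $path rb]
    set data ""
    yield                       ;# pause until the scheduler first resumes us
    while {![eof $fd]} {
        append data [read $fd $chunkSize]
        yield                   ;# let the other task run
    }
    close $fd
    set ::file_data $data
}

# Task 2: stand-in for any other processing, also done in small steps.
proc other_work {steps} {
    yield
    for {set i 0} {$i < $steps} {incr i} {
        incr ::work_done
        yield
    }
}

coroutine reader chunk_reader $path 4096
coroutine worker other_work 5

# Round-robin scheduler: a coroutine command disappears once its body returns.
while {[llength [info commands reader]] || [llength [info commands worker]]} {
    foreach co {reader worker} {
        if {[llength [info commands $co]]} { $co }
    }
}
file delete $path
```

Each read here is still a blocking call; the point is only that no single call handles more than one chunk, so the other task gets a turn between chunks.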

我只想做你的唯一
Reply #3 · 2019-08-14 08:46

This is quite difficult to do in general. The issue is that asynchronous file operations don't work very well with ordinary files due to the abstractions involved at the OS level. The best way around this — if you can — is to build an index over the file first so that you can avoid reading through it all and instead just seek to somewhere close to the data. This is the core of how a database works.
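To make the index idea concrete, here is a minimal sketch that records the byte offset of every line once and later seeks straight to a wanted line. The procedure names build_line_index and read_line_at and the demo file are hypothetical, and building the index still requires one full pass over the file; the saving comes from every later lookup.

```tcl
# Demo input: a small temporary file (created only for this sketch).
set out [file tempfile path]
puts -nonewline $out "alpha\nbeta\ngamma\n"
close $out

# One pass over the file: remember where each line starts.
proc build_line_index {path} {
    set fd [open $path rb]
    set offsets {}
    while {true} {
        set pos [tell $fd]
        if {[gets $fd line] < 0} break
        lappend offsets $pos
    }
    close $fd
    return $offsets
}

# Later lookups jump directly to the line instead of scanning for it.
proc read_line_at {path offsets n} {
    set fd [open $path rb]
    seek $fd [lindex $offsets $n]
    set line [gets $fd]
    close $fd
    return $line
}

set index [build_line_index $path]
set third [read_line_at $path $index 2]
file delete $path
```

The file is opened in binary mode so that the offsets reported by tell are plain byte positions.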

If you can't do that but you can apply a simple filter, putting that filter in a subprocess (pipes do work with asynchronous I/O in Tcl, and they do so on all supported platforms) or another thread (inter-thread messages are nice from an asynch processing perspective too) can work wonders.
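As a sketch of the subprocess variant, the following runs grep as the filter and collects its output through a non-blocking pipe. The choice of grep, the pattern ^a, the procedure name on_filter_output, and the demo file are assumptions for illustration only; any external filter that writes to standard output would do.

```tcl
# Demo input: a temporary file (created only for this sketch).
set tmp [file tempfile path]
puts $tmp "apple\nbanana\napricot"
close $tmp

set matches {}

# Readable callback: collect at most one filtered line per event.
proc on_filter_output {chan} {
    if {[gets $chan line] >= 0} {
        lappend ::matches $line
    } elseif {[eof $chan]} {
        # Back to blocking mode so that close waits for the child process.
        # grep exits non-zero when nothing matched, hence the catch.
        chan configure $chan -blocking yes
        catch {close $chan}
        set ::filter_done 1
    }
}

# The filter runs in its own process; we only see its output on the pipe.
set pipe [open |[list grep -- ^a $path] r]
chan configure $pipe -blocking no
chan event $pipe readable [list on_filter_output $pipe]
vwait ::filter_done
file delete $path
```

While the script sits in vwait, the event loop stays free to service other events; the reading of the large file happens in the child process.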

Use the above techniques if you can. They're what I believe you should do.

If even that is impractical, you're going to have to do this the hard way. The hard way involves inserting event-loop-aware delays in your processing.

Introducing delays in 8.5 and before

In Tcl 8.5 and before, you do this by splitting your code up into several pieces in different procedures and using a stanza like this to pass control between them through a “delay”:

# 100ms delay, but tune it yourself
after 100 [list theNextProcedure $oneArgument $another]

This is continuation-passing style, and it can be rather tricky to get right. In particular, it's rather messy with complicated processing. For example, suppose you were doing a loop over the first thousand lines of a file:

proc applyToLines {filename limit callback} {
    set f [open $filename]
    for {set i 1} {$i <= $limit} {incr i} {
        set line [gets $f]
        if {[eof $f]} break
        $callback $i $line
    }
    close $f
}
applyToLines "/the/filename.txt" 1000 DoSomething

In classic Tcl CPS, you'd do this:

proc applyToLines {filename limit callback} {
    set f [open $filename]
    Do1Line $f 1 $limit $callback
}
proc Do1Line {f i limit callback} {
    set line [gets $f]
    if {![eof $f]} {
        $callback $i $line
        if {[incr i] <= $limit} {
            after 10 [list Do1Line $f $i $limit $callback]
            return
        }
    }
    close $f
}
applyToLines "/the/filename.txt" 1000 DoSomething

As you can see, it's not a simple transformation, and if you wanted to do something once the processing was done, you'd need to pass around a callback. (You could also use globals, but that's hardly elegant…)
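For instance, a completion callback could be threaded through like this. It is only a sketch: the names applyToLinesThen, Do1LineThen, collect, and finished are hypothetical, and both callbacks are assumed to be single command names, as in the code above.

```tcl
# Same CPS loop, with an extra callback that fires once processing ends.
proc applyToLinesThen {filename limit callback doneCallback} {
    set f [open $filename]
    Do1LineThen $f 1 $limit $callback $doneCallback
}
proc Do1LineThen {f i limit callback doneCallback} {
    set line [gets $f]
    if {![eof $f]} {
        $callback $i $line
        if {[incr i] <= $limit} {
            after 10 [list Do1LineThen $f $i $limit $callback $doneCallback]
            return
        }
    }
    close $f
    # $i is one past the last processed line on both exit paths.
    $doneCallback [expr {$i - 1}]
}

# Demo callbacks and input (created only for this sketch).
proc collect {i line} { lappend ::seen $i $line }
proc finished {count} { set ::count $count }

set out [open demo_lines.txt w]
puts $out "a\nb\nc"
close $out

set ::seen {}
applyToLinesThen demo_lines.txt 1000 collect finished
vwait ::count
file delete demo_lines.txt
```

The vwait at the end is only there to keep the demo's event loop running; in a Tk application the event loop is already running.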

(If you want help changing your code to work this way, you'll need to show us the code that you want help with.)

Introducing delays in 8.6

In Tcl 8.6, though the above code techniques will still work, you've got another option: coroutines! We can write this instead:

proc applyToLines {filename limit callback} {
    set f [open $filename]
    for {set i 1} {$i <= $limit} {incr i} {
        set line [gets $f]
        if {[eof $f]} break
        yield [after 10 [info coroutine]]
        $callback $i $line
    }
    close $f
}
coroutine ApplyToAFile applyToLines "/the/filename.txt" 1000 DoSomething

That's almost the same, except for the line with yield and info coroutine (which suspends the coroutine until it is resumed from the event loop in about 10ms time) and the line with coroutine ApplyToAFile, where that prefix creates a coroutine (with the given arbitrary name ApplyToAFile) and sets it running. As you can see, it's not too hard to transform your code like this.

(There is no chance at all of a backport of the coroutine engine to 8.5 or before; it completely requires the non-recursive script execution engine in 8.6.)
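To see the suspend-and-resume mechanism in isolation, here is a small sketch for Tcl 8.6 that wraps the yield and after pair in a helper and shows another timer event running while the coroutine is paused. The names pause, countdown, and counter are hypothetical.

```tcl
# Event-loop-aware sleep; only valid when called from inside a coroutine.
proc pause {ms} {
    after $ms [info coroutine]   ;# ask the event loop to resume us later
    yield                        ;# suspend until then
}

proc countdown {n} {
    for {set i $n} {$i > 0} {incr i -1} {
        lappend ::log "tick $i"
        pause 10
    }
    set ::finished 1
}

set ::log {}
# An unrelated event that fires while the coroutine is suspended.
after 5 {lappend ::log "other event"}
coroutine counter countdown 3
vwait ::finished
```

The unrelated timer gets its turn between the first and second tick, which is exactly the behaviour a blocking loop with a plain sleep would not give you.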
