Scoping issue memoising bash function with associative array

Posted 2019-09-04 02:35

Question:

I have a bash script which uses jq to look up 'dependency' data in some JSON, and take the closure (find dependencies of dependencies of dependencies, etc.).

This works fine, but can be very slow, since it may look up the same dependencies over and over, so I'd like to memoise it.

I tried using a global associative array to associate arguments with results, but the array doesn't seem to be storing anything.

I've extracted the relevant code into the following demo:

#!/usr/bin/env bash

# Associative arrays must be declared; use 'g' for global
declare -Ag MYCACHE
function expensive {
    # Look up $1 in MYCACHE
    if [ "${MYCACHE[$1]+_}" ]
    then
        echo "Using cached version" >> /dev/stderr
        echo "${MYCACHE[$1]}"
        return
    fi

    # Not found, perform expensive calculation
    RESULT="foo"

    echo "Caching result" >> /dev/stderr
    MYCACHE["$1"]="$RESULT"

    # Check if the result was cached
    if [ "${MYCACHE[$1]+_}" ]
    then
        echo "Cached" >> /dev/stderr
    else
        abort "Didn't cache"
    fi

    # Done
    echo "$RESULT"
}

function abort {
    echo "$1" >> /dev/stderr
    exit 1
}

# Run once, make sure result is "foo"
[[ "x$(expensive "hello")" = "xfoo" ]] ||
    abort "Failed for hello"

# Run again, make sure "Using cached version" is in stderr
expensive "hello" 2>&1 > /dev/null | grep "Using cached version" ||
    abort "Didn't use cache"

Here are my results:

$ ./foo.sh 
Caching result
Cached
Didn't use cache

The fact we get Cached seems to indicate that I'm storing and looking up values correctly, but they're not preserved across invocations of expensive since we hit the Didn't use cache branch.

It looks like a scoping issue to me, maybe caused by the declare. However, declare -A seems to be a requirement for using associative arrays.

Here's my bash version:

$ bash --version
GNU bash, version 4.3.42(1)-release (i686-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

As well as figuring out the behaviour I'm experiencing, I'd also appreciate alternative ways of memoising functions in bash (preferably nothing which touches the filesystem, even if it's RAM-based).

Answer 1:

You have a few issues:

  1. declare -g is only meaningful inside a function. Outside, a variable is already global.

  2. A global variable is only global to the process in which it is declared. You can't have global variables shared across processes.

  3. Running expensive inside a command substitution does so in a separate process, so the cache it creates and populates disappears with that process.

  4. Running expensive as the first command of a pipeline also creates a new process; the cache it uses is only visible to that process.
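
Points 3 and 4 can be seen in isolation with a small experiment. This is only an illustrative sketch: counter and bump are made-up names, not part of the script in the question.

```bash
#!/usr/bin/env bash
# Hypothetical demo: 'counter' and 'bump' are illustrative names only.
counter=0
bump() { counter=$((counter + 1)); echo "$counter"; }

in_subst=$(bump)          # command substitution: bump runs in a subshell
after_subst=$counter      # the parent's counter was not changed

bump | cat > /dev/null    # first command of a pipeline: also a subshell
after_pipe=$counter       # still unchanged in the parent

bump > /dev/null          # plain call in the current shell
after_direct=$counter     # the increment is visible now

echo "$in_subst $after_subst $after_pipe $after_direct"   # prints: 1 0 0 1
```

The subshell sees a copy of the parent's variables, so bump itself reports 1, but only the direct call changes counter in the calling shell. The same thing happens to MYCACHE in the question.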

You can work around this by making sure expensive is only run in the current shell with

expensive "hello" > tmp.txt && read -r result < tmp.txt
[[ $result = foo ]] || abort ...
expensive "hello" 2> tmp.txt > /dev/null
grep "Using cached version" tmp.txt || abort "Didn't use cache"
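
If touching the filesystem is undesirable, one possible alternative is to have the function hand its result back through a global variable instead of standard output, so the caller never needs a command substitution or a pipeline. The following is a minimal sketch under that assumption; CACHE_HIT is a hypothetical flag added for illustration, and RESULT="foo" stands in for the real calculation as in the question.

```bash
#!/usr/bin/env bash
# At top level a plain 'declare -A' is enough; -g only matters inside functions.
declare -A MYCACHE

# Stores its answer in the global RESULT instead of printing it.
# CACHE_HIT (hypothetical, for illustration) records whether the cache was used.
expensive() {
    if [[ -n "${MYCACHE[$1]+_}" ]]; then
        CACHE_HIT=1
        RESULT=${MYCACHE[$1]}
        return
    fi
    CACHE_HIT=0
    RESULT="foo"              # stand-in for the expensive calculation
    MYCACHE[$1]=$RESULT
}

# Both calls run in the current shell, so the cache survives between them.
expensive "hello"; first_result=$RESULT;  first_hit=$CACHE_HIT
expensive "hello"; second_result=$RESULT; second_hit=$CACHE_HIT

echo "$first_result $first_hit $second_result $second_hit"   # prints: foo 0 foo 1
```

The trade-off is that every caller has to read RESULT right after the call, before another call overwrites it, and the function can no longer be dropped directly into a pipeline.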

Shell scripting, however, is simply not designed for this type of data processing. If caching is important, use a different language with better support for data structures and in-memory handling of data. Shell is optimized for starting new processes and managing input/output files.