-->

Access WordNet dict files in Android app

2019-06-21 21:51发布

问题:

I'm writing a word game in Android. It's my first app so my knowledge is almost non-existent.

What I would like to do is use JWI to access the WordNet dictionary. This requires specifying the WordNet dictionary's file path.

From what I've read, Android "assets" are not available via a simple file path, but what JWI requires to initialize the WordNet dictionary API is a URL to the disk location of the dictionary files.

So, what is the best course of action? Should I copy the assets at startup-time into a known folder on the android device? I can't think of a better way but that seems entirely stupid to me.

Any help gratefully received.

回答1:

I have the same problem (for a jetty webapp however and not android) and tried those two approaches, however unsuccessfully:

JWNL.initialize(this.getClass().getClassLoader().getResourceAsStream("wordnet_properties.xml");
dict = Dictionary.getInstance();

Here it successfully loads wordnet_properties.xml but it cannot access the dictionary which is pointed to by the properties file.

Using the dictionary folder directly:

String dictPath = "models/en/wordnet/dict/";
URL url = this.getClass().getClassLoader().getResource(dictPath);
System.out.println("loading wordnet from "+url);
dict = new RAMDictionary(url, ILoadPolicy.NO_LOAD);

Here I get the dictionary URL to be jar:file:/home/myusername/.m2/repository/package/1.0-SNAPSHOT/commons-1.0-SNAPSHOT.jar!/models/en/wordnet/dict/. WordNet however doesn't accept the jar protocol and gives me the error:

java.lang.IllegalArgumentException: URL source must use 'file' protocol
    at edu.mit.jwi.data.FileProvider.toFile(FileProvider.java:693)
    at edu.mit.jwi.data.FileProvider.open(FileProvider.java:304)
    at edu.mit.jwi.DataSourceDictionary.open(DataSourceDictionary.java:92)
    at edu.mit.jwi.RAMDictionary.open(RAMDictionary.java:216)

My next investigation will be to create a subclass to RAMDictionary or something similar, please tell me if you have found a solution in the meantime.

P.S.: I just wrote the developer a mail asking for help after I tried to rewrite the FileProvider to use resources instead but after one or two hours I gave up because the code calls so much other code that also only works with files. I will keep you up to date!

P.P.S.: I received an answer from the developer saying that it is principially not possible with streams because they don't offer random access which is necessary. However, he offered to implement a solution to load it all in RAM, if really necessary, but that would use up about 500 MB and I guess that is too much for android apps so I guess it is still best to unpack it somewhere.

P.S.: Here is my unpacking solution (you can replace the System.out.println statements with logger statements if you use logging or remove them if you don't like them):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Allows WordNet to be run from within a jar file by unpacking it to a temporary directory.**/
public class WordNetUnpacker
{
    static final String ID = "178558556719"; // minimize the chance of interfering  with an existing directory  
    static final String jarDir = "models/en/wordnet/dict";

    /**If running from within a jar, unpack wordnet from the jar to a temp directory (if not already done) and return that.
     * If not running from a jar, just return the existing wordnet directory.
     * @see getUnpackedWordNetDir(Class)*/
    static File getUnpackedWordNetDir() throws IOException
    {return getUnpackedWordNetDir(WordNetUnpacker.class);}

    /**If running from within a jar, unpack wordnet from the jar to a temp directory (if not already done) and return that.
     * If not running from a jar, just return the existing wordnet directory.
     * @param clazz the class in whose classloader the wordnet resources are found.
     * @see getUnpackedWordNetDir()**/

    static File getUnpackedWordNetDir(Class clazz) throws IOException
    {
        String codeSource = clazz.getProtectionDomain().getCodeSource().getLocation().getPath();
        System.out.println("getUnpackedWordNetDir: using code source "+codeSource);
        if(!codeSource.endsWith(".jar"))
        {
            System.out.println("not running from jar, no unpacking necessary");
            try{return new File(WordNetUnpacker.class.getClassLoader().getResource(jarDir).toURI());}
            catch (URISyntaxException e) {throw new IOException(e);}
        }
        try(JarFile jarFile = new JarFile(codeSource))
        {
            String tempDirString = System.getProperty("java.io.tmpdir");
            if(tempDirString==null) {throw new IOException("java.io.tmpdir not set");}
            File tempDir = new File(tempDirString);
            if(!tempDir.exists()) {throw new IOException("temporary directory does not exist");}
            if(!tempDir.isDirectory()) {throw new IOException("temporary directory is a file, not a directory ");}
            File wordNetDir = new File(tempDirString+'/'+"wordnet"+ID);
            wordNetDir.mkdir();
            System.out.println("unpacking jarfile "+jarFile.getName());
            copyResourcesToDirectory(jarFile, jarDir, wordNetDir.getAbsolutePath());
            return wordNetDir;
        }       
    }
    /** Copies a directory from a jar file to an external directory. Copied from <a href="http://stackoverflow.com/a/19859453/398963">Stack Overflow</a>. */
    public static void copyResourcesToDirectory(JarFile fromJar, String jarDir, String destDir) throws IOException
    {
        int copyCount = 0;
        for (Enumeration<JarEntry> entries = fromJar.entries(); entries.hasMoreElements();)
        {
            JarEntry entry = entries.nextElement();
            if(!entry.getName().contains("models")) continue;
            if (entry.getName().startsWith(jarDir) && !entry.isDirectory()) {
                copyCount++;
                File dest = new File(destDir + "/" + entry.getName().substring(jarDir.length() + 1));
                File parent = dest.getParentFile();
                if (parent != null) {
                    parent.mkdirs();
                }

                FileOutputStream out = new FileOutputStream(dest);
                InputStream in = fromJar.getInputStream(entry);

                try {
                    byte[] buffer = new byte[8 * 1024];

                    int s = 0;
                    while ((s = in.read(buffer)) > 0) {
                        out.write(buffer, 0, s);
                    }
                } catch (IOException e) {
                    throw new IOException("Could not copy asset from jar file", e);
                } finally {
                    try {
                        in.close();
                    } catch (IOException ignored) {}
                    try {
                        out.close();
                    } catch (IOException ignored) {}
                }
            }
        }
        if(copyCount==0) System.out.println("Warning: No files copied!");
    }
}


回答2:

You can just copy all dict files from "assets" to the internal directory of your app. Just do it once, on the first app launch. Since then you can use JWI in a causual way like this:

String path = getFilesDir() + "/dict";
URL url = new URL("file", null, path);
IDictionary dict = new Dictionary(url);