Folding in LDA with MALLET in Java (estimating topics for new documents)

I'm using MALLET from Java, and I can't work out how to evaluate new documents against an existing topic model that I have already trained.

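My initial code to generate the model is very similar to the one in the MALLET Developer's Guide for topic modeling, after which I simply save the model as a Java object. In a later process, I reload that Java object from the file, add new instances via .addInstances(), and then want to evaluate only these new instances against the topics found in the original training set.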

This stats.SE thread gives some high-level suggestions, but I can't see how to fit them into the MALLET framework.

Any help is much appreciated.

Answer 1:

I found the answer hidden in a slide deck from MALLET's lead developer:

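// model is the trained ParallelTopicModel; newInstance must have gone
// through the same pipe as the training data
TopicInferencer inferencer = model.getInferencer();
// arguments: instance, numIterations, thinning, burnIn
double[] topicProbs = inferencer.getSampledDistribution(newInstance, 100, 10, 10);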
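To get an intuition for what the numIterations, thinning and burnIn arguments control, here is a from-scratch sketch of the general "folding in" technique: collapsed Gibbs sampling over the tokens of one new document while the topic-word counts learned in training stay frozen. This is the textbook method under the assumption of symmetric priors alpha and beta, not MALLET's actual TopicInferencer code; the class FoldInSketch and all of its parameters are hypothetical.

```java
import java.util.Random;

/**
 * Sketch of LDA folding in: only the new document's token-topic assignments
 * are resampled; the training counts topicWord[k][w] and topicTotals[k] stay
 * fixed. Assumes symmetric priors alpha and beta, and thinning >= 1.
 */
final class FoldInSketch {
    private FoldInSketch() {}

    static double[] sampledDistribution(int[] doc, int[][] topicWord, int[] topicTotals,
                                        double alpha, double beta,
                                        int numIterations, int thinning, int burnIn,
                                        long seed) {
        int numTopics = topicTotals.length;
        int vocabSize = topicWord[0].length;
        Random rng = new Random(seed);

        // random initial topic for every token of the new document
        int[] z = new int[doc.length];
        int[] docTopic = new int[numTopics];
        for (int i = 0; i < doc.length; i++) {
            z[i] = rng.nextInt(numTopics);
            docTopic[z[i]]++;
        }

        double[] p = new double[numTopics];
        double[] acc = new double[numTopics];   // accumulated topic proportions
        int samples = 0;

        for (int iter = 1; iter <= numIterations; iter++) {
            for (int i = 0; i < doc.length; i++) {
                int w = doc[i];
                docTopic[z[i]]--;                // take token i out of its topic
                double total = 0;
                for (int k = 0; k < numTopics; k++) {
                    // p(z_i = k | rest) is proportional to
                    // (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                    p[k] = (docTopic[k] + alpha) * (topicWord[k][w] + beta)
                           / (topicTotals[k] + vocabSize * beta);
                    total += p[k];
                }
                // draw the new topic from the unnormalized weights p
                double u = rng.nextDouble() * total;
                int newTopic = numTopics - 1;    // fallback guards against rounding
                for (int k = 0; k < numTopics; k++) {
                    u -= p[k];
                    if (u <= 0) { newTopic = k; break; }
                }
                z[i] = newTopic;
                docTopic[newTopic]++;
            }
            // after burn-in, keep every thinning-th sweep as a sample
            if (iter > burnIn && (iter - burnIn) % thinning == 0) {
                addProportions(docTopic, alpha, acc);
                samples++;
            }
        }
        if (samples == 0) {                      // too few iterations: use final state
            addProportions(docTopic, alpha, acc);
            samples = 1;
        }
        for (int k = 0; k < numTopics; k++) acc[k] /= samples;
        return acc;
    }

    // adds the smoothed proportions (n_dk + alpha) / (N_d + K * alpha) into acc
    private static void addProportions(int[] docTopic, double alpha, double[] acc) {
        int n = 0;
        for (int c : docTopic) n += c;
        for (int k = 0; k < docTopic.length; k++) {
            acc[k] += (docTopic[k] + alpha) / (n + docTopic.length * alpha);
        }
    }
}
```

Because the result is an average over random Gibbs samples, two runs with different seeds generally give slightly different distributions, while the same seed reproduces the same array exactly. More iterations and more kept samples reduce that run-to-run noise.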

Answer 2:

Inference is actually also shown in the example linked in the question (the last few lines).

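For anyone interested in the whole code for saving/loading the trained model and then using it to infer the topic distribution of new documents, here are some snippets: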

After model.estimate() has finished, you have the actual trained model, so you can serialize it with the standard Java ObjectOutputStream (since ParallelTopicModel implements Serializable):

// try-with-resources closes the stream even if writeObject() throws
try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream("model.ser"))) {
    oos.writeObject(model);
} catch (IOException ex) {
    // handle this error (this also covers FileNotFoundException)
}
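If you save and load several objects (the model, the pipes), a small generic helper avoids repeating the stream boilerplate. This is a sketch using only the JDK; the class name SerializationUtil and its methods are hypothetical, not part of MALLET.

```java
import java.io.*;

/** Hypothetical helper: save/load any Serializable object, e.g. a trained model or a pipe. */
final class SerializationUtil {
    private SerializationUtil() {}

    static void save(Serializable obj, File file) throws IOException {
        // try-with-resources closes (and flushes) the stream even if writeObject throws
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            oos.writeObject(obj);
        }
    }

    static <T> T load(File file, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            // Class.cast gives a checked cast instead of an unchecked (T) cast
            return type.cast(ois.readObject());
        }
    }
}
```

With it, saving becomes SerializationUtil.save(model, new File("model.ser")), and loading becomes SerializationUtil.load(new File("model.ser"), ParallelTopicModel.class), which fails with a ClassCastException instead of silently returning the wrong type.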

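Note, however, that at inference time you also need to pass the new sentence (as an Instance) through the same pipeline so that it is preprocessed (tokenized, etc.) in the same way. You therefore also need to save the pipe list (since we are using SerialPipes, we can create an instance of it and serialize that):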

// initialize the pipe list (the same one used in model training)
SerialPipes pipes = new SerialPipes(pipeList);

try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream("pipes.ser"))) {
    oos.writeObject(pipes);
} catch (IOException ex) {
    // handle error (this also covers FileNotFoundException)
}
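Why the saved pipe matters: the model only knows words by the integer ids assigned during training, so new text must be tokenized the same way and mapped through the same vocabulary. The hypothetical VocabPipe below is a JDK-only stand-in for a pipeline (not MALLET's Pipe API): it lowercases, splits on non-letters, and assigns ids; once frozen after training, unseen words are dropped instead of receiving new ids the model has never seen. MALLET's Alphabet offers stopGrowth() for a similar purpose.

```java
import java.io.Serializable;
import java.util.*;

/**
 * Hypothetical stand-in for a preprocessing pipeline. The vocabulary learned
 * during training is serialized together with the pipe, so new text maps to
 * the SAME ids the model was trained on.
 */
final class VocabPipe implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Map<String, Integer> vocab = new HashMap<>();
    private boolean frozen = false;

    /** Stop adding new words (call this when training is over). */
    void freeze() { frozen = true; }

    int size() { return vocab.size(); }

    int[] toIds(String text) {
        List<Integer> ids = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (tok.isEmpty()) continue;          // leading separator yields ""
            Integer id = vocab.get(tok);
            if (id == null) {
                if (frozen) continue;             // unseen word at inference time: drop it
                id = vocab.size();
                vocab.put(tok, id);
            }
            ids.add(id);
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

For example, after training on "Cats chase mice" and freezing, "mice chase dogs" maps to the ids [2, 1]: "dogs" is ignored rather than shifting the vocabulary.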

To load the model and the pipeline and use them for inference, we need to deserialize them:

private static void inferByModel(String sentence) {
    // define model and pipeline
    ParallelTopicModel model = null;
    SerialPipes pipes = null;

    // load the model (try-with-resources closes the stream)
    try (ObjectInputStream ois = new ObjectInputStream(
            new FileInputStream("model.ser"))) {
        model = (ParallelTopicModel) ois.readObject();
    } catch (IOException ex) {
        System.out.println("Could not read model from file: " + ex);
    } catch (ClassNotFoundException ex) {
        System.out.println("Could not load the model: " + ex);
    }

    // load the pipeline
    try (ObjectInputStream ois = new ObjectInputStream(
            new FileInputStream("pipes.ser"))) {
        pipes = (SerialPipes) ois.readObject();
    } catch (IOException ex) {
        System.out.println("Could not read pipes from file: " + ex);
    } catch (ClassNotFoundException ex) {
        System.out.println("Could not load the pipes: " + ex);
    }

    // if both are properly loaded
    if (model != null && pipes != null) {

        // Create a new instance named "test instance" with empty target
        // and source fields; note that we are using the saved pipes here
        InstanceList testing = new InstanceList(pipes);
        testing.addThruPipe(
            new Instance(sentence, null, "test instance", null));

        // get an inferencer from the loaded model and sample the topic
        // distribution (numIterations = 10, thinning = 1, burnIn = 5)
        TopicInferencer inferencer = model.getInferencer();
        double[] testProbabilities = inferencer
                   .getSampledDistribution(testing.get(0), 10, 1, 5);
        // print the proportion of topic 0 only
        System.out.println("0\t" + testProbabilities[0]);
    }
}
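Printing only topic 0 is rarely what you want. A small JDK-only sketch (the TopTopics class is hypothetical) that returns the indices of the k most probable topics from the double[] returned by getSampledDistribution:

```java
import java.util.*;

/** Hypothetical helper: indices of the k most probable topics, highest first. */
final class TopTopics {
    private TopTopics() {}

    static int[] topK(double[] topicProbs, int k) {
        Integer[] idx = new Integer[topicProbs.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // sort topic indices by descending probability;
        // Arrays.sort on objects is stable, so ties keep ascending index order
        Arrays.sort(idx, (a, b) -> Double.compare(topicProbs[b], topicProbs[a]));
        int n = Math.min(k, idx.length);
        int[] top = new int[n];
        for (int i = 0; i < n; i++) top[i] = idx[i];
        return top;
    }
}
```

Inside the if block above, the last print could then become a loop such as: for (int t : TopTopics.topK(testProbabilities, 3)) System.out.println(t + "\t" + testProbabilities[t]);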

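For some reason, I don't get exactly the same inference results with the loaded model as with the original one, but that is a question for another post (though if anyone knows why, I'd be happy to hear).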