I've trained a CRF using GenericAcrfTui, which writes an ACRF to a file. I'm not quite sure how to load and use the trained CRF, but
import java.nio.file.Paths;
import cc.mallet.grmm.learning.ACRF;
import cc.mallet.util.FileUtils;
// Deserialize the trained model written by GenericAcrfTui
ACRF c = (ACRF) FileUtils.readObject(Paths.get("acrf.ser.gz").toFile());
seems to work. However, the labeling looks incorrect and appears to depend on the labels that I pass as input. How do I label new input using a loaded ACRF?
Here's how I do my labeling:
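import java.io.StringReader;
import java.util.Iterator;
import java.util.regex.Pattern;
import cc.mallet.grmm.learning.GenericAcrfData2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureVectorSequence;
import cc.mallet.pipe.iterator.LineGroupIterator;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

// Hand-built pipe that only knows the CRF's input (data) alphabet
GenericAcrfData2TokenSequence instanceMaker = new GenericAcrfData2TokenSequence();
instanceMaker.setDataAlphabet(c.getInputAlphabet());
instanceMaker.setIncludeTokenText(true);
instanceMaker.setFeaturesIncludeToken(true);
instanceMaker.setLabelsAtEnd(false);
Pipe pipe = new SerialPipes(new Pipe[] {
    instanceMaker,
    new TokenSequence2FeatureVectorSequence(c.getInputAlphabet(), true, false),
});
InstanceList testing = new InstanceList(pipe);
Iterator<Instance> testSource = new LineGroupIterator(
    // initialize the labels to O
    new StringReader("O O ---- what W=the@1 W=hell@2\n"
        + "O O ---- the W=what@-1 W=hell@1\n"
        + "O O ---- hell W=what@-2 W=the@-1"),
    Pattern.compile("^\\s*$"), true);
testing.addThruPipe(testSource);
System.out.println(c.getBestLabels(testing.get(0)));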
I got that by looking at GenericAcrfTui.
Some things I tried:
- When I gave different initial labels (other than "O"), the resulting labeling changed. This doesn't help, because I can't guess what labels to give initially; if I could, I wouldn't need a tagger.
- I tried not giving any initial labels at all, but that just caused exceptions. It seems that Mallet really wants those labels.
I noticed that there's also the SimpleTagger that can be used to train a CRF, but I think that I will still have the same problem using that to label new input.
Any help with labeling using a CRF from SimpleTagger or GenericAcrfTui would help.
BTW I usually use CRF++ but for this task, I want to build my own graph because I'm using dependency parse features.
I figured it out!
The problem was that the pipe didn't know the target alphabet. The solution is to use the CRF's own Pipe instead of doing that craziness to make my own Pipe.
Now if anyone knows a nicer way to make a new Instance from a query, that would be good too; I just copied what the trainer does.
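In case it helps anyone, here is a rough sketch of how the query text could be generated instead of typed by hand. It only builds the "labels ---- token features" lines in the same shape as the string in the question, with a placeholder label in every label slot. AcrfQuery and build are hypothetical names of mine, not part of Mallet, and the trailing comment assumes the loaded ACRF exposes its training pipe through getInputPipe(); I haven't checked that against every Mallet version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AcrfQuery {
    /**
     * Builds one sequence of GenericAcrfTui-style input lines:
     * "<placeholder> ... <placeholder> ---- <token and features>".
     * tokenLines holds the part after "----" for each token.
     */
    public static String build(List<String> tokenLines, int labelsPerToken, String placeholder) {
        List<String> lines = new ArrayList<>();
        for (String tokenLine : tokenLines) {
            StringBuilder sb = new StringBuilder();
            // One placeholder per label slot the model was trained with
            for (int i = 0; i < labelsPerToken; i++) {
                sb.append(placeholder).append(' ');
            }
            sb.append("---- ").append(tokenLine);
            lines.add(sb.toString());
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String query = build(Arrays.asList(
                "what W=the@1 W=hell@2",
                "the W=what@-1 W=hell@1",
                "hell W=what@-2 W=the@-1"), 2, "O");
        System.out.println(query);
        // With Mallet on the classpath, the text would then go through the
        // loaded model's own pipe (assumed accessor: ACRF.getInputPipe()):
        //   InstanceList testing = new InstanceList(c.getInputPipe());
        //   testing.addThruPipe(new LineGroupIterator(
        //       new StringReader(query), Pattern.compile("^\\s*$"), true));
        //   System.out.println(c.getBestLabels(testing.get(0)));
    }
}
```

The placeholder labels are still there only because the input format wants them. The point of going through the CRF's own Pipe is that the target alphabet comes from the trained model rather than from whatever labels appear in the query.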