I'm new to CRF++. I'm teaching myself by reading its manual: http://crfpp.googlecode.com/svn/trunk/doc/index.html?source=navbar#templ
And I don't understand what this means:
This is a template to describe unigram features. When you give a
template "U01:%x[0,1]", CRF++ automatically generates a set of feature
functions (func1 ... funcN) like:
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
....
funcXX = if (output = B-NP and feature="U01:NN") return 1 else return 0
funcXY = if (output = O and feature="U01:NN") return 1 else return 0
The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.
Why are there many lines for the Unigram features and what do they mean?
After looking at the documentation for long enough, I think I figured it out.
Take the example in the documentation where the input data is:
and the feature template (in the format %x[row, col], where row is relative to your current position) in question is %x[0,1].
When %x[0,1] is expanded, depending on the current token, it can yield any one of the strings in the set [PRP, VBZ, DT, JJ, NN] (i.e. one of the unique strings from column 1, where the leftmost column is column 0). For each of these strings it creates a set of feature functions of the form (looking at the 3rd row of input data):
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
where that particular string (DT in the code above) is compared with every single output class.
So if the output classes are [B-NP, I-NP, O], the feature template expanded into feature functions will look like:
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
....
Regarding where the documentation mentions that the number of feature functions generated by a template amounts to (L * N): in this case L would be 3 and N would be 5.
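The expansion described above can be sketched in a few lines of Python. This is only an illustration of the counting argument, not how CRF++ is implemented internally. The names `expand_unigram` and `make_feature_function` are hypothetical helpers, and the column values and output classes are the ones listed in this answer.

```python
from itertools import product

OUTPUT_CLASSES = ["B-NP", "I-NP", "O"]          # L = 3, as listed in the answer
POS_COLUMN = ["PRP", "VBZ", "DT", "JJ", "NN"]    # column 1, one value per token


def expand_unigram(template_id, column_values, output_classes):
    """Return one (output, feature_string) pair per generated feature function."""
    # N unique strings such as "U01:DT" come out of the template expansion.
    unique_features = sorted({f"{template_id}:{v}" for v in column_values})
    # Every unique string is paired with every output class: L * N pairs.
    return list(product(output_classes, unique_features))


def make_feature_function(output, feature):
    """Build the indicator: 1 only if both the label and the string match."""
    def func(current_output, current_feature):
        return 1 if (current_output == output and current_feature == feature) else 0
    return func


pairs = expand_unigram("U01", POS_COLUMN, OUTPUT_CLASSES)
functions = [make_feature_function(o, f) for o, f in pairs]
print(len(functions))  # L * N = 3 * 5 = 15
```

Each closure plays the role of one `funcK` line from the manual: it fires only when the current output label and the current expanded feature string both match the pair it was built from.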
For a particular template %x[i,j], i is the row offset relative to the current position and j is the feature column you want to use. Given data:
%x[0,1] refers to the current word (offset 0): its POS tag is JJ and its output tag is I-NP.
Moving forward one token, %x[0,1] -> POS tag = NN, output tag = I-NP.
Each feature function corresponds to one pair of an expanded feature value (here, the POS tag at the current position) and a possible output tag.
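To make the row/column lookup concrete, here is a rough Python sketch of how a `%x[row,col]` macro could be resolved against the data as the current position moves. It is an assumption-laden illustration rather than CRF++'s actual code: `expand` is a hypothetical helper, the token strings are placeholders, only the JJ and NN rows mentioned above are included, and the out-of-range marker is invented for this sketch.

```python
import re

# Hypothetical rows: (token, POS tag, output tag). Token strings are placeholders.
ROWS = [
    ("tok_a", "JJ", "I-NP"),
    ("tok_b", "NN", "I-NP"),
]

# Matches macros such as %x[0,1] or %x[-1,0].
MACRO = re.compile(r"%x\[(-?\d+),\s*(\d+)\]")


def expand(template, rows, position):
    """Replace each %x[row,col] with the cell at (position + row, col)."""
    def lookup(match):
        row = position + int(match.group(1))   # row is relative to the current position
        col = int(match.group(2))              # col is absolute, leftmost column is 0
        if not 0 <= row < len(rows):
            return "<out-of-range>"            # sketch-only placeholder
        return rows[row][col]
    return MACRO.sub(lookup, template)


for pos in range(len(ROWS)):
    # Prints the expanded feature string next to the output tag of that row.
    print(expand("U01:%x[0,1]", ROWS, pos), "->", ROWS[pos][2])
```

With offset 0 the macro always reads the current row, so stepping the position forward changes the expanded string from `U01:JJ` to `U01:NN`, matching the walk-through above.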
Update:
I think the explanation above is quite straightforward, provided you understand the CRF model well.
CRF Model Reference
CRF++ is a replication of Sha and Pereira (2003)