I'm new to CRF++ and am teaching myself from its manual:
http://crfpp.googlecode.com/svn/trunk/doc/index.html?source=navbar#templ
I don't understand what this passage means:
This is a template to describe unigram features. When you give a
template "U01:%x[0,1]", CRF++ automatically generates a set of feature
functions (func1 ... funcN) like:
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
....
funcXX = if (output = B-NP and feature="U01:NN") return 1 else return 0
funcXY = if (output = O and feature="U01:NN") return 1 else return 0
The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.
Why are there so many lines for the unigram features, and what do they mean?
After looking at the documentation for long enough, I think I figured it out.
Take the example in the documentation where the input data is:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
and the feature template in question (in the format %x[row, col], where row is relative to your current position) is %x[0,1].
When %x[0,1] is expanded, it yields, depending on the current token, one of the strings in the set [PRP, VBZ, DT, JJ, NN] (i.e. one of the unique strings from column 1, where the leftmost column is column 0). For each of these strings, CRF++ creates a set of feature functions of the following form (shown here for the 3rd row of the input data):
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
...
where that particular string (DT in the code above) is paired with every single output class.
So if the output classes are [B-NP, I-NP, O], the feature template expanded into feature functions will look like:
# row 1 (He, PRP, B-NP)
func1 = if (output = B-NP and feature="U01:PRP") return 1 else return 0
func2 = if (output = I-NP and feature="U01:PRP") return 1 else return 0
func3 = if (output = O and feature="U01:PRP") return 1 else return 0
# row 2 (reckons, VBZ, B-VP)
func4 = if (output = B-NP and feature="U01:VBZ") return 1 else return 0
func5 = if (output = I-NP and feature="U01:VBZ") return 1 else return 0
func6 = if (output = O and feature="U01:VBZ") return 1 else return 0
# row 3 (the, DT, B-NP)
func7 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func8 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func9 = if (output = O and feature="U01:DT") return 1 else return 0
# row 4 (current, JJ, I-NP)
func10 = if (output = B-NP and feature="U01:JJ") return 1 else return 0
func11 = if (output = I-NP and feature="U01:JJ") return 1 else return 0
func12 = if (output = O and feature="U01:JJ") return 1 else return 0
# row 5 (account, NN, I-NP)
func13 = if (output = B-NP and feature="U01:NN") return 1 else return 0
func14 = if (output = I-NP and feature="U01:NN") return 1 else return 0
func15 = if (output = O and feature="U01:NN") return 1 else return 0
Regarding where the documentation mentions:
The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.
In this case L would be 3 and N would be 5, giving 3 * 5 = 15 feature functions, which matches func1 through func15 above.
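To make the counting concrete, here is a minimal Python sketch of the expansion described above. It is only an illustration, not how CRF++ is implemented internally: the names `rows`, `output_classes`, `make_feature_function` and `feature_functions` are made up for this example, and the class set [B-NP, I-NP, O] is the one assumed in this answer.

```python
from itertools import product

# (word, POS tag) rows from the example; column 0 is the word, column 1 the POS tag.
rows = [
    ("He", "PRP"),
    ("reckons", "VBZ"),
    ("the", "DT"),
    ("current", "JJ"),
    ("account", "NN"),
]
# Output classes as assumed in the answer above (not read from the data).
output_classes = ["B-NP", "I-NP", "O"]


def make_feature_function(label, feature):
    """Build an indicator: 1 if both the output label and the feature string match."""
    def func(output, observed_feature):
        return 1 if output == label and observed_feature == feature else 0
    return func


# Expand "U01:%x[0,1]" at every position and keep the unique strings in order.
unique_features = list(dict.fromkeys(f"U01:{pos}" for _, pos in rows))

# One indicator function per (output class, unique feature string) pair.
feature_functions = {
    (label, feature): make_feature_function(label, feature)
    for feature, label in product(unique_features, output_classes)
}

print(len(output_classes))     # L
print(len(unique_features))    # N
print(len(feature_functions))  # L * N
```

With the five rows above this prints 3, 5 and 15. For example, `feature_functions[("B-NP", "U01:DT")]("B-NP", "U01:DT")` returns 1, while the same function called with output "I-NP" returns 0.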
For a particular template %x[i,j], i is the row offset relative to the current position, and j is the column (feature) you want to use.
Given data:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP << CURRENT TOKEN
account NN I-NP
%x[0,1] refers to column 1 of the current row (offset 0 from the current word): the POS tag is JJ and the output tag is I-NP.
Moving forward one token, %x[0,1] -> POS tag = NN, output tag = I-NP.
Each feature function pairs one possible value of the expanded feature (here, the POS tag of the current word) with one possible output tag.
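The row/column lookup described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not CRF++'s own code: `expand_template`, `MACRO` and the `OUT_OF_RANGE` placeholder are hypothetical names, and the template `U02:%x[-1,1]` in the usage lines is made up to show a non-zero offset. What CRF++ itself emits when an offset falls outside the sentence is not covered here.

```python
import re

# Example sentence; columns are word, POS tag, output tag.
sentence = [
    ["He", "PRP", "B-NP"],
    ["reckons", "VBZ", "B-VP"],
    ["the", "DT", "B-NP"],
    ["current", "JJ", "I-NP"],
    ["account", "NN", "I-NP"],
]

# Matches macros such as %x[0,1] or %x[-1, 1]: a row offset, then a column index.
MACRO = re.compile(r"%x\[(-?\d+),\s*(\d+)\]")

# Hypothetical placeholder for offsets outside the sentence (an assumption of
# this sketch, not the marker CRF++ uses).
OUT_OF_RANGE = "<none>"


def expand_template(template, sentence, position):
    """Replace each %x[row,col] with the value at (position + row, col)."""
    def substitute(match):
        row = position + int(match.group(1))
        col = int(match.group(2))
        if not 0 <= row < len(sentence):
            return OUT_OF_RANGE
        return sentence[row][col]
    return MACRO.sub(substitute, template)


print(expand_template("U01:%x[0,1]", sentence, 3))   # current token is "current"
print(expand_template("U01:%x[0,1]", sentence, 4))   # one token forward
print(expand_template("U02:%x[-1,1]", sentence, 3))  # POS tag of the previous token
```

The three calls print U01:JJ, U01:NN and U02:DT. Position indices are 0-based here, so position 3 is the row marked as the current token in the data above.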
Update:
I think the explanation above is quite straightforward, provided that you understand the CRF model well.
CRF Model Reference
CRF++ is a replication of Sha and Pereira (2003)