Could you recommend a training path for getting started and becoming very good at Information Extraction? I started reading about it for one of my hobby projects and soon realized that I would have to be good at math (Algebra, Stats, Prob). I have read some introductory books on different math topics (and it's so much fun). I'm looking for some guidance. Please help.
Update: To answer one of the comments, I am more interested in text information extraction.
I disagree with the people who recommend reading Programming Collective Intelligence. If you want to do anything of even moderate complexity, you need to be good at applied math, and PCI gives you a false sense of confidence. For example, when it talks about SVMs, it just says that libSVM is a good way of implementing them. Now, libSVM is definitely a good package, but who cares about packages? What you need to know is why SVMs give the terrific results they do and how they are fundamentally different from the Bayesian way of thinking (and why Vapnik is a legend).
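To make the "fundamentally different" part a bit more concrete: a linear SVM does not model probabilities at all; it minimizes a regularized hinge loss, so only points inside the margin (or misclassified) influence the solution. Below is a minimal sketch, assuming a Pegasos-style stochastic subgradient method (without Pegasos' optional projection step) rather than whatever solver libSVM actually uses. The function names, parameters (lam, epochs, seed) and toy data are hypothetical, chosen only for illustration.

```python
import random


# Hypothetical helper names for illustration; this is not libSVM's API or algorithm.
def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the primal linear SVM objective:

        lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)

    X is a list of feature lists, y a list of +1/-1 labels. A constant 1.0 feature is
    appended to every example so the bias is learned (and regularized) as part of w.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # The regularizer always shrinks w toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: the point is inside the margin or misclassified,
                # so it pulls w toward label * x. Points safely outside the margin do nothing.
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function <w, [x, 1]>."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: two linearly separable clusters.
    X = [[2.0, 2.0], [3.0, 1.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

Notice that nothing here estimates a likelihood or a prior; the whole model is "find a weight vector with a large margin", which is the contrast with the Bayesian view worth understanding.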
IMHO, there is no single path. You should have a good grip on linear algebra, probability, and Bayesian theory. Bayes, I should add, is as important for this as oxygen is for human beings (it's a little exaggerated, but you get what I mean, right?). Also, get a good grip on machine learning. Just using other people's work is perfectly fine, but the moment you want to know why something was done the way it was, you will have to know something about ML.
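As a small taste of Bayes at work on text, here is a minimal multinomial naive Bayes classifier: it picks the class c that maximizes log P(c) + sum over tokens w of log P(w | c), with add-alpha (Laplace) smoothing. This is a sketch under stated assumptions: naive Bayes is just one simple Bayesian model, the function names and the toy "acquisition"/"earnings" sentences are hypothetical, and tokens never seen in training are simply ignored.

```python
import math
from collections import Counter, defaultdict


# Hypothetical helper names for illustration only.
def train_naive_bayes(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing.

    docs: list of token lists; labels: list of class names (same length).
    Returns (log_prior, log_likelihood) dictionaries.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return argmax_c [log P(c) + sum_w log P(w | c)], skipping unseen tokens."""
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][w] for w in tokens if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [
        "acme acquires widgetco for 5 million".split(),
        "bigcorp acquires startup in cash deal".split(),
        "acme reports quarterly earnings".split(),
        "bigcorp earnings beat estimates".split(),
    ]
    labels = ["acquisition", "acquisition", "earnings", "earnings"]
    prior, likelihood = train_naive_bayes(docs, labels)
    print(classify("globex acquires initech".split(), prior, likelihood))
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters quickly with real documents.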
Check these two for that:
http://pindancing.blogspot.com/2010/01/learning-about-machine-learniing.html
http://measuringmeasures.com/blog/2010/1/15/learning-about-statistical-learning.html
http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html
Okay, now that's three of them :)
Take a look here if you need an enterprise-grade NER service. Developing an NER system (and its training sets) is a very time-consuming and highly skilled task.
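Part of what makes NER training sets laborious is that every entity mention has to be annotated, and those annotations are then usually converted into per-token labels for a sequence model. Here is a minimal sketch of that conversion, assuming whitespace tokenization, token-index spans with an exclusive end, non-overlapping entities, and the common BIO (begin/inside/outside) tagging scheme; the function name and example sentence are hypothetical.

```python
# Hypothetical helper for illustration; assumes non-overlapping spans.
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    tokens: list of tokens.
    spans: list of (start, end, label) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)  # every token starts as "outside any entity"
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span: {(start, end, label)}")
        tags[start] = "B-" + label  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label  # remaining tokens of the same entity
    return tags


if __name__ == "__main__":
    tokens = "Barack Obama visited New York City".split()
    spans = [(0, 2, "PER"), (3, 6, "LOC")]
    for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
        print(token, tag)
```

The conversion itself is trivial; the expensive part is producing enough consistent, correct spans by hand, which is exactly the time-consuming work mentioned above.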