Resume/CV Parsing in PHP [closed]

Posted 2020-06-06 03:25

Question:

Closed 8 years ago.

We are developing a recruitment-based social media site using LAMP.

For that, we want to do resume/CV parsing in PHP.

We were able to parse the email address and phone number, but we are not sure how to parse the other information, such as full name, address, education, and employment history, from the resume.
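As a rough illustration of the part that already works, email and phone extraction from plain text can be sketched with regular expressions. The function names `extractEmails` and `extractPhones`, the patterns, and the sample text are all assumptions made for this example; the patterns are deliberately loose and will produce false positives (a date range such as 2010-2012 can look like a phone number).

```php
<?php
// Minimal sketch: pull email addresses and phone-like strings out of plain
// resume text. The patterns are illustrative assumptions, not validators.

function extractEmails(string $text): array
{
    // Loose email shape: local part, "@", domain, dot, TLD of 2+ letters.
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $m);
    return array_values(array_unique($m[0]));
}

function extractPhones(string $text): array
{
    // Optional "+", then digits mixed with spaces, dots, dashes, or
    // parentheses on a single line, ending in a digit.
    preg_match_all('/\+?\d[\d ().-]{7,}\d/', $text, $m);

    $phones = [];
    foreach ($m[0] as $candidate) {
        // Keep only candidates with a plausible number of digits.
        $digits = preg_replace('/\D/', '', $candidate);
        if (strlen($digits) >= 8 && strlen($digits) <= 15) {
            $phones[] = trim($candidate);
        }
    }
    return array_values(array_unique($phones));
}

// Hypothetical sample text, used only to exercise the two functions.
$sample = "Jane Doe\nEmail: jane.doe@example.com\nPhone: +1 (555) 123-4567\n";

print_r(extractEmails($sample));
print_r(extractPhones($sample));
```

Fields such as name, address, education, and employment have no fixed textual shape like this, which is why they are much harder to pull out with patterns alone.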

Also, a resume/CV can come in various formats such as DOC, HTML, RTF, and TXT.

Does anyone know of a PHP script we can use to extract this data, or have any development ideas to get us started?

Thanks in advance.

Answer 1:

I would check whether an existing resume parser offers an API you can use or a custom hook you can add to your framework. Check out Sovren or Textkernel.

Sovren's website says:

Once your instance of the SovrenConvertAndParse Web Service is running, you will access it via SOAP. Almost all programming environments have the ability to auto-create a web service client or web service proxy automatically from the web service’s WSDL. We also have sample clients for some environments such as PHP. In any case, creating the web service client should be a very quick task: usually a few minutes, maybe a few hours.

Once you have created your web service client, you can call a single method on the web service to convert and parse a resume in one operation, receiving HR-XML output in return.

http://www.sovren.com/sovren-products-parser-implementation.php

http://www.sovren.com/sovren-products-web-service.php
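The flow described in the quote (call one SOAP method, get HR-XML back) might look roughly like this from PHP. This is a sketch under stated assumptions, not Sovren's actual API: the operation name `ConvertAndParse`, the parameter name `DocumentAsByteArray`, the response field `ParsedDocument`, and the WSDL URL are hypothetical placeholders, and the real names must come from the service's WSDL. The HR-XML fragment is a simplified, hand-written sample, and the element names `FormattedName` and `InternetEmailAddress` are assumed to be present in the output.

```php
<?php
// Sketch only. Operation, parameter, and response field names below are
// hypothetical placeholders; read the real ones from the vendor's WSDL.

function parseResumeViaSoap(object $client, string $fileBytes): string
{
    // $client is expected to behave like PHP's built-in SoapClient.
    $response = $client->__soapCall('ConvertAndParse', [[
        'DocumentAsByteArray' => $fileBytes, // hypothetical parameter name
    ]]);

    return (string) $response->ParsedDocument; // hypothetical response field
}

function extractContact(string $hrXml): array
{
    $xml = simplexml_load_string($hrXml);
    if ($xml === false) {
        throw new RuntimeException('Response is not well-formed XML.');
    }

    // Match on local-name() so the lookup does not depend on the exact
    // namespace URI the service uses.
    $pick = function (string $name) use ($xml): ?string {
        $nodes = $xml->xpath("//*[local-name()='$name']");
        return $nodes ? trim((string) $nodes[0]) : null;
    };

    return [
        'name'  => $pick('FormattedName'),        // assumed element name
        'email' => $pick('InternetEmailAddress'), // assumed element name
    ];
}

// Simplified, hand-written HR-XML-style fragment for illustration.
$sampleHrXml = <<<XML
<Resume xmlns="http://ns.hr-xml.org/2006-02-28">
  <StructuredXMLResume>
    <ContactInfo>
      <PersonName><FormattedName>Jane Doe</FormattedName></PersonName>
      <ContactMethod>
        <InternetEmailAddress>jane.doe@example.com</InternetEmailAddress>
      </ContactMethod>
    </ContactInfo>
  </StructuredXMLResume>
</Resume>
XML;

print_r(extractContact($sampleHrXml));

// With a real service (placeholder URL and file name, not actual values):
// $client = new SoapClient('https://example.invalid/parser?wsdl');
// $hrXml  = parseResumeViaSoap($client, file_get_contents('resume.doc'));
// print_r(extractContact($hrXml));
```

Passing the client in as a parameter keeps the parsing logic testable without a live service, since any object exposing a compatible `__soapCall` method can stand in for `SoapClient`.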

Textkernel's website says:

Document processing for all types of documents (DOC, DOCX, PDF, RTF, HTML, TIFF, TXT, XML, MSG, and EML type documents). Textkernel offers the following 11 languages out of the box: English, German, French, Dutch, Spanish, Swedish, Danish, Polish, Romanian, Italian, Slovak.

It seems their web interface is called Sourcebox:

Sourcebox is fully configurable with Extract!, Textkernel's CV Parsing software.

Sourcebox has a multilingual web interface for staff to manage the CV queue and manually check and correct exceptions.

Sourcebox can be used as an interface to many leading CRM, ATS, Matching engines, HRMS systems and your own website or recruitment portal.

http://www.textkernel.com/hr_solutions.php?nav=sourcebox

Both look promising enough to use rather than reinventing the wheel, especially in PHP.