I extracted text from a PDF using both Apache PDFBox and iText, but in both cases the extracted text is completely unstructured and messy.
This is the PDF page (the front page of a US patent), but the extracted text is:
111111 1111111111111111111111111111111111111111111111111111111111111
US008631488B2
(12) United States Patent (10) Patent No.: US 8,631,488 B2
Oz et al.
(45) Date of Patent: Jan. 14,2014
6,813,682 B2 1112004 Bress et al.
(54) SYSTEMS AND METHODS FOR PROVIDING
7,065,644 B2 Daniell et al.
6/2006
SECURITY SERVICES DURING POWER
Todd et al.
7,076,690 Bl 7/2006
MANAGEMENT MODE
7,086,089 B2 8/2006 Hrastar et al.
7,184,554 B2 2/2007 Freese
(75) Inventors: Ami Oz, Azur (IL); Shlomo Touboul,
7,283,542 B2
10/2007 Mitchell
7,353,533 B2 Wright et al.
Kefar Haim (IL) 4/2008
Maufer et al.
7,359,983 Bl 4/2008
7,360,242 B2 4/2008 Syvanne
(73) Assignee: CUPP Computing AS, Bergen (NO)
7,418,253 B2 8/2008 Kavanagh
(Continued)
Notice: Subject to any disclaimer, the term of this
( * )
patent is extended or adjusted under 35
FOREIGN PATENT DOCUMENTS
U.S.c. 154(b) by 656 days. wo 2000078008 12/2000
Appl. No.: 12/535,650
(21)
WO 2004030308 4/2004
(22) Filed: Aug. 4, 2009
OTHER PUBLICATIONS
Breeden H, John et al., "A Hardware FirewallYou TakeWithYou,"
(65) Prior Publication Data
Government Computer News, located at http:/gcn.com!Articles/
US 2010/0037321 Al Feb. 11,2010
2005/06/0 11A-hardware-firewall-you-take-with-you.aspx?p~1, Jun.
1,2005.
Why is this happening? How can I solve it?