I am trying to use the tika package to parse files. Tika is successfully installed, and I started tika-server-1.18.jar
from cmd with java -jar tika-server-1.18.jar
My code in Jupyter is:
import tika
from tika import parser
parsed = parser.from_file('')
However, I receive the error below:
2018-07-25 10:20:13,325 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:18,329 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:23,332 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
RuntimeError: Unable to start Tika Server.
According to Apache Tika's site, all new versions of the tika-server.jar will require Java 8.
24 April 2018: Apache Tika Release
Apache Tika 1.18 has been released! This release includes bug fixes (e.g. extraction from grouped shapes in PPT), security fixes and upgrades to dependencies. PLEASE NOTE: The next versions will require Java 8. Please see the CHANGES.txt file for the full list of changes in the release and have a look at the download page for more information on how to obtain Apache Tika 1.18.
The current docs for the tika Python library are outdated: they claim that Java 7 is needed, but Java 8 must now be installed. This is because the current version of tika-server.jar is automatically downloaded at runtime if it is not found in your temp folder.
After installing Java 8, my basic test code launched the server and worked without error.
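To confirm which Java the Python process will actually pick up, a small check like the one below may help. It is only a sketch: the helper names java_major_version and installed_java_major are made up for this example, and it assumes the java executable on your PATH is the one tika will launch.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_171).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr, so both streams are checked.
    return java_major_version(result.stderr + result.stdout)
```

If installed_java_major() returns None or a number below 8, that would be consistent with the startup failure shown in the question.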
You have not passed an argument (specified a file) in your line:
parsed = parser.from_file('')
Give it a file to chew on, e.g.:
parsed = parser.from_file('myfile.txt')
The server didn't start, and presumably that is what triggers this "Failed to see startup log message" warning (see line 644 in the source on GitHub);
then another error message tells you it ain't going to play...
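One way to rule out the empty-path problem before Tika is even involved is to check the file first. This is a rough sketch, not part of the tika library: parse_existing_file is a hypothetical wrapper and 'myfile.txt' is only a placeholder name.

```python
from pathlib import Path


def parse_existing_file(path_str):
    """Call tika's parser.from_file only if path_str names a real file."""
    if not path_str or not Path(path_str).is_file():
        raise FileNotFoundError(f"No such file to parse: {path_str!r}")
    # Imported here so the path check works even when tika is not installed.
    from tika import parser
    return parser.from_file(path_str)
```

Calling parse_existing_file('') fails immediately with FileNotFoundError, while parse_existing_file('myfile.txt') hands the file to Tika as in the line above. Note that this only guards the argument; it will not fix a server that cannot start.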
Download Java. If you already have a version of Java installed, try updating it to the latest version. The version that works for me is 1.18.