Using Tika with Python: RuntimeError: Unable to start Tika Server

Posted 2020-02-06 06:03

I am trying to use the tika package to parse files. Tika installed successfully, and I started tika-server-1.18.jar from cmd with the command java -jar tika-server-1.18.jar
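Before calling the parser, it can help to confirm that the manually started server is actually listening. This is a minimal sketch that assumes the server runs on localhost with Tika's default port 9998 (adjust it if you started the jar on another port). `is_server_listening` is a hypothetical helper, not part of the tika package, and it only checks that something accepts TCP connections on that port.

```python
import socket


def is_server_listening(host: str = "localhost", port: int = 9998, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Port 9998 is assumed to be the Tika server default. This only proves that
    *something* accepts connections on the port, not that it is Tika.
    """
    try:
        # create_connection resolves the host and performs a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False


if __name__ == "__main__":
    print("Tika server reachable:", is_server_listening())
```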

My code in Jupyter is:

import tika
from tika import parser
parsed = parser.from_file('')

However, I receive the following error:

2018-07-25 10:20:13,325 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:18,329 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:23,332 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.

RuntimeError: Unable to start Tika Server.

3 answers
Deceive 欺骗
Post #2 · 2020-02-06 06:40

You have not passed an argument (specified a file) in your line:

parsed = parser.from_file('')

Give it a file to chew on, e.g.:

parsed = parser.from_file('myfile.txt')

The server didn't start, and presumably that is what triggers this "failed to see startup log message" warning (see line 644 in the source on GitHub).

Then another error message tells you it isn't going to work.
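A defensive sketch of this advice checks the path before handing it to the parser. `validate_input_file` is a hypothetical helper. The tika call appears only in a comment so the snippet runs without Java or a Tika server.

```python
from pathlib import Path


def validate_input_file(path: str) -> Path:
    """Return `path` as a Path, raising if it is empty or does not point to a file."""
    if not path:
        # An empty string such as '' gives the parser nothing to read
        raise ValueError("No file specified: pass a path such as 'myfile.txt'")
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Not a file: {p}")
    return p


# Usage with the tika package (needs Java and a reachable Tika server):
# from tika import parser
# parsed = parser.from_file(str(validate_input_file('myfile.txt')))
```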

走好不送
Post #3 · 2020-02-06 06:46

Download Java. If you already have a version of Java installed, try updating it to the latest version. The version that works for me is 1.18.
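One way to check, from Python, which Java is on your PATH is to run `java -version` and read the major version. This is a minimal sketch. It assumes a minimum of Java 8, which is the requirement stated for newer Tika releases. `java -version` prints its banner to stderr, and older releases use the `1.x` scheme, so `1.8.0_181` means Java 8. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Handles the legacy scheme ("1.8.0_181" -> 8) and the newer scheme
    ("11.0.2" -> 11). Returns None if no quoted version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first = int(match.group(1))
    # Legacy versions start with "1.", so the real major version is the second number
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` and return the major version, or None if Java is missing."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no `java` executable on PATH
    # `java -version` writes its banner to stderr, not stdout
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java not found on PATH")
    elif major < 8:  # assumed minimum for newer tika-server releases
        print(f"Java {major} found; Java 8 or newer is needed")
    else:
        print(f"Java {major} found")
```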

Bombasti
Post #4 · 2020-02-06 06:47

According to Apache Tika's site, all new versions of tika-server.jar will require Java 8.

24 April 2018: Apache Tika Release Apache Tika 1.18 has been released! This release includes bug fixes (e.g. extraction from grouped shapes in PPT), security fixes and upgrades to dependencies. PLEASE NOTE: The next versions will require Java 8. Please see the CHANGES.txt file for the full list of changes in the release and have a look at the download page for more information on how to obtain Apache Tika 1.18.

The current, outdated docs for the tika Python library claim that Java 7 is sufficient, but Java 8 must now be installed. This is because the latest version of tika-server.jar is downloaded automatically at runtime if it is not found in your temp directory.
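The following sketch shows where the Python library would look for the jar, or download it. It mirrors what tika-python appears to do: read the `TIKA_PATH` environment variable, fall back to `tempfile.gettempdir()`, and look for `tika-server.jar` in that directory. These names reflect my reading of the library's `tika.py` and may differ between versions, so check your installed copy. `expected_jar_path` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def expected_jar_path() -> Path:
    """Return where tika-python is assumed to look for tika-server.jar.

    Assumption: the directory comes from the TIKA_PATH environment variable,
    falling back to the system temp directory when it is unset.
    """
    jar_dir = os.getenv("TIKA_PATH", tempfile.gettempdir())
    return Path(jar_dir) / "tika-server.jar"


if __name__ == "__main__":
    jar = expected_jar_path()
    print(jar, "exists" if jar.exists() else "is missing")
```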

After installing Java 8, my basic test code launched the server and worked without error.
