Running HtmlUnit with Jython - issue with startup

Posted 2020-07-22 07:19

Question:

I tried to run HtmlUnit with Jython following this tutorial:

http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/

but it does not work for me. I am unable to import the com.gargoylesoftware packages. There are only some HTML files in the HtmlUnit folder; do I need to import them somehow?

The tutorial says to run python script like this:

/opt/jython/jython -J-classpath "htmlunit-2.8/lib/*" gartner.py

and I try to run:

java -jar /Users/adam/jython/jython.jar -J-classpath "htmlunit-2.8/lib/*" gartner.py

My problem is that I am getting an "Unknown option: J-classpath" error, yet there is not a single word about the -J-classpath parameter on Jython.org. I would be VERY glad for any advice. I am running Jython standalone v. 2.5.2 on Snow Leopard.

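Answer 1:

Because you start Jython with java -jar, the /opt/jython/jython launcher script used in the tutorial is bypassed. -J is an option of that launcher, which strips the prefix and hands the rest (here -classpath) to the JVM; it is not a valid option for java or for jython.jar itself, hence the "Unknown option" error. You should really try to follow the exact steps of the tutorial, because you are skipping several important ones (and substituting your own).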
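
If you want to keep using the standalone jar, one option is to put jython.jar and the HtmlUnit jars on the same classpath and name Jython's main class, org.python.util.jython, explicitly. The sketch below only builds that command; build_jython_command is a hypothetical helper, the paths are the ones from the question, and the dir/* classpath wildcard assumes Java 6 or later.

```python
import os


def build_jython_command(jython_jar, lib_dir, script):
    """Return an argv list that starts standalone Jython with extra jars."""
    # java itself expands a trailing '*' to every jar in that directory
    classpath = os.pathsep.join([jython_jar, os.path.join(lib_dir, '*')])
    # org.python.util.jython is the main class inside jython.jar
    return ['java', '-cp', classpath, 'org.python.util.jython', script]


if __name__ == '__main__':
    # Paths taken from the question; adjust them to your own layout
    cmd = build_jython_command('/Users/adam/jython/jython.jar',
                               'htmlunit-2.8/lib', 'gartner.py')
    print(' '.join(cmd))
```

If you paste the printed command into a shell, quote the classpath argument so that the shell does not expand the * before java sees it.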



Answer 2:

It is possible to run a Jython script simply as jython myscript.py, provided the script itself appends the full path of each jar it requires to sys.path using sys.path.append.

Here is a current script I'm working on.

#!/opt/jython/jython
'''
Created on Dec 7, 2011
@author: chris
'''
import sys

jarpath = '/usr/share/java/htmlunit/'  # directory containing the jar files to import
jars = ['apache-mime4j-0.6.jar','commons-codec-1.4.jar',
    'commons-collections-3.2.1.jar','commons-io-1.4.jar',
    'commons-lang-2.4.jar','commons-logging-1.1.1.jar',
    'cssparser-0.9.5.jar','htmlunit-2.8.jar',
    'htmlunit-core-js-2.8.jar','httpclient-4.0.1.jar',
    'httpcore-4.0.1.jar','httpmime-4.0.1.jar',
    'nekohtml-1.9.14.jar','sac-1.3.jar',
    'serializer-2.7.1.jar','xalan-2.7.1.jar',
    'xercesImpl-2.9.1.jar','xml-apis-1.3.04.jar'] #a list of jars

def loadjars():  # appends each jar to sys.path so Jython can import from it
    for jar in jars:
        container = jarpath + jar
        print(container)
        sys.path.append(container)

loadjars()

# import HtmlUnit only after loadjars() has put the jars on sys.path
import com.gargoylesoftware.htmlunit.WebClient as WebClient
webclient = WebClient()

def gotopage():
    print('hello, I will visit Google')
    url = 'http://google.com'
    page = webclient.getPage(url)
    print(page)    

if __name__ == "__main__":
    gotopage()
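
The hard-coded jar list above has to be edited for every HtmlUnit release. A possible variation, shown here only as a sketch, is to append every .jar found in the directory instead. add_jars_to_path is a hypothetical helper name, and the directory is the one used in the script above.

```python
import glob
import os
import sys


def add_jars_to_path(jar_dir):
    """Append every .jar file in jar_dir to sys.path; return the ones added."""
    added = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, '*.jar'))):
        if jar not in sys.path:  # skip jars that are already on the path
            sys.path.append(jar)
            added.append(jar)
    return added


if __name__ == '__main__':
    # Same directory as jarpath in the script above
    for jar in add_jars_to_path('/usr/share/java/htmlunit/'):
        print('added ' + jar)
```

As in the original script, the com.gargoylesoftware imports have to come after this call, because the jars are not visible to Jython before they are on sys.path.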


Answer 3:

I ran into this error before and solved it with the following steps.

  1. Download Jython and run java -jar jython-installer-xxx.jar to install it. Then add the jython/bin folder to your system path and run jython on the command line to make sure it works.
  2. Download the HtmlUnit jar files from SourceForge; you will need to specify their location.
  3. Write your .py file and run:

    jython -J-classpath "/Users/crabime/Development Folder/htmlunit-2.23/lib/*" /Users/crabime/PycharmProjects/scrapimage/crabime/gartner.py

Everything should then work. If you still get a module-not-found error, check your command for typos.
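
A mistyped classpath directory is an easy thing to miss, so a quick check that the jars really exist where the command points can help. This is only a sketch: missing_jars is a hypothetical helper, and the directory and jar name are assumptions based on the 2.23 download mentioned above.

```python
import os


def missing_jars(lib_dir, required):
    """Return the names in required that are not files inside lib_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(lib_dir, name))]


if __name__ == '__main__':
    # Assumed values: adjust the directory and jar name to your download
    lib_dir = 'htmlunit-2.23/lib'
    for name in missing_jars(lib_dir, ['htmlunit-2.23.jar']):
        print('missing: ' + os.path.join(lib_dir, name))
```

If nothing is printed, the jar is where the classpath expects it, and the typo is more likely in the command itself.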