I tried to run HtmlUnit with Jython following this tutorial:
http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/
but it does not work for me. I am unable to import the com.gargoylesoftware packages. There are only some HTML files in the HtmlUnit folder. Do I need to import those somehow?
The tutorial says to run the Python script like this:
/opt/jython/jython -J-classpath "htmlunit-2.8/lib/*" gartner.py
and I try to run:
java -jar /Users/adam/jython/jython.jar -J-classpath "htmlunit-2.8/lib/*" gartner.py
My problem is that I get an "Unknown option: J-classpath" error, and there is not a word about a -J-classpath parameter on Jython.org. I would be very glad for any advice. I am running Jython standalone 2.5.2 on Snow Leopard.
Your entire command line is being processed by the java command (as it should), and -J-classpath is indeed not a valid command line option for java. The -J prefix is only understood by the jython launcher script used in the tutorial, which passes the rest of the option on to the JVM. You should really try to follow the exact steps of the tutorial, because you are skipping several important steps (and making up some of your own).
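For the java -jar invocation from the question, one possible workaround is to drop -jar, put both jython.jar and the HtmlUnit jars on the Java classpath, and name Jython's main class explicitly. The sketch below only builds that command line. The helper name `build_java_command` is made up for illustration, the paths are the ones from the question, and `org.python.util.jython` is assumed to be the entry point of the standalone jar.

```python
import os


def build_java_command(jython_jar, lib_dir, script):
    """Return an argv list that runs `script` with Jython plus the jars in lib_dir.

    Hypothetical helper: java expands a trailing "*" classpath entry itself,
    so the wildcard has to reach java unexpanded (no shell globbing).
    """
    classpath = os.pathsep.join([jython_jar, os.path.join(lib_dir, "*")])
    return ["java", "-cp", classpath, "org.python.util.jython", script]


if __name__ == "__main__":
    # Paths taken from the question; adjust them to your own layout.
    cmd = build_java_command("/Users/adam/jython/jython.jar",
                             "htmlunit-2.8/lib", "gartner.py")
    print(" ".join(cmd))
```

If the printed command is pasted into a shell, the classpath argument needs quotes so the shell does not expand the `*` before java sees it.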
It is possible to run a Jython script simply as: jython myscript.py, provided the script itself adds the full path of every jar it needs to the Python path using sys.path.append.
Here is a current script I'm working on.
#!/opt/jython/jython
'''
Created on Dec 7, 2011
@author: chris
'''
import sys, os
from time import sleep

jarpath = '/usr/share/java/htmlunit/'  # path to the jar files to import
jars = ['apache-mime4j-0.6.jar', 'commons-codec-1.4.jar',
        'commons-collections-3.2.1.jar', 'commons-io-1.4.jar',
        'commons-lang-2.4.jar', 'commons-logging-1.1.1.jar',
        'cssparser-0.9.5.jar', 'htmlunit-2.8.jar',
        'htmlunit-core-js-2.8.jar', 'httpclient-4.0.1.jar',
        'httpcore-4.0.1.jar', 'httpmime-4.0.1.jar',
        'nekohtml-1.9.14.jar', 'sac-1.3.jar',
        'serializer-2.7.1.jar', 'xalan-2.7.1.jar',
        'xercesImpl-2.9.1.jar', 'xml-apis-1.3.04.jar']  # a list of jars

def loadjars():  # appends each jar to the Jython path
    for jar in jars:
        print(jarpath + jar + '\n')
        container = jarpath + jar
        sys.path.append(container)

# The jars must be on sys.path before the Java import below is executed.
loadjars()
import com.gargoylesoftware.htmlunit.WebClient as WebClient

webclient = WebClient()

def gotopage():
    print('hello, I will visit Google')
    url = 'http://google.com'
    page = webclient.getPage(url)
    print(page)

if __name__ == "__main__":
    gotopage()
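The hard-coded jar list in the script above could also be built from the directory contents, so the script does not need editing when the HtmlUnit version and its dependencies change. This is a minimal sketch, assuming all required jars sit in a single directory; `add_jars_to_path` is an illustrative name, not part of Jython or HtmlUnit.

```python
import glob
import os
import sys


def add_jars_to_path(jar_dir):
    """Append every .jar file found in jar_dir to sys.path and return the list."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    for jar in jars:
        if jar not in sys.path:  # avoid duplicate entries on repeated calls
            sys.path.append(jar)
    return jars


if __name__ == "__main__":
    # Directory taken from the script above; adjust it to your installation.
    for jar in add_jars_to_path("/usr/share/java/htmlunit/"):
        print(jar)
```

Under Jython the call would have to come before the com.gargoylesoftware import, just like loadjars() does.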
I ran into this error before and solved it with the following steps.
- Download Jython and run
java -jar jython-installer-xxx.jar
to install it. Then add the jython/bin folder to your system path and run jython on the command line to make sure it works.
- Download the HtmlUnit jar files from SourceForge and note where you put them, because you need to specify that location on the classpath.
- Write your .py file and run
jython -J-classpath "/Users/crabime/Development Folder/htmlunit-2.23/lib/*" /Users/crabime/PycharmProjects/scrapimage/crabime/gartner.py
Everything should then work. If you still get a "module not found" error, check the command you typed for mistakes.
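If the import still fails, it can help to check from inside Jython whether the HtmlUnit package is visible at all before debugging the scraper itself. The following is a small diagnostic sketch; `can_import` is a hypothetical helper, and the package name is the one imported by the scripts in this thread.

```python
def can_import(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    name = "com.gargoylesoftware.htmlunit"
    if can_import(name):
        print(name + " is available")
    else:
        # Jars passed with -J-classpath live on the Java classpath,
        # so a typo in that argument is the first thing to check.
        print(name + " was not found; check the -J-classpath argument")
```

Run it with the same jython -J-classpath command line as the real script, so that both see the same classpath.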