I'm trying to use boilerpipe from JRuby. I've seen the guide for calling Java from JRuby, and have used it successfully with another Java package, but can't figure out why the same thing isn't working with boilerpipe.
I'm trying to basically do the equivalent of this Java from JRuby:
URL url = new URL("http://www.example.com/some-location/index.html");
String text = ArticleExtractor.INSTANCE.getText(url);
Tried this in JRuby:
require 'java'
url = java.net.URL.new("http://www.example.com/some-location/index.html")
text = Java::DeL3sBoilerpipeExtractors::ArticleExtractor.INSTANCE.getText(url)
This is based on the API Javadocs for boilerpipe. Here's the error:
jruby-1.6.0 :042 > Java::DeL3sBoilerpipeExtractors::ArticleExtractor
NameError: cannot load Java class deL3sBoilerpipeExtractors.ArticleExtractor
from org/jruby/javasupport/JavaClass.java:1195:in `for_name'
from org/jruby/javasupport/JavaUtilities.java:34:in `get_proxy_class'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/site_ruby/shared/builtin/javasupport/java.rb:45:in `const_missing'
from (irb):42:in `evaluate'
from org/jruby/RubyKernel.java:1087:in `eval'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/1.8/irb.rb:158:in `eval_input'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/1.8/irb.rb:271:in `signal_status'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/1.8/irb.rb:270:in `signal_status'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/1.8/irb.rb:155:in `eval_input'
from org/jruby/RubyKernel.java:1417:in `loop'
from org/jruby/RubyKernel.java:1190:in `catch'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/1.8/irb.rb:154:in `eval_input'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/1.8/irb.rb:71:in `start'
from org/jruby/RubyKernel.java:1190:in `catch'
from /usr/local/rvm/rubies/jruby-1.6.0/lib/ruby/1.8/irb.rb:70:in `start'
from /usr/local/rvm/rubies/jruby-1.6.0/bin/irb:17:in `(root)'
Looks like it didn't parse the camelcase into the appropriate Java package name. What am I doing wrong? I believe I've set up my classpath alright (last 3 entries), though there may be some conflict with xerces possibly being included twice:
$ echo $CLASSPATH
:/jellly/Maui1.2:/jellly/Maui1.2/src:/jellly/Maui1.2/bin:/jellly/Maui1.2/lib/commons-io-1.4.jar:/jellly/Maui1.2/lib/commons-logging.jar:/jellly/Maui1.2/lib/icu4j_3_4.jar:/jellly/Maui1.2/lib/iri.jar:/jellly/Maui1.2/lib/jena.jar:/jellly/Maui1.2/lib/maxent-2.4.0.jar:/jellly/Maui1.2/lib/mysql-connector-java-3.1.13-bin.jar:/jellly/Maui1.2/lib/opennlp-tools-1.3.0.jar:/jellly/Maui1.2/lib/snowball.jar:/jellly/Maui1.2/lib/trove.jar:/jellly/Maui1.2/lib/weka.jar:/jellly/Maui1.2/lib/wikipediaminer1.1.jar:/jellly/Maui1.2/lib/xercesImpl.jar:/jellly/boilerpipe-1.1.0/boilerpipe-1.1.0.jar:/jellly/boilerpipe-1.1.0/lib/nekohtml-1.9.13.jar:/jellly/boilerpipe-1.1.0/lib/xerces-2.9.1.jar