Boilerpipe is a library that basically extracts the main content from a webpage. For news websites, it is especially hard to extract the content as the formatting differs from site to site. So I've tried to integrate the boilerpipe library - https://code.google.com/p/boilerpipe/wiki/QuickStart
As per the installing guide, I have added the following to my Java classpath - boilerpipe-VERSION.jar, nekohtml-1.9.13.jar and xerces-2.9.1.jar
What I'm trying to do with boilerpipe and my application flow that involves it
I have a list view where there are a list of articles. I've set up an onItemClickListener such that when you click on any of the items on the listview, it takes the url specific to that article and uses boilerpipe to extract the text from that article and starts a new activity where it is printed in textview.
The Problem
My application crashes once I click on one of the items in the list. a. I'm not sure if the code I've written is correct, as I'm a beginner. Please excuse me for that. If it is incorrect, how can I fix it? I have a feeling it may be a problem with the url. b. If I haven't installed boilerplate correctly, what is the correct way of doing it
List Activity:
ListView lv = getListView();
// Launching new screen on Selecting Single ListItem
lv.setOnItemClickListener(new OnItemClickListener() {
public void onItemClick(AdapterView<?> parent, View view,
int position, long id) {
Intent in = new Intent(getApplicationContext(), ArticleActivity.class);
// getting page url
String page_url = ((TextView) view.findViewById(R.id.page_url)).getText().toString();
Toast.makeText(getApplicationContext(), page_url, Toast.LENGTH_SHORT).show();
URL url = null;
try {
url = new URL(page_url);
} catch (MalformedURLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// NOTE: Use ArticleExtractor unless DefaultExtractor gives better results for you
try {
String text = null;
text = ArticleExtractor.INSTANCE.getText(url);
in.putExtra("text", text);
} catch (BoilerpipeProcessingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
startActivity(in);
});
}
Article Activity:
public class ArticleActivity extends Activity{
Intent in = getIntent();
String text = in.getStringExtra("text");
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
this.requestWindowFeature(Window.FEATURE_NO_TITLE);
setContentView(R.layout.article_view);
TextView tv;
tv = (TextView) findViewById(R.id.page_url);
tv.setText(text);
}
}
article_view.xml
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:orientation="vertical" >
<!-- Article Title -->
<TextView android:id="@+id/content_view"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:paddingTop="10dp"
android:paddingBottom="8dp"
android:textSize="18sp"
android:textStyle="bold"
android:textColor="#dc6800"/>
</RelativeLayout>
Stack Trace:
USER_COMMENT=null
ANDROID_VERSION=4.1.2
APP_VERSION_NAME=1.0
BRAND=samsung
PHONE_MODEL=GT-N8000
CUSTOM_DATA=
STACK_TRACE=android.os.NetworkOnMainThreadException
at android.os.StrictMode$AndroidBlockGuardPolicy.onNetwork(StrictMode.java:1118)
at java.net.InetAddress.lookupHostByName(InetAddress.java:385)
at java.net.InetAddress.getAllByNameImpl(InetAddress.java:236)
at java.net.InetAddress.getAllByName(InetAddress.java:214)
at libcore.net.http.HttpConnection.<init>(HttpConnection.java:70)
at libcore.net.http.HttpConnection.<init>(HttpConnection.java:50)
at libcore.net.http.HttpConnection$Address.connect(HttpConnection.java:340)
at libcore.net.http.HttpConnectionPool.get(HttpConnectionPool.java:87)
at libcore.net.http.HttpConnection.connect(HttpConnection.java:128)
at libcore.net.http.HttpEngine.openSocketConnection(HttpEngine.java:315)
at libcore.net.http.HttpEngine.connect(HttpEngine.java:310)
at libcore.net.http.HttpEngine.sendSocketRequest(HttpEngine.java:289)
at libcore.net.http.HttpEngine.sendRequest(HttpEngine.java:239)
at libcore.net.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:273)
at libcore.net.http.HttpURLConnectionImpl.getHeaderField(HttpURLConnectionImpl.java:130)
at java.net.URLConnection.getContentType(URLConnection.java:326)
at de.l3s.boilerpipe.sax.HTMLFetcher.fetch(HTMLFetcher.java:35)
at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:87)
at com.j.infographx.ListRSSItemsActivity$1.onItemClick(ListRSSItemsActivity.java:94)
at android.widget.AdapterView.performItemClick(AdapterView.java:301)
at android.widget.AbsListView.performItemClick(AbsListView.java:1287)
at android.widget.AbsListView$PerformClick.run(AbsListView.java:3078)
at android.widget.AbsListView$1.run(AbsListView.java:4161)
at android.os.Handler.handleCallback(Handler.java:615)
at android.os.Handler.dispatchMessage(Handler.java:92)
at android.os.Looper.loop(Looper.java:137)
at android.app.ActivityThread.main(ActivityThread.java:4921)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:511)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:1038)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:805)
at dalvik.system.NativeStart.main(Native Method)