how to get html content from a webview?

2018-12-31 08:04发布

Which is the simplest method to get html code from a webview? I have tried several methods from stackoverflow and google, but can't find an exact method. Please mention an exact way.

public class htmldecoder extends Activity implements OnClickListener,TextWatcher
{
TextView txturl;
Button btgo;
WebView wvbrowser;
TextView txtcode;
ImageButton btcode;
LinearLayout llayout;
int flagbtcode;
public void onCreate(Bundle savedInstanceState)
{
            super.onCreate(savedInstanceState);
                setContentView(R.layout.htmldecoder);

    txturl=(TextView)findViewById(R.id.txturl);

    btgo=(Button)findViewById(R.id.btgo);
    btgo.setOnClickListener(this);

    wvbrowser=(WebView)findViewById(R.id.wvbrowser);
    wvbrowser.setWebViewClient(new HelloWebViewClient());
    wvbrowser.getSettings().setJavaScriptEnabled(true);
    wvbrowser.getSettings().setPluginsEnabled(true);
    wvbrowser.getSettings().setJavaScriptCanOpenWindowsAutomatically(true);
    wvbrowser.addJavascriptInterface(new MyJavaScriptInterface(),"HTMLOUT");
    //wvbrowser.loadUrl("http://www.google.com");
    wvbrowser.loadUrl("javascript:window.HTMLOUT.showHTML('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");


    txtcode=(TextView)findViewById(R.id.txtcode);
    txtcode.addTextChangedListener(this);

    btcode=(ImageButton)findViewById(R.id.btcode);
    btcode.setOnClickListener(this);

    }

public void onClick(View v)
{
    if(btgo==v)
    {
        String url=txturl.getText().toString();
        if(!txturl.getText().toString().contains("http://"))
        {
            url="http://"+url;
        }
        wvbrowser.loadUrl(url);
        //wvbrowser.loadData("<html><head></head><body><div style='width:100px;height:100px;border:1px red solid;'></div></body></html>","text/html","utf-8");
    }
    else if(btcode==v)
    {
        ViewGroup.LayoutParams params1=wvbrowser.getLayoutParams();
        ViewGroup.LayoutParams params2=txtcode.getLayoutParams();
        if(flagbtcode==1)
        {
            params1.height=200;
            params2.height=220;
            flagbtcode=0;
            //txtcode.setText(wvbrowser.getContentDescription());
        }
        else
        {
            params1.height=420;
            params2.height=0;
            flagbtcode=1;
        }
        wvbrowser.setLayoutParams(params1);
        txtcode.setLayoutParams(params2);

    }
}

public class HelloWebViewClient extends WebViewClient {
    @Override
    public boolean shouldOverrideUrlLoading(WebView view, String url) {

        view.loadUrl(url);
        return true;
    }
    /*@Override
    public void onPageFinished(WebView view, String url)
    {
        // This call inject JavaScript into the page which just finished loading. 
        wvbrowser.loadUrl("javascript:window.HTMLOUT.processHTML('<head>'+document.getElementsByTagName('html')[0].innerHTML+'</head>');");
    }*/

}
class MyJavaScriptInterface
{
    @SuppressWarnings("unused")
    public void showHTML(String html)
    {

        txtcode.setText(html);
    }
}

public void afterTextChanged(Editable s) {
    // TODO Auto-generated method stub

}

public void beforeTextChanged(CharSequence s, int start, int count,
        int after) {
    // TODO Auto-generated method stub

}

public void onTextChanged(CharSequence s, int start, int before, int count) {
    wvbrowser.loadData("<html><div"+txtcode.getText().toString()+"></div></html>","text/html","utf-8");

}

}

12条回答
一个人的天荒地老
2楼-- · 2018-12-31 08:09

try using HttpClient as Sephy said:

public String getHtml(String url) {
    HttpClient vClient = new DefaultHttpClient();
    HttpGet vGet = new HttpGet(url);
    String response = "";    

    try {
        ResponseHandler<String> vHandler = new BasicResponseHandler();
        response = vClient.execute(vGet, vHandler);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return response;
}
查看更多
墨雨无痕
3楼-- · 2018-12-31 08:19

Actually this question has many answers. Here are 2 of them :

  • This first is almost the same as yours, I guess we got it from the same tutorial.

public class TestActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.webview);
        final WebView webview = (WebView) findViewById(R.id.browser);
        webview.getSettings().setJavaScriptEnabled(true);
        webview.addJavascriptInterface(new MyJavaScriptInterface(this), "HtmlViewer");

        webview.setWebViewClient(new WebViewClient() {
            @Override
            public void onPageFinished(WebView view, String url) {
                webview.loadUrl("javascript:window.HtmlViewer.showHTML" +
                        "('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
            }
        });

        webview.loadUrl("http://android-in-action.com/index.php?post/" +
                "Common-errors-and-bugs-and-how-to-solve-avoid-them");
    }

    class MyJavaScriptInterface {

        private Context ctx;

        MyJavaScriptInterface(Context ctx) {
            this.ctx = ctx;
        }

        public void showHTML(String html) {
            new AlertDialog.Builder(ctx).setTitle("HTML").setMessage(html)
                    .setPositiveButton(android.R.string.ok, null).setCancelable(false).create().show();
        }

    }
}

This way your grab the html through javascript. Not the prettiest way but when you have your javascript interface, you can add other methods to tinker it.


  • An other way is using an HttpClient like there.

The option you choose also depends, I think, on what you intend to do with the retrieved html...

查看更多
无色无味的生活
4楼-- · 2018-12-31 08:19

Android will not let you do this for security concerns. An evil developer could very easily steal user-entered login information.

Instead, you have to catch the text being displayed in the webview before it is displayed. If you don't want to set up a response handler (as per the other answers), I found this fix with some googling:

URL url = new URL("https://stackoverflow.com/questions/1381617");
URLConnection con = url.openConnection();
Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
Matcher m = p.matcher(con.getContentType());
/* If Content-Type doesn't match this pre-conception, choose default and 
 * hope for the best. */
String charset = m.matches() ? m.group(1) : "ISO-8859-1";
Reader r = new InputStreamReader(con.getInputStream(), charset);
StringBuilder buf = new StringBuilder();
while (true) {
  int ch = r.read();
  if (ch < 0)
    break;
  buf.append((char) ch);
}
String str = buf.toString();

This is a lot of code, and you should be able to copy/paster it, and at the end of it str will contain the same html drawn in the webview. This answer is from Simplest way to correctly load html from web page into a string in Java and it should work on Android as well. I have not tested this and did not write it myself, but it might help you out.

Also, the URL this is pulling is hardcoded, so you'll have to change that.

查看更多
初与友歌
5楼-- · 2018-12-31 08:21

I would suggest instead of trying to extract the HTML from the WebView, you extract the HTML from the URL. By this, I mean using a third party library such as JSoup to traverse the HTML for you. The following code will get the HTML from a specific URL for you

public static String getHtml(String url) throws ClientProtocolException, IOException {
        HttpClient httpClient = new DefaultHttpClient();
        HttpContext localContext = new BasicHttpContext();
        HttpGet httpGet = new HttpGet(url);
        HttpResponse response = httpClient.execute(httpGet, localContext);
        String result = "";

        BufferedReader reader = new BufferedReader(
            new InputStreamReader(
                response.getEntity().getContent()
            )
        );

        String line = null;
        while ((line = reader.readLine()) != null){
            result += line + "\n";
        }
        return result;
    }
查看更多
倾城一夜雪
6楼-- · 2018-12-31 08:22

One touch point I found that needs to be put in place is "hidden" away in the Proguard configuration. While the HTML reader invokes through the javascript interface just fine when debugging the app, this works no longer as soon as the app was run through Proguard, unless the HTML reader function is declared in the Proguard config file, like so:

-keepclassmembers class <your.fully.qualified.HTML.reader.classname.here> {
    public *; 
}

Tested and confirmed on Android 2.3.6, 4.1.1 and 4.2.1.

查看更多
倾城一夜雪
7楼-- · 2018-12-31 08:23

I suggest to try out some Reflection approach, if you have time to spend on the debugger (sorry but I didn't have).

Starting from the loadUrl() method of the android.webkit.WebView class:

http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/2.2_r1.1/android/webkit/WebView.java#WebView.loadUrl%28java.lang.String%2Cjava.util.Map%29

You should arrive on the android.webkit.BrowserFrame that call the nativeLoadUrl() native method:

http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/2.2_r1.1/android/webkit/BrowserFrame.java#BrowserFrame.nativeLoadUrl%28java.lang.String%2Cjava.util.Map%29

The implementation of the native method should be here:

http://gitorious.org/0xdroid/external_webkit/blobs/a538f34148bb04aa6ccfbb89dfd5fd784a4208b1/WebKit/android/jni/WebCoreFrameBridge.cpp

Wish you good luck!

查看更多
登录 后发表回答