Proxying requests in Node

Posted 2019-06-16 09:28

I need to be able to serve replicas of arbitrary sites (www.google.com, www.facebook.com, etc.) through my Node server. I found this library:

https://github.com/nodejitsu/node-http-proxy

And I used the following code when proxying requests:

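var express = require('express');
var httpProxy = require('http-proxy');
var router = express.Router();

var options = {
  ignorePath: true,
  changeOrigin: false
};

// Pass the options object directly; {options} would nest it as { options: {...} }
var proxy = httpProxy.createProxyServer(options);

router.get('/', function(req, res) {
  proxy.web(req, res, { target: req.body.url });
});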

However, this configuration causes an error for most sites. Depending on the site, I get an "Unknown service" error from the target URL, an "Invalid host" error, or something along those lines. When I pass

changeOrigin: true

I get a working proxy, but the user's browser is redirected to the actual URL of the request instead of mine (so if req.body.url = http://www.google.com, the browser ends up at http://www.google.com).
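A likely mechanism, assuming the redirect comes from a 3xx response whose `Location` header points at the target's own origin (for example http to https, or google.com to www.google.com): the proxy forwards that header unchanged and the browser follows it. Below is a minimal sketch of mapping such a header back onto the proxy using the `http://www.pv.com/<original URL>` scheme from the summary; `PROXY_ORIGIN` and `rewriteLocation` are hypothetical names.

```javascript
// Hypothetical origin of the proxy server (www.pv.com in the question).
const PROXY_ORIGIN = 'http://www.pv.com';

// Rewrite an upstream Location header so the browser stays on the proxy.
// `requestedUrl` is the full URL that was proxied; it is used to resolve
// relative Location values such as "/search".
function rewriteLocation(location, requestedUrl) {
  const absolute = new URL(location, requestedUrl).href;
  return PROXY_ORIGIN + '/' + absolute;
}

module.exports = { rewriteLocation };
```

With node-http-proxy, a function like this could plausibly be applied in a `proxyRes` event handler by overwriting `proxyRes.headers.location` before the response is written; treat that wiring as an untested assumption. The library's built-in `autoRewrite`/`hostRewrite` options only rewrite the host and port, not into this path-prefixed form.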

How can I keep my site's URL in the address bar while still reproducing exactly what the target site displays? I also need to add a few JS files to the response, which I'm doing with another library.
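The question doesn't say which library does the injection. As a library-free illustration only, this sketch splices a `<script>` tag in front of the closing `</body>`; `injectScript` and the script URL are hypothetical. It assumes the HTML is already an uncompressed string: a gzip- or br-encoded upstream body would have to be decompressed first, and `scriptSrc` is assumed to be trusted (it is not escaped).

```javascript
// Insert a <script> tag just before the last </body>, or append it
// when the document has no </body> at all.
function injectScript(html, scriptSrc) {
  const tag = '<script src="' + scriptSrc + '"></script>';
  const idx = html.toLowerCase().lastIndexOf('</body>');
  if (idx === -1) return html + tag;
  return html.slice(0, idx) + tag + html.slice(idx);
}

module.exports = { injectScript };
```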

For clarification, here is a summary of the problem:

  1. The user requests a resource that has a url property

  2. This url is in the form of http://www.example.com

  3. My server, running on www.pv.com, needs to be able to direct the user to www.pv.com/http://www.example.com

  4. The HTTP response returned for www.pv.com/http://www.example.com is a full representation of http://www.example.com. I also need to be able to add my own JavaScript/HTML files to this response.
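Step 3 amounts to reading the target URL out of the request path. A minimal sketch, assuming the path reaches the server unnormalized (so the `//` in `http://` survives) and that only http/https targets should be accepted; `targetFromPath` is a hypothetical helper name.

```javascript
// Extract the target URL from a proxy path such as "/http://www.example.com/page".
// Returns a URL object, or null when the path does not hold an http(s) URL.
function targetFromPath(reqUrl) {
  const raw = reqUrl.slice(1); // drop the leading "/"
  let target;
  try {
    target = new URL(raw);
  } catch (e) {
    return null; // not an absolute URL
  }
  // Assumption: reject file:, data:, and other non-web schemes.
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return null;
  return target;
}

module.exports = { targetFromPath };
```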

Tags: node.js proxy
3 answers
Melony?
Reply #2 · 2019-06-16 09:51

Use a headless browser to navigate to the website and get its HTML, then send that HTML as the response for the requested site. One advantage of a headless browser is that it also gets the HTML of sites that are rendered with JavaScript. Nightmare.js, a browser-automation library built on Electron, is a good choice; Electron is faster than PhantomJS (an alternative). With Nightmare.js you can also inject a JavaScript file into the page, as shown in the code snippet below. You may need to tweak the code to add other features. Because I am currently only allowed to post two links, links to other resources are in the code comments.


apt-get update && apt-get install -y xvfb x11-xkb-utils xfonts-100dpi \
  xfonts-75dpi xfonts-scalable xfonts-cyrillic x11-apps clang \
  libdbus-1-dev libgtk2.0-dev libnotify-dev libgnome-keyring-dev \
  libgconf2-dev libasound2-dev libcap-dev libcups2-dev libxtst-dev \
  libxss1 libnss3-dev gcc-multilib g++-multilib


// example: http://hostname.com/http://www.tutorialspoint.com/articles/how-to-configure-and-install-redis-on-ubuntu-linux
// X server: http://www.linfo.org/x_server.html

var express = require('express')
var Nightmare = require('nightmare') // headless browser
var Xvfb = require('xvfb') // run the headless browser inside a virtual X server
var vo = require('vo') // run generator functions
var app = express()
var xvfb = new Xvfb()


app.get('/', function (req, res) {
  res.end('')
})

// start the X server to run nightmare.js headless browser
xvfb.start(function (err, xvfbProcess) {
  if (!err) {
    app.get('/*', function (req, res) {
      var run = function * () {
        var nightmare = new Nightmare({
          show: false,
          maxAuthRetries: 10,
          waitTimeout: 100000,
          electronPath: require('electron'),
          ignoreSslErrors: 'true',
          sslProtocol: 'tlsv1'
        })

        var result = yield nightmare.goto(req.url.toString().substring(1))
        .wait()
        // .inject('js', '/path/to/.js') inject a javascript file to manipulate or inject html
        .evaluate(function () {
          return document.documentElement.outerHTML
        })
        .end()
        return result
      }

      // execute generator function
      vo(run)(function (err, result) {
        if (!err) {
          res.send(result) // res.send sets Content-Type: text/html for string bodies
        } else {
          console.log(err)
          res.status(500).end()
        }
      })
    })
  } else {
    console.error('Could not start Xvfb:', err)
  }
})

app.listen(8080, '0.0.0.0')
趁早两清
Reply #3 · 2019-06-16 10:00

You need HTTPS support, since most of the websites you mentioned redirect to the HTTPS version of their site. If you want to provide access to websites from places where they are forbidden or blocked, you may be better off with a SOCKS proxy instead of an HTTP proxy.
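To follow those HTTPS redirects, the proxy has to speak TLS to the target. A rough sketch using only Node's built-in `http`/`https` modules (not node-http-proxy), where `clientFor` and `forward` are hypothetical helpers and only plain GET requests are handled. Node's `https` client verifies the target's certificate by default.

```javascript
const http = require('http');
const https = require('https');

// Pick the client module matching the target's scheme.
function clientFor(targetUrl) {
  const { protocol } = new URL(targetUrl);
  if (protocol === 'https:') return https;
  if (protocol === 'http:') return http;
  throw new Error('Unsupported protocol: ' + protocol);
}

// Forward a GET to targetUrl and stream the upstream response back to `res`.
function forward(targetUrl, res) {
  clientFor(targetUrl)
    .get(targetUrl, (upstream) => {
      // Copy the upstream status and headers, then stream the body through.
      res.writeHead(upstream.statusCode, upstream.headers);
      upstream.pipe(res);
    })
    .on('error', () => {
      // Upstream unreachable (DNS, TLS, connection reset): reply 502 if still possible.
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
}

module.exports = { clientFor, forward };
```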

SAY GOODBYE
Reply #4 · 2019-06-16 10:08

Looking at https://stackoverflow.com/a/32704647/1587329, the only difference is that it uses a different target parameter:

var http = require('http');
var httpProxy = require('http-proxy');
var proxy = httpProxy.createProxyServer({});

http.createServer(function(req, res) {
    proxy.web(req, res, { target: 'http://www.google.com' });
}).listen(3000);

This would explain the Invalid host error: you need to pass the protocol and host as the target parameter, not the whole URL. Thus, the following might work:

var options = {
  ignorePath: true,
  changeOrigin: false
};

var proxy = httpProxy.createProxyServer(options);

router.get('/', function(req, res) {
  // req.body.url is a string; parse it so that .protocol and .host exist
  var url = new URL(req.body.url);
  proxy.web(req, res, { target: url.protocol + '//' + url.host });
});

For the URL object, see the Node.js documentation.
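The `url.protocol + '//' + url.host` expression is the URL's origin. A small sketch, using the WHATWG `URL` class that is global in Node.js 10 and later, that splits a full URL into that origin and the remaining path; `splitTarget` is a hypothetical name.

```javascript
// Split a full URL into the protocol-plus-host form used as `target` above
// and the path (plus query string) that would be requested on that origin.
function splitTarget(fullUrl) {
  const url = new URL(fullUrl);
  return {
    origin: url.origin,              // same as url.protocol + '//' + url.host for http(s)
    path: url.pathname + url.search, // e.g. "/search?q=node"
  };
}

module.exports = { splitTarget };
```

Note that with `ignorePath: true` the incoming path is discarded, so passing only the origin as `target` would likely always request the origin's root; whether `path` should be forwarded is left to the handler.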
