How to deploy this “Python+twill+mechanize” combin

Posted 2020-08-01 06:39

Question:

I've been trying to pass my login and password from a Python script to the eBay sign-in page. Later I want to run this script from Google App Engine.

It was suggested that I use mechanize. Unfortunately, it didn't work for me:


IDLE 1.2.4      
>>> import re
>>> import mechanize
>>> br = mechanize.Browser()
>>> br.open("https://signin.ebay.com")

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    br.open("https://signin.ebay.com")
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 255, in _mech_open
    raise response
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
>>> 

Earlier I had tried Python with twill. That didn't work either, until someone suggested that I download the latest version of mechanize and then perform the following steps:

  1. Locate the following folder on my computer: "C:\Python25\Lib\site-packages\twill\other_packages\_mechanize_dist"

  2. Change its name to "_mechanize_dist_backup" (the full path, thus, should be "C:\Python25\Lib\site-packages\twill\other_packages\_mechanize_dist_backup")

  3. Copy the "mechanize" folder (which is located in "mechanize-0.2.2" - the folder that I had downloaded and unzipped from the "mechanize" official site) and paste it in "C:\Python25\Lib\site-packages\twill\other_packages" (the full path, thus, being "C:\Python25\Lib\site-packages\twill\other_packages\mechanize")

  4. Change its name to "_mechanize_dist" (the full path being "C:\Python25\Lib\site-packages\twill\other_packages\_mechanize_dist")

  5. Copy the "ClientForm" file from "_mechanize_dist_backup" and paste it into "_mechanize_dist" (in fact, I found two files named "ClientForm" there: one is a Python source file, the other a compiled Python file. I copied and pasted both of them).
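For anyone who wants to repeat steps 1 to 5 without clicking through Explorer, here is a minimal sketch in Python. The function name `swap_bundled_mechanize` and both of its parameters are made up for illustration; only the folder and file names come from the steps above. It is written for a current Python on a development machine, and it copies only `ClientForm.py` (not the compiled file), on the assumption that Python regenerates the compiled file on first import.

```python
import os
import shutil


def swap_bundled_mechanize(other_packages_dir, new_mechanize_dir):
    """Replace twill's bundled mechanize with a newer copy (steps 1-5 above).

    other_packages_dir: path to twill's "other_packages" folder.
    new_mechanize_dir:  path to the unzipped "mechanize" package folder.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    bundled = os.path.join(other_packages_dir, "_mechanize_dist")
    backup = os.path.join(other_packages_dir, "_mechanize_dist_backup")

    # Steps 1-2: keep the original bundled copy under a backup name.
    os.rename(bundled, backup)

    # Steps 3-4: copy the new package in under the name twill expects.
    shutil.copytree(new_mechanize_dir, bundled)

    # Step 5: carry ClientForm over from the old copy, if it is there.
    source = os.path.join(backup, "ClientForm.py")
    if os.path.isfile(source):
        shutil.copy2(source, os.path.join(bundled, "ClientForm.py"))

    return bundled
```

Running it against a copy of the twill folder inside the application directory, rather than against site-packages, would leave the installed Python untouched.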

After I had performed all these steps, I tried to log in to my eBay account from the twill shell in Python and it worked!!! I could even log in to my Yahoo mailbox in the same way and check my mail!

But now I have a dilemma: I don't know how I could deploy my script to "Google App Engine".

Earlier I had been advised that if I want to use third-party libraries in App Engine projects, I simply have to include them with my application when I deploy it. In the case of twill, for example, I just need to copy the twill folder into my application's folder and deploy it.

But now I have not only the twill folder to include as a third-party library, but also all the changes I made under "C:\Python25" (in "C:\Python25\Lib\site-packages\twill\other_packages", to be precise), while my application folder, the one that holds my script ("my_script.py"), is on the E: drive.

Can anybody please give me some suggestions here?

Answer 1:

As for the GAE deployment issue, @brilliant, it looks like the code you're dealing with is all pure Python 2.5 (the only really blocking issue would be if it isn't -- no binary extensions allowed, no code requiring Python 2.6 or better allowed, and that's just the way it is on GAE at this time).

So, under this assumption, the only issue w/deploying the code on App Engine is having all the code, NOT in site-packages (from which of course GAE's dev_appserver.py deploys absolutely nothing, nada, zilch), but rather in your GAE project's directory (I suggest a recursive zip of all the .py files, only -- remove all the .pyc files, in particular, before you zip -r it;-).

All in all, it's just a question of a couple of appropriate shell commands: cp -R then zip -r (probably harder on non-unixy shells, but, hey, even on Windows you can do it with bash from cygwin... in any case, it's hardly a "development" issue, per se;-).
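If no unixy shell is at hand, the same "copy, drop the .pyc files, zip -r" idea can be sketched in Python itself. This is only an illustration, not part of the original advice: the function name `zip_pure_python` and its parameters are hypothetical, and the sketch is meant to be run with a current Python on the development machine, not on App Engine.

```python
import os
import zipfile


def zip_pure_python(package_dir, zip_path):
    """Zip only the .py files under package_dir.

    The package folder name is kept as the top-level entry in the
    archive, so "twill/__init__.py" stays importable as "twill".
    Returns the sorted list of archive member names that were written.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if not name.endswith(".py"):
                    continue  # leaves out .pyc files and everything else
                full_path = os.path.join(root, name)
                # Zip member names always use forward slashes.
                member = os.path.relpath(full_path, parent).replace(os.sep, "/")
                archive.write(full_path, member)
                added.append(member)
    return sorted(added)
```

A call such as `zip_pure_python(r"C:\Python25\Lib\site-packages\twill", r"E:\my_app\twill.zip")` would then put the archive next to the script (the `E:\my_app` path is a placeholder). The script would need something like `sys.path.insert(0, "twill.zip")` before `import twill`; whether twill and its patched mechanize import cleanly from a zip is an assumption that would have to be tested.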



Answer 2:

The error message is indicating that mechanize is obeying the site's robots.txt file for you.
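To make that concrete, here is a small sketch of the kind of check involved, using the standard library's robots.txt parser rather than mechanize's own internals. The rules below are hypothetical and are not eBay's actual robots.txt; the helper name `is_allowed` is also made up. The import is the Python 3 one (on Python 2.5 the module is called `robotparser`).

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration: every path is off limits to every client.
BLOCK_EVERYTHING = [
    "User-agent: *",
    "Disallow: /",
]

# Hypothetical rules that only close off one directory.
BLOCK_PRIVATE_ONLY = [
    "User-agent: *",
    "Disallow: /private/",
]

print(is_allowed(BLOCK_EVERYTHING, "my-script", "https://example.com/signin"))
print(is_allowed(BLOCK_PRIVATE_ONLY, "my-script", "https://example.com/public/page"))
```

With rules like the first set, a client that honors robots.txt refuses the request before sending it, which is what the 403 in the traceback reports.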

You should use eBay's API if you want to access their site in an automated way. If you don't, and instead build your own solution that ignores robots.txt, don't be surprised when they block you and complain to Google about automated queries coming from your app on App Engine.