Interesting “getElementById() takes exactly 1 argu

Posted 2019-02-08 04:27

Question:

#-*- coding:utf-8 -*-
import win32com.client, pythoncom
import time

ie = win32com.client.DispatchEx('InternetExplorer.Application.1')
ie.Visible = 1
ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
time.sleep( 5 )

ie.Document.getElementById("browse_keyword").value ="Computer"
ie.Document.getElementsByTagName("input")[24].click()

import win32com.client, pythoncom
import time

ie = win32com.client.DispatchEx('InternetExplorer.Application')
ie.Visible = 1
ie.Navigate('www.baidu.com')
time.sleep(5)

print 'browse_keword'
ie.Document.getElementById("kw").value ="Computer"
ie.Document.getElementById("su").click()
print 'Done!'

When I run the first script (the one that navigates to ieeexplore.ieee.org), it fails with:

ie.Document.getElementById("browse_keyword").value ="Computer"
TypeError: getElementById() takes exactly 1 argument (2 given)

The second script runs fine. What difference between the two causes the different results?

Answer 1:

The difference between the two cases has nothing to do with the COM name you specify: InternetExplorer.Application and InternetExplorer.Application.1 both resolve to the exact same CLSID, which gives you an IWebBrowser2 interface. The difference in runtime behaviour is purely down to the URL you retrieved.

The difference here may be that the page which works is HTML whereas the other one is XHTML; or it may simply be that errors in the failing page prevent the DOM from initialising properly. Whichever it is, it appears to be a 'feature' of the IE9 parser.

Note that this doesn't happen if you enable compatibility mode (after the second line below I clicked the compatibility mode icon in the address bar):

(Pdb) ie.Document.DocumentMode
9.0
(Pdb) ie.Document.getElementById("browse_keyword").value
*** TypeError: getElementById() takes exactly 1 argument (2 given)
(Pdb) ie.Document.documentMode
7.0
(Pdb) ie.Document.getElementById("browse_keyword").value
u''

Unfortunately I don't know how to toggle compatibility mode from a script (the documentMode property is not settable). Maybe someone else does?

The wrong argument count is, I think, coming from COM: Python passes in the arguments and the COM object rejects the call with a misleading error.



Answer 2:

As a method of a COMObject, getElementById is built by win32com dynamically.
On my computer, if the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, the generated method is roughly equivalent to

def getElementById(self):
    return self._ApplyTypes_(3000795, 1, (12, 0), (), 'getElementById', None,)

If the URL is www.baidu.com, it is roughly equivalent to

def getElementById(self, v=pythoncom.Missing):
    ret = self._oleobj_.InvokeTypes(1088, LCID, 1, (9, 0), ((8, 1),),v
            )
    if ret is not None:
        ret = Dispatch(ret, 'getElementById', '{3050F1FF-98B5-11CF-BB82-00AA00BDCE0B}')
    return ret

Obviously, if you pass an argument to the first version, you'll receive a TypeError, because it accepts nothing besides self. But if you call it with no argument, namely, invoke ie.Document.getElementById(), you won't receive a TypeError, but a com_error.
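The two wrapper shapes can be imitated without COM at all. The sketch below is only an illustration: BrokenDocument, WorkingDocument, and call_with_id are hypothetical stand-ins, not win32com classes, and the return values are made up. It shows that a method defined with only self rejects an extra argument with a TypeError (the exact message wording depends on the Python version).

```python
class BrokenDocument(object):
    """Hypothetical stand-in for the wrapper built from empty type info."""

    def getElementById(self):
        # No parameter besides self, like the first generated method.
        return None


class WorkingDocument(object):
    """Hypothetical stand-in for the wrapper built from correct type info."""

    def getElementById(self, v=None):
        # Accepts the element id, like the second generated method.
        return "element:%s" % v


def call_with_id(document, element_id):
    """Call getElementById and report a TypeError instead of raising it."""
    try:
        return document.getElementById(element_id)
    except TypeError as exc:
        return "TypeError: %s" % exc


working_result = call_with_id(WorkingDocument(), "kw")
broken_result = call_with_id(BrokenDocument(), "browse_keyword")
```

Here working_result holds the fake element, while broken_result holds the text of the TypeError raised by Python itself, before any COM call would have been made.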

Why did win32com build the wrong code?
Let us look at ie and ie.Document. They are both COMObjects, more precisely, win32com.client.CDispatch instances. CDispatch is just a wrapper class. The core is the _oleobj_ attribute, whose type is PyIDispatch.

>>> ie, ie.Document
(<COMObject InternetExplorer.Application>, <COMObject <unknown>>)
>>> ie.__class__, ie.Document.__class__
(<class win32com.client.CDispatch at 0x02CD00A0>,
 <class win32com.client.CDispatch at 0x02CD00A0>)
>>> oleobj = ie.Document._oleobj_
>>> oleobj
<PyIDispatch at 0x02B37800 with obj at 0x003287D4>

To build getElementById, win32com needs to get the type information for the getElementById method from _oleobj_. Roughly, win32com uses the following procedure:

typeinfo = oleobj.GetTypeInfo()
typecomp = typeinfo.GetTypeComp()
x, funcdesc = typecomp.Bind('getElementById', pythoncom.INVOKE_FUNC)
......

funcdesc contains almost all the important information, e.g. the number and types of the parameters.
If the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, funcdesc.args is (), while the correct funcdesc.args should be ((8, 1, None),).
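The lookup described above can be wrapped in a small diagnostic helper. This is a sketch under assumptions: bound_method_args and looks_parameterless are hypothetical names, the constant 1 is used in place of pythoncom.INVOKE_FUNC so that the snippet imports nothing from pywin32, and no error handling is included for a name that cannot be bound.

```python
# Assumption: 1 is the value of pythoncom.INVOKE_FUNC; pass the real
# constant instead if pywin32 is imported in your script.
INVOKE_FUNC = 1


def bound_method_args(oleobj, method_name, invoke_kind=INVOKE_FUNC):
    """Return funcdesc.args for method_name, following the steps above."""
    typeinfo = oleobj.GetTypeInfo()
    typecomp = typeinfo.GetTypeComp()
    _, funcdesc = typecomp.Bind(method_name, invoke_kind)
    return funcdesc.args


def looks_parameterless(oleobj, method_name):
    """True when the type information reports no parameters at all."""
    return len(bound_method_args(oleobj, method_name)) == 0
```

Calling looks_parameterless(ie.Document._oleobj_, 'getElementById') right after navigation would, if the observation above holds on your machine, return True for the failing page and False for a page that works.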

Long story short, win32com retrieved the wrong type information, so it built the wrong method.
I am not sure who is to blame, PyWin32 or IE. But based on my observation, I found nothing wrong in PyWin32's code. On the other hand, the following script runs perfectly in Windows Script Host.

var ie = new ActiveXObject("InternetExplorer.Application");
ie.Visible = 1;
ie.Navigate("http://ieeexplore.ieee.org/xpl/periodicals.jsp");
WScript.sleep(5000);
ie.Document.getElementById("browse_keyword").value = "Computer";

Duncan has already pointed out that IE's compatibility mode can prevent the problem. Unfortunately, it seems to be impossible to enable compatibility mode from a script.
But I found a trick that works around the problem.

First, you need to visit a good site, one that gives us an HTML page, and retrieve a correct Document object from it.

ie = win32com.client.DispatchEx('InternetExplorer.Application')
ie.Visible = 1
ie.Navigate('http://www.haskell.org/arrows')
time.sleep(5)
document = ie.Document

Then jump to the page which doesn't work:

ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
time.sleep(5)

Now you can access the DOM of the second page via the old Document object.

document.getElementById('browse_keyword').value = "Computer"

If you use the new Document object, you will get a TypeError again.

>>> ie.Document.getElementById('browse_keyword')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: getElementById() takes exactly 1 argument (2 given)
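The trick can be packaged as a helper so the order of operations is harder to get wrong. This is a minimal sketch, not part of PyWin32: document_from_known_good_page is a hypothetical name, ie is assumed to be the object returned by win32com.client.DispatchEx, and the sleep parameter exists only so the fixed wait can be replaced.

```python
import time


def document_from_known_good_page(ie, good_url, target_url,
                                  wait_seconds=5, sleep=time.sleep):
    """Capture ie.Document on good_url, then navigate to target_url.

    Returns the Document wrapper obtained while good_url was loaded,
    which is the object the trick above keeps using afterwards.
    """
    ie.Navigate(good_url)
    sleep(wait_seconds)
    document = ie.Document  # captured while the working page is loaded
    ie.Navigate(target_url)
    sleep(wait_seconds)
    return document
```

With the URLs from this answer, the call would be document_from_known_good_page(ie, 'http://www.haskell.org/arrows', 'http://ieeexplore.ieee.org/xpl/periodicals.jsp'), after which the returned object is used in place of ie.Document.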


Answer 3:

Calls to methods of instances in Python automatically add the instance as the first argument - that's why you have to explicitly write the 'self' argument inside methods.

For example, instance.method(args...) is equal to Class.method(instance, args...).
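A short illustration of that equivalence, using a hypothetical Greeter class that has nothing to do with win32com:

```python
class Greeter(object):
    def greet(self, name):
        return "hello " + name


g = Greeter()

# Calling through the instance passes g as self automatically...
bound_result = g.greet("world")

# ...which is the same as calling through the class and passing g by hand.
explicit_result = Greeter.greet(g, "world")
```

Both calls hand two arguments to greet: the instance and the name. This is why an error message about argument counts includes self in its tally.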

From what I can see, the programmer must have forgotten to write the self parameter, which breaks the method. Try looking inside the library code.



Answer 4:

I just got this issue when I upgraded to IE11 from IE8.

I've only tested this on the getElementsByTagName function. You have to call the function from the Body element.

#-*- coding:utf-8 -*-
import win32com.client, pythoncom
import time

ie = win32com.client.DispatchEx('InternetExplorer.Application.1')
ie.Visible = 1
ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
time.sleep( 5 )

ie.Document.Body.getElementById("browse_keyword").value ="Computer"
ie.Document.Body.getElementsByTagName("input")[24].click()