Execution flow in selenium browser automation

Posted 2019-04-01 17:32

I'm uncertain about how a script (an automated test) is executed in Selenium. I suppose the process is as follows:

  • Execution starts.
  • A Selenese command is transformed into an HTTP request.
  • The HTTP server of the browser driver receives the HTTP request.
  • The browser driver determines the steps needed to implement the
    command.
  • The browser driver executes them on the browser.
  • The execution status is sent back to the HTTP server of the browser driver and then to the script (IDE).

I suppose this is the process. Please correct me wherever I'm wrong.

1 answer
甜甜的少女心
#2 · 2019-04-01 18:21

Yes, this is it, in broad strokes.

The Theory

[Figure: webdriver calls flow]

In the diagram, the acting parties are shown in bold and in boxes; the protocols used are in italics, on the arrows.

When you want to interact with a browser,

  1. Your code uses a webdriver client (usually a library, like selenium) in the language you use (Java, Python, Ruby, etc).
  2. That client communicates with a webdriver server, sending & receiving data according to the webdriver protocol; this protocol is encapsulated in HTTP for easier transport & control.
  3. The webdriver server translates the requests into actual commands for the browser - so it (the browser) interacts with the page, or gets data from it.

The flow is always end-to-end (e.g. the browser never communicates directly with your code :)), and bi-directional. A failure/exception usually travels only upstream, to your code.
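The three-party flow above can be mimicked with a toy, in-process model. This is only a sketch: the class names (ToyBrowser, ToyDriverServer, ToyClient), the "get_text" command and the dict-based page are all made up for illustration, and nothing here talks to a real browser or uses HTTP.

```python
# Toy model of the three parties; every name here is hypothetical.

class ToyBrowser:
    """Stands in for the browser: owns the page and answers queries about it."""
    def __init__(self, page):
        self.page = page  # made-up page model: selector -> text

    def text_of(self, selector):
        return self.page[selector]  # raises KeyError if nothing matches


class ToyDriverServer:
    """Stands in for the webdriver server: translates commands for the browser."""
    def __init__(self, browser):
        self.browser = browser

    def handle(self, command, selector):
        if command != "get_text":
            return {"status": "unknown command", "value": None}
        try:
            return {"status": "ok", "value": self.browser.text_of(selector)}
        except KeyError:
            return {"status": "no such element", "value": None}


class ToyClient:
    """Stands in for the client library that your code calls."""
    def __init__(self, server):
        self.server = server

    def get_text(self, selector):
        response = self.server.handle("get_text", selector)
        if response["status"] != "ok":
            # failures travel upstream and surface in your code
            raise LookupError(response["status"])
        return response["value"]


client = ToyClient(ToyDriverServer(ToyBrowser({"#my-id": "My awesome text!"})))
result = client.get_text("#my-id")
```

Note that the client never touches ToyBrowser directly; a missing element comes back as a status from the server and only becomes an exception at the client.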


Some Details

The "browser's webdriver" in that graph is a binary (a program) - the "geckodriver" for Firefox (with ".exe" on Windows), "chromedriver", "safaridriver", "edgedriver.exe" (it always is with ".exe" :)). It acts as a proxy - on one side accepting and understanding the commands in the webdriver protocol, on the other - knowing how to communicate with the browser.

The webdriver is always an HTTP server - all commands are encapsulated in HTTP, with the usual methods GET/POST/DELETE/PUT (close to, if not the same as, a regular REST API). It implements the webdriver protocol, so clients (selenium & co) have a well-defined API to communicate with it. Thus it can also be referred to as the "webdriver server" - it listens for commands, proxies them to the browser, and returns responses to the client. (No one calls it that :), but it makes it easier to distinguish between "webdriver the executable" and "webdriver the protocol".)
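To make the "commands encapsulated in HTTP" idea concrete, here is a small sketch that maps a few commands to their HTTP method and path. The paths follow the W3C WebDriver specification; the dictionary keys and the build_request helper are my own labels, not part of any library.

```python
# A handful of endpoints as defined by the W3C WebDriver specification.
# The command names used as keys are illustrative labels only.
ENDPOINTS = {
    "new_session":      ("POST",   "/session"),
    "delete_session":   ("DELETE", "/session/{session_id}"),
    "navigate_to":      ("POST",   "/session/{session_id}/url"),
    "find_element":     ("POST",   "/session/{session_id}/element"),
    "get_element_text": ("GET",    "/session/{session_id}/element/{element_id}/text"),
}


def build_request(command, **params):
    """Return the (HTTP method, path) pair for a command, with ids filled in."""
    method, template = ENDPOINTS[command]
    return method, template.format(**params)


method, path = build_request("get_element_text", session_id="abc", element_id="5")
```

A real client does essentially this lookup, then adds a JSON body for the POST commands and sends the request to the driver's host and port.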

Being a server, it binds to & listens on a random network port - on your local machine, or on a remote one. If you are running locally, this is why its binary must be on your PATH - upon initialization Selenium starts it (so it must be able to find it) and gets the network port it is listening on (for further communication). If you're using a remote connection, you must either a) know the IP:port of the remote webdriver server, or b) use a "Selenium Hub", which tracks this information under its domain, and shares it with you.
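The "random port" part is a general technique rather than anything webdriver-specific: a process can ask the operating system for any free port by binding to port 0. The sketch below uses only the standard library; bind_to_free_port is a made-up helper, and I'm not claiming this is exactly how any particular driver or client picks its port.

```python
import socket


def bind_to_free_port(host="127.0.0.1"):
    """Bind a listening socket to an OS-chosen port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))   # port 0 asks the OS for any free port
    sock.listen(1)
    return sock, sock.getsockname()[1]


server_socket, port = bind_to_free_port()
base_url = "http://127.0.0.1:%d" % port   # what a client would talk to
server_socket.close()
```

Whoever starts the server only needs to remember that port number; every later command goes to the same base URL.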


The communication between the webdriver server and the browser is usually some form of RPC, and very much browser-specific - it uses internal APIs, and the webdriver knows the nuts and bolts of how best to control this particular browser. That is why the drivers are provided by the browser vendors. This is always local communication (on the same machine/OS), at least to my knowledge.


If you are using a higher-level framework like Robot Framework, Cucumber, JBehave etc, it sits before "your code" in that diagram, trying to shield you from some of the selenium calls.


In Practice

"A picture is worth a thousand words", so a code must be something like 740? :) Enough theory, here's a practical example:

from selenium import webdriver       # import the selenium bindings

wd = webdriver.Firefox()             # start and connect to the "webdriver server", a local one
element = wd.find_element('css selector', '#my-id')  # locate an element by CSS selector
the_text = element.text              # get its text

assert the_text == 'My awesome text!'   # verify it's the expected one

This whole listing is "your code" from the first part - the different instructions, flow control and checks that are executed to get the job done. On line 1 Python's selenium library is imported for further use.

Selenium is the most popular framework implementing the webdriver protocol; it has implementations (i.e. bindings) for different languages - Python here, Java, Ruby, JavaScript and so on. It strives to offer a uniform interface across all of them - getText() in Java is available in Python as .text, and so on. With this interface it isolates the client from the actual webdriver protocol - the user types .text and doesn't care how this is actually executed, nor has to change their code if the protocol changes.


On line 3 a webdriver object is instantiated; as this one is a local server, the instantiation goes through the local steps described earlier - the "webdriver server" is started, its port is now known (and stored in the object) and communication can start.


Line 4 in the code uses the selenium method to locate a particular element in the page. Under the hood, the library sends a POST HTTP request to the webdriver server, to locate the element.
Why POST? Because once the element is successfully found, the server assigns an internal id to it, which will be used from then on, and returns the id to the client, which stores it as a property of the element object (* see the footnote).
How does the webdriver server locate that element? It doesn't - it communicates with the browser, through the proprietary protocol, saying "Hey, using your rendering and evaluation engine, find an element in the DOM that matches this CSS selector, and give me a reference we both can reuse in the future." (i.e. "the magic" :). So it is the browser that does the work; the webdriver server just proxies the communication.
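As a rough sketch of what travels over the wire for that POST: the request body names a location strategy and a selector, and the reply carries the element reference under a well-known key. The key names below come from the W3C WebDriver specification and the older JSON Wire Protocol; the two helper functions are hypothetical and are not selenium's own code.

```python
import json

# Key that marks an element reference in the W3C WebDriver protocol.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"
# Key used for the same purpose by the older JSON Wire Protocol.
LEGACY_ELEMENT_KEY = "ELEMENT"


def find_element_body(css_selector):
    """Build the JSON body of a 'find element' POST request."""
    return json.dumps({"using": "css selector", "value": css_selector})


def extract_element_id(response_text):
    """Pull the element id out of a 'find element' response."""
    value = json.loads(response_text)["value"]
    for key in (W3C_ELEMENT_KEY, LEGACY_ELEMENT_KEY):
        if key in value:
            return value[key]
    raise KeyError("no element reference in response")


body = find_element_body("#my-id")
element_id = extract_element_id('{"value": {"ELEMENT": "5"}}')
```

The id extracted here is what the client keeps in the element object and puts into the URL of every later request about that element.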


Let's get to the specifics - line 5 executes the command .text, which obviously returns the text of the element (if you don't know Python, don't be alarmed that it's a command without () at the end - that's a language feature, exposing methods as object properties, and quite a handy one).
What happens at this point: the selenium Python binding matches this command to getElementText in its common interface; then it matches that to a webdriver protocol command (open the link, it's interesting, I promise) - it's of type GET, and it takes the session id and the element id as parameters.
It opens a network connection to "localhost:the_known_port", to this endpoint:

GET /session/2cce72b7-c748-48bc-b350-6dd6730b5a69/element/5/text

The first "random" string is the session id - a webdriver server can be used by many clients; yours was established and stored at line 3. The second parameter (the "5") is the element's id, established at line 4. Then comes "text" - the subresource you are requesting, one of those the element supports.
And this is the infamous webdriver protocol/API - the knowledge of specific access schemes (you can get the "text" of an established element, in a session) and flow (you must first establish a shared session, then a reference to an element, and only then get its "text").

After that the webdriver server makes the browser get the info from its DOM ("the magic"), and sends it back to the client (the selenium instance) on the wire:

{"sessionId":"2cce72b7-c748-48bc-b350-6dd6730b5a69","status":0,"value":"My awesome text!"}

Your selenium instance was waiting for the response, gets and parses the info from the payload, and returns the value to your code - the variable the_text now has the value "My awesome text!".
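The client side of this last step can be sketched in a few lines: build the endpoint URL from the stored port, session id and element id, then pull the value out of the reply. The helper names are made up and the port 4444 is an arbitrary assumption; the payload is the one shown above.

```python
import json


def element_text_url(port, session_id, element_id):
    """Build the 'get element text' URL for a locally running webdriver server."""
    return "http://localhost:%d/session/%s/element/%s/text" % (
        port, session_id, element_id)


def parse_value(payload):
    """Return the 'value' field of a reply, raising if it reports a failure."""
    reply = json.loads(payload)
    # "status" appears in the reply format shown above; 0 means success
    if reply.get("status", 0) != 0:
        raise RuntimeError("webdriver command failed: %r" % (reply.get("value"),))
    return reply["value"]


session_id = "2cce72b7-c748-48bc-b350-6dd6730b5a69"
url = element_text_url(4444, session_id, "5")   # 4444 is an assumed port
payload = ('{"sessionId":"2cce72b7-c748-48bc-b350-6dd6730b5a69",'
           '"status":0,"value":"My awesome text!"}')
the_text = parse_value(payload)
```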


And - done, the cycle code -> webdriver client -> webdriver server -> browser -> webdriver server -> webdriver client -> code is now complete.


Footnotes:

(*) - this is the actual reason for the dreaded StaleElementReferenceException: all three - the client, the webdriver server, and the browser - hold a reference to an element in the DOM.
But at some moment in time a 3rd party - JavaScript code running in the browser - changes/removes the element, blissfully unaware that something holds a reference it now invalidates (come to think of it, quite an evil act :D).
The next time the client tries to interact with the reference, through the webdriver server, in the browser - the element is no longer there. Naturally, the interaction fails, the failure goes back upstream to the client and surfaces as the exception; its message is "Element is no longer attached to the DOM" - which, though a bit cryptic, hopefully makes perfect sense now.
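The stale-reference mechanism can be reproduced with a toy model. ToyDom and StaleReference below are invented stand-ins, not selenium classes; the point is only that a reference handed out earlier stops resolving once the element is removed behind the client's back.

```python
class StaleReference(Exception):
    """Toy stand-in for StaleElementReferenceException."""


class ToyDom:
    """Made-up DOM that hands out reference ids for its elements."""
    def __init__(self):
        self._nodes = {}      # reference id -> element text
        self._next_id = 0

    def add(self, text):
        self._next_id += 1
        ref = str(self._next_id)
        self._nodes[ref] = text
        return ref            # the reference handed back to the client

    def remove(self, ref):
        del self._nodes[ref]

    def text(self, ref):
        if ref not in self._nodes:
            raise StaleReference("Element is no longer attached to the DOM")
        return self._nodes[ref]


dom = ToyDom()
ref = dom.add("My awesome text!")
first_read = dom.text(ref)    # works: the reference is still valid
dom.remove(ref)               # plays the role of page JavaScript removing the element
try:
    dom.text(ref)
    went_stale = False
except StaleReference:
    went_stale = True
```

With a real page the usual remedy is to locate the element again and obtain a fresh reference, rather than reuse the old one.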
