Automate data retrieval from a web site using a Ru

Published 2019-06-03 08:51

Question:

Say I have a website which displays your marks when you input your roll number. You can also see others' marks the same way by incrementing your own roll number.

I want to create an Excel sheet to find the standard deviation of the marks (college project).

It is physically impossible for me to manually enter all the data, so I am searching for some automation method which can do this work for me and save all fields in a text file, which I can easily convert to a table.

Background Details:

Link to the site here.

The input is a text box. When submit is clicked, the table is generated on the server side and displayed in the web page.

The code looks easy enough for a web bot to send a request and collect the data from the generated page.

Problem:

I have no idea how to write a web bot or where to write one. I am ready to learn a programming language from the ground up.

I have started studying/coding Ruby and should reach a level sufficient to do this in a week or so. But I still need help finding my way on how to do it.

If you need to see the web link and the generated page, please feel free to use my roll number: 5675351

Answer 1:

First of all, you will need a Ruby library that can issue a POST request, such as Faraday. Then you issue a POST request with a hash of parameters (this is what filling in the form amounts to). In your case the parameter name is "regno" (look at the HTML source of the page to confirm it yourself), and the value is the roll number whose data you want to extract.

What you will have at this stage is the HTML source of the page with the results.
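As a rough sketch of that step, the snippet below builds and sends the form POST. It uses Net::HTTP from Ruby's standard library rather than Faraday, only to avoid an extra dependency. The URL is a hypothetical placeholder (the thread does not give the real address), and the method names `build_request` and `fetch_results_html` are made up for illustration. Only the "regno" field name comes from the page source.

```ruby
require "net/http"
require "uri"

# Build a POST request that fills the form field "regno" with a roll number.
# The default URL is a placeholder assumption, not the real site.
def build_request(roll_number, url = "http://example.com/results")
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri)
  # Encodes the hash as an application/x-www-form-urlencoded body.
  request.set_form_data("regno" => roll_number.to_s)
  request
end

# Send the request and return the HTML of the results page as a string.
def fetch_results_html(roll_number, url = "http://example.com/results")
  uri = URI.parse(url)
  request = build_request(roll_number, url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.body
end
```

Calling `fetch_results_html(5675351, real_url)` with the actual form address should return the page source for that roll number. If the form posts extra hidden fields, they would need to be added to the hash as well.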

Results are all in roughly the same form:

<tr bgColor="#ffffff">
    <td align="middle"><font face="Arial" size=2> 301</font></td>
    <td align="left" ><font face="Arial" size=2>ENGLISH CORE</font></td>
    <td align="left" ><font face="Arial" size=2>084&nbsp;&nbsp;&nbsp;&nbsp;</font></td>
    <td align="middle"><font face="Arial" size=2>A2</font></td>
  </tr>

Only the bgColor of the tr varies, and the data of course. You need to extract all these blocks, for example with a regular expression. You can do one better and use the XPath support in Nokogiri, another Ruby library. You will need to look these two up yourself.
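A minimal sketch of the regular-expression route, using only the standard library, might look like this. It assumes every result row has exactly four cells in the order shown above (code, subject, marks, grade) and that the marks cell is purely numeric; rows that do not fit, such as a header row, are skipped. The method name `extract_marks` and the hash keys are illustrative choices. Regular expressions are fragile against markup changes, so a real HTML parser such as Nokogiri would likely be more robust.

```ruby
# Pull result rows out of the HTML and return them as an array of hashes.
def extract_marks(html)
  # One <tr>...</tr> block; /m lets "." span newlines, /i ignores tag case.
  row_pattern = /<tr\b[^>]*>(.*?)<\/tr>/mi
  # The text inside the <font> element of each <td> cell.
  cell_pattern = /<td\b[^>]*>\s*<font\b[^>]*>(.*?)<\/font>\s*<\/td>/mi

  rows = html.scan(row_pattern).flatten.map do |row|
    cells = row.scan(cell_pattern).flatten.map do |text|
      text.gsub("&nbsp;", " ").strip
    end
    # Skip anything that is not a four-cell row with numeric marks.
    next unless cells.size == 4 && cells[2].match?(/\A\d+\z/)

    { code: cells[0], subject: cells[1], marks: cells[2].to_i, grade: cells[3] }
  end
  rows.compact
end
```

Each hash can then be written out as one line of a text or CSV file, or fed directly into further calculations.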

When you have all the data, you don't need to create an Excel sheet: Ruby is capable of doing such simple math by itself.
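For instance, the standard deviation can be computed in a few lines of plain Ruby. This sketch assumes the marks have already been collected into an array of numbers, and it computes the population standard deviation (dividing by n); if a sample standard deviation is wanted, the divisor would be n - 1 instead. The method name `standard_deviation` is an illustrative choice.

```ruby
# Population standard deviation: square root of the mean squared deviation.
def standard_deviation(values)
  raise ArgumentError, "need at least one value" if values.empty?

  mean = values.sum.to_f / values.size
  variance = values.sum { |v| (v - mean)**2 } / values.size
  Math.sqrt(variance)
end
```

With the marks for one subject gathered across all roll numbers, `standard_deviation(marks)` gives the figure the Excel sheet was meant to produce.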

I recommend going through the examples for the two libraries mentioned and applying the relevant ones to your specific task. Ruby is actually a great choice for such a task, as its libraries are mostly good and getting started is painless. Having no programming experience, though, will complicate things along the way.