Using Selenium to save images from a page

Posted 2019-03-29 21:20

I'm using Selenium and the Google Chrome driver to open pages programmatically. Each page contains a dynamically generated image that I'd like to download. At the moment, I wait for the page to finish loading, then grab the image URL and download it with System.Net.WebClient.

That works fine except I'm downloading the images twice - once in the browser, once with WebClient. The problem is that each image is roughly 15MB and downloading twice adds up quickly.

So - is it possible to grab the image straight from Google Chrome?

6 answers
贼婆χ
Reply #2 · 2019-03-29 21:49

All the above answers work, but each has limitations. mecek's method is neat, but it only works in browsers that support HTML5 (which most now do), and it degrades the image quality. The screenshot method also degrades image quality. Using System.Net.WebClient avoids that problem, but it won't work for a captcha image, since requesting the URL a second time typically returns a different image. The only approach that has worked for me with captcha images is the Actions class (or Robot, if you are using Selenium's Java bindings), along these lines:

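using System;
using System.Threading;
using System.Windows.Forms;//SendKeys and Application; add a reference to System.Windows.Forms
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Interactions;
using System.Windows.Automation;//you need to add UIAutomationTypes and UIAutomationClient to references
using System.Runtime.InteropServices;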

[DllImport("User32.dll")]
static extern int SetForegroundWindow(IntPtr point);

private IntPtr getIntPtrHandle(IWebDriver driver, int timeoutSeconds = 30)
{
    var end = DateTime.Now.AddSeconds(timeoutSeconds);
    while (DateTime.Now < end)
    {
        var ele = AutomationElement.RootElement;
        foreach (AutomationElement child in ele.FindAll(TreeScope.Children, Condition.TrueCondition))
        {
            //match the top-level window whose title contains the page title
            if (!child.Current.Name.Contains(driver.Title)) continue;
            return new IntPtr(child.Current.NativeWindowHandle);
        }
        Thread.Sleep(200);//avoid a busy loop while waiting for the window to appear
    }
    return IntPtr.Zero;
}

private void downloadCaptcha(IWebDriver chromeDriver)
{
    OpenQA.Selenium.IWebElement captchaImage = chromeDriver.FindElement(By.Id("secimg0"));
    var handle = getIntPtrHandle(chromeDriver);
    SetForegroundWindow(handle);//you need a p/invoke 
    Thread.Sleep(1500);//setting foreground window takes time
    Actions action = new Actions(chromeDriver);
    action.ContextClick(captchaImage).Build().Perform();
    Thread.Sleep(300);
    SendKeys.Send("V");//"V" is the accelerator key for "Save image as..." in Chrome's context menu
    var start = Environment.TickCount;
    while (Environment.TickCount - start < 2000)
    {//can't use Thread.Sleep here, alternatively you can use a Timer
          Application.DoEvents();
    }
    SendKeys.SendWait(@"C:\temp\vImage.jpg");
    SendKeys.SendWait("{ENTER}");
}

This is the only way I've found to download a captcha image without losing quality (for better OCR results) using the Selenium Chrome driver. The limitations are obvious, though: it needs a visible, focused browser window on a Windows desktop, and it relies on fixed delays.

看我几分像从前
Reply #3 · 2019-03-29 21:54

One way is to have WebDriver execute JavaScript that returns the image as a base64 string, and then save that string to a file.

Basically, if your image is

<img id='Img1' src='someurl'>

then you can convert it like

var base64string = ((IJavaScriptExecutor)driver).ExecuteScript(@"
    var c = document.createElement('canvas');
    var ctx = c.getContext('2d');
    var img = document.getElementById('Img1');
    c.height=img.height;
    c.width=img.width;
    ctx.drawImage(img, 0, 0,img.width, img.height);
    var base64String = c.toDataURL();
    return base64String;
    ") as string;

var base64 = base64string.Split(',').Last(); // strips the "data:image/png;base64," prefix; Last() needs System.Linq
using (var stream = new MemoryStream(Convert.FromBase64String(base64)))
{
    using (var bitmap = new Bitmap(stream))
    {
        var filepath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "ImageName.png");
        bitmap.Save(filepath, ImageFormat.Png);
    }
}
我只想做你的唯一
Reply #4 · 2019-03-29 21:55

You can block images from being downloaded in Google Chrome using this technique. It runs a Google Chrome extension called "Block Image". That way Chrome never downloads the image, and you only need to download it once, as usual, from its URL with System.Net.WebClient.
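
A minimal sketch of this approach, not a definitive implementation. It assumes the extension has been packaged as a .crx file at a hypothetical path, that the image element has the hypothetical id Img1, and that the extension only blocks the request while leaving the src attribute in place. The page URL and output path are placeholders as well.

```csharp
using System.Net;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class BlockedImageDownload
{
    static void Main()
    {
        var options = new ChromeOptions();
        // Hypothetical path to the packaged "Block Image" extension
        options.AddExtension(@"C:\temp\block-image.crx");

        using (IWebDriver driver = new ChromeDriver(options))
        {
            // Placeholder URL; replace with the page that hosts the image
            driver.Navigate().GoToUrl("https://example.com/page");

            // Assumption: the src attribute is still set even though Chrome skipped the download
            string imageUrl = driver.FindElement(By.Id("Img1")).GetAttribute("src");

            // Single download of the image, outside the browser
            using (var client = new WebClient())
            {
                client.DownloadFile(imageUrl, @"C:\temp\image.jpg");
            }
        }
    }
}
```

If the image URL is only valid within the browser session, WebClient would also need the session cookies, which this sketch does not copy over.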

萌系小妹纸
Reply #5 · 2019-03-29 22:03

Yes, you can do this in several steps:

  1. Take a screenshot of the webpage and save it to disk
  2. Find the image element
  3. Find the image element location, width and height
  4. Crop the image you need from the screenshot you took in step 1
  5. Save the image to disk (or do something else with it)

Sample code (add your own exception handling):

        IWebDriver driver = new ChromeDriver();

        //replace with the page you want to navigate to
        string your_page = "https://www.google.com"; 
        driver.Navigate().GoToUrl(your_page);

        ITakesScreenshot ssdriver = driver as ITakesScreenshot;
        Screenshot screenshot = ssdriver.GetScreenshot();
        screenshot.SaveAsFile(@"C:\full.png", ImageFormat.Png);

        //replace with the XPath of the image element
        IWebElement my_image = driver.FindElement(By.XPath("//*[@id=\"hplogo\"]/canvas[1]"));

        Point point = my_image.Location;
        int width = my_image.Size.Width;
        int height = my_image.Size.Height;

        Rectangle section = new Rectangle(point, new Size(width, height));
        Bitmap source = new Bitmap(@"C:\full.png");
        Bitmap final_image = CropImage(source, section);

        //pass the format explicitly so the file contents match the extension
        final_image.Save(@"C:\image.png", ImageFormat.Png);

The CropImage method comes from James Hill's answer to "How to cut a part of image in C#". It is reproduced here for clarity:

    public Bitmap CropImage(Bitmap source, Rectangle section)
    {
        Bitmap bmp = new Bitmap(section.Width, section.Height);
        //dispose the Graphics object once the crop has been drawn
        using (Graphics g = Graphics.FromImage(bmp))
        {
            g.DrawImage(source, 0, 0, section, GraphicsUnit.Pixel);
        }
        return bmp;
    }
劫难
Reply #6 · 2019-03-29 22:09
"I'm using Selenium & Google Chrome Driver"

So you are talking about Selenium.

"once in the browser, once with WebClient"

Do you mean HtmlUnit?

Anyway, why not use WebClient (htmlunit-driver) or pure HtmlUnit (http://htmlunit.sourceforge.net/)? HtmlUnit doesn't download images by default.

You can download them at will, as your requirements dictate.

▲ chillily
Reply #7 · 2019-03-29 22:09

Have you tried downloading the image using ImageIO?

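// ImageIO.read accepts a java.net.URL, File, or InputStream, not a String
URL imageUrl = new URL("https://example.com/image.png"); // placeholder; use the image's absolute URL
BufferedImage bufferedImage = ImageIO.read(imageUrl);
ImageIO.write(bufferedImage, "png", new File("savedImage.png"));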