InternetOpenUrl only returns after entire HTTP res

Posted 2019-04-09 06:13

Question:

I am writing a file download utility using WinINET, and have noticed (especially on large downloads) that the WinINET InternetOpenUrl() call only returns after the entire HTTP response has been downloaded.

I confirmed this using the Charles proxy tool as well as Wireshark: the download completes entirely, and only then does WinINET notify my code.

Some simplified (synchronous) code:

hInt = InternetOpen(USER_AGENT_NAME, INTERNET_OPEN_TYPE_PRECONFIG, 
                    NULL, NULL, 0);
DWORD dwRequestFlags = INTERNET_FLAG_NO_UI   // no UI please
            |INTERNET_FLAG_NO_AUTH           // don't authenticate
            |INTERNET_FLAG_PRAGMA_NOCACHE    // do not try the cache or proxy
            |INTERNET_FLAG_NO_CACHE_WRITE;   // don't add this to the IE cache

hUrl = InternetOpenUrl(hInt, szURL, NULL, 0, dwRequestFlags, NULL);
if (hUrl)
{
  // <only gets here after entire download is complete>

  InternetCloseHandle(hUrl);
}
InternetCloseHandle(hInt);

The documentation suggests that this call sends the request and processes the response headers (rather than completing the download), and that you are then expected to run an InternetReadFile() loop until it returns TRUE and dwNumberOfBytesRead is 0.

From MSDN
InternetOpenUrl Function: The InternetOpenUrl function parses the URL string, establishes a connection to the server, and prepares to download the data identified by the URL. The application can then use InternetReadFile [...] to retrieve the URL data.

InternetReadFile Function: To ensure all data is retrieved, an application must continue to call the InternetReadFile function until the function returns TRUE and the lpdwNumberOfBytesRead parameter equals zero.
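The loop that the documentation describes can be sketched roughly as follows. This is only an illustration, not code from my utility: read_fn, chunk_fn, stream_all, mem_source, mem_read, tally and count_chunk are hypothetical names, and the 4096-byte buffer is an arbitrary choice. The in-memory reader stands in for the network so the loop can be exercised on any platform; the WinINet adapter is compiled only on Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical abstraction (not part of WinINet): a reader fills buf with up
   to cap bytes, stores the count in *got, and returns nonzero on success. */
typedef int (*read_fn)(void *ctx, void *buf, unsigned long cap, unsigned long *got);

/* Hypothetical consumer, called once per chunk as soon as it arrives. */
typedef void (*chunk_fn)(void *user, const void *data, unsigned long len);

/* Keeps reading until the reader succeeds with zero bytes, which is the
   end-of-data condition documented for InternetReadFile.
   Returns 1 on a clean end of data, 0 if a read fails. */
int stream_all(read_fn rd, void *ctx, chunk_fn on_chunk, void *user)
{
    unsigned char buf[4096];
    unsigned long got;

    assert(rd != NULL && on_chunk != NULL);
    for (;;) {
        got = 0;
        if (!rd(ctx, buf, sizeof buf, &got))
            return 0;               /* read error */
        if (got == 0)
            return 1;               /* success with zero bytes: done */
        on_chunk(user, buf, got);   /* process data while more is in flight */
    }
}

/* In-memory stand-in for the network: hands out at most `step` bytes per call. */
struct mem_source { const char *data; size_t len, pos, step; };

int mem_read(void *ctx, void *buf, unsigned long cap, unsigned long *got)
{
    struct mem_source *s = ctx;
    size_t n = s->len - s->pos;

    if (n > s->step) n = s->step;
    if (n > cap) n = cap;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    *got = (unsigned long)n;
    return 1;
}

/* Example consumer: counts the chunks and bytes it has seen. */
struct tally { unsigned long chunks, bytes; };

void count_chunk(void *user, const void *data, unsigned long len)
{
    struct tally *t = user;
    (void)data;
    t->chunks++;
    t->bytes += len;
}

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>

/* Adapter: ctx points to the HINTERNET returned by InternetOpenUrl. */
int wininet_read(void *ctx, void *buf, unsigned long cap, unsigned long *got)
{
    DWORD n = 0;
    BOOL ok = InternetReadFile(*(HINTERNET *)ctx, buf, (DWORD)cap, &n);
    *got = n;
    return ok ? 1 : 0;
}
#endif
```

On Windows the call would be something like stream_all(wininet_read, &hUrl, count_chunk, &t), placed where the comment sits in the simplified code above, before InternetCloseHandle(hUrl).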

I've tried this using the asynchronous method too, and noticed the same thing. Specifically, INTERNET_STATUS_RESPONSE_RECEIVED is only sent to the registered callback after the download is complete, which means my client can only start accessing the data once the whole download has finished.

In a similar vein, I implemented a version that uses the WinHttp library too, and noticed exactly the same results.

This makes things tricky when it comes to timeouts. If the download exceeds the timeout (default of 30 seconds by the looks of it), InternetOpenUrl() fails.

So I have two questions:

If this is the expected behavior of the WinInet and WinHttp libraries, why does the documentation suggest looping through the InternetReadFile() call? Why not just read the entire buffer, since WinINET already has it?

I understand providing the capability since you don't always want to allocate 150MB chunks of memory, but the excuse provided is that you don't know how much data is available... but WinINET has already completed the download.

And why make it look remarkably like a wrapped-up recv() call if it's just an abstraction over a temporary file, or a file in the IE cache (or worse, a wasted block of memory)?

And what should I set the timeout to? If I never know how big the data is before the request has timed out, how do I decide on a timeout value?

Is this the expected behavior, and if so is there a way to get to the data as it is streaming down?

On a slow connection or with a large file, it is very conceivable that a lot of work can be done on the data before the entire download is completed. In a classic Berkeley socket re-implementation of HTTP, looping through the recv() call would provide me with the data as it comes down, which is ultimately what I need.

Yes, I could rewrite the implementation using plain sockets, but I would rather not waste time on supporting the entire HTTP spec and SSL encryption, not to mention the proxy support that WinINET provides.

Answer 1:

I know it's probably not polite to answer your own question, but I believe I tracked down what the problem was.

After a reboot (and many, many, many minutes wasted on Automatic Updates) I tried again and experienced the same problem, but I took solace from Alex K.'s and J.J.'s comments suggesting this is not the expected behavior, and started investigating software running on the machine that might interfere.

After many applications were terminated, and many services were turned off, I stumbled across one service that I really hoped wouldn't have this kind of effect, however it did.

I turned off "Kaspersky Lab Network Agent", and hey-presto, InternetOpenUrl returned about 2 seconds after download of the HTTP response started. I would have preferred immediately, but a second or two of a 75 second download at least gives WinINET time to process headers and do whatever pre-processing it might need to.

It also turned out that if I don't read the data with InternetReadFile(), the download never completes (as seen via Charles), implying (hopefully) that InternetReadFile() is indeed a wrapper around the recv() call, as I would have expected.

Successive re-enabling and disabling of the Network Agent Service validated this finding. I would like to somehow conclusively prove (or disprove) this.
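One way to gather firmer evidence might be to time how long InternetOpenUrl() takes compared with the whole transfer, once with the service enabled and once with it disabled. The sketch below is only a suggestion under stated assumptions: looks_prebuffered and time_download are hypothetical helpers, the 90% threshold is an arbitrary cut-off of my own choosing, and GetTickCount64 has a coarse resolution (typically 10 to 16 ms), so it is only meaningful for downloads lasting at least a few seconds.

```c
#include <assert.h>

/* Hypothetical heuristic: if at least 90% of the total transfer time had
   already elapsed when the open call returned, something probably buffered
   the whole response before handing it over. The 90% figure is an assumption.
   Returns 1 for "looks pre-buffered", 0 otherwise (including total_ms == 0). */
int looks_prebuffered(unsigned long long open_ms, unsigned long long total_ms)
{
    assert(open_ms <= total_ms);
    if (total_ms == 0)
        return 0;                       /* too fast to measure */
    return open_ms * 10 >= total_ms * 9;
}

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>

/* Times InternetOpenUrl and the read loop that follows it for one URL.
   hInt is a session handle from InternetOpen. Returns 1 and fills both
   outputs on success, 0 if the open or any read fails. */
int time_download(HINTERNET hInt, LPCTSTR url,
                  unsigned long long *open_ms, unsigned long long *total_ms)
{
    char buf[8192];
    DWORD n = 0;
    BOOL ok;
    ULONGLONG t0 = GetTickCount64();
    HINTERNET hUrl = InternetOpenUrl(hInt, url, NULL, 0,
        INTERNET_FLAG_PRAGMA_NOCACHE | INTERNET_FLAG_NO_CACHE_WRITE, 0);

    if (!hUrl)
        return 0;
    *open_ms = GetTickCount64() - t0;   /* time until the open call returned */

    do {                                /* drain the response, discarding data */
        ok = InternetReadFile(hUrl, buf, sizeof buf, &n);
    } while (ok && n > 0);

    *total_ms = GetTickCount64() - t0;  /* time until the last byte was read */
    InternetCloseHandle(hUrl);
    return ok ? 1 : 0;
}
#endif
```

With the figures I observed, looks_prebuffered(2000, 75000) reports streaming for the run with the service disabled, while a run in which InternetOpenUrl() only returns once the 75 second download has finished would be flagged as pre-buffered.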

So it turns out that my (read: the IT Security Department's) choice of antivirus, with its intercept-all-network-layer-communications protection, appears to have been the cause of the problem.