How to reliably reproduce curl_multi timeouts while curl-threads work

Published 2020-04-02 08:13

Question:

Relevant information: issue 3602 on GitHub

I'm working on a project that gathers and tests public/free proxies, and I noticed that when I use the curl_multi interface for testing these proxies, I sometimes get many 28 (timeout) errors. This never happens when I test each proxy on its own.

The problem is that this issue is not reliably reproducible; it does not always show up, and it could be something in curl or something else.

Unfortunately, I'm not an experienced network debugger and I don't know how to debug this issue at a deeper level. However, I wrote 2 C test programs (one of them was originally written by Daniel Stenberg, but I modified its output to match the format of the other program). These 2 C programs test 407 public proxies using curl

  1. with curl_multi interface (which has the problem)

  2. with curl on many threads, each curl handle running on its own thread (which has no problem)

These are the 2 C programs I wrote for the tests. I'm not a C developer, so please let me know about anything wrong you notice in them.

This is the original PHP class that I used for reproducing the issue a month ago.

And these are the test results of the 2 C programs. You can see that the curl_multi tests time out, while the curl-threads results are stable (about 50 out of the 407 proxies are working).

This is a sample from the test results. Note columns 4 and 5: curl-threads times out about ~170 times and connects successfully ~40 times, while curl_multi makes 0 successful connections and times out ~300 times out of the 407 proxies.

column(1) : #
column(2) : time(UTC)
column(3) : total execution time (seconds)
column(4) : no error 0 (how many requests result in no error CURLE_OK)
column(5) : error 28 (how many requests result in error 28 CURLE_OPERATION_TIMEDOUT)
column(6) : error 7 (how many requests result in error 7 CURLE_COULDNT_CONNECT)
column(7) : error 35 (how many requests result in error 35 CURLE_SSL_CONNECT_ERROR)
column(8) : error 56 (how many requests result in error 56 CURLE_RECV_ERROR)
column(9) : other errors (how many requests result in errors other than the above)
column(10) : program that used curl
column(11) : cURL version

c(1)  c(2)                c(3)  c(4)  c(5)  c(6)  c(7)  c(8)  c(9)  c(10)                                     c(11)
267   2019-3-28 01:58:01  40    43    176   183   1     4     0     C (curl - threads) (Linux Fedora)         7.59.0
268   2019-3-28 01:59:01  30    0     286   110   1     10    0     C (curl-multi one thread) (Linux Fedora)  7.59.0
269   2019-3-28 02:00:01  30    46    169   181   1     8     2     C (curl - threads) (Linux Fedora)         7.59.0
270   2019-3-28 02:01:01  31    0     331   74    1     1     0     C (curl-multi one thread) (Linux Fedora)  7.59.0
271   2019-3-28 02:02:01  30    42    173   186   1     4     1     C (curl - threads) (Linux Fedora)         7.59.0
272   2019-3-28 02:03:01  30    0     277   116   1     13    0     C (curl-multi one thread) (Linux Fedora)  7.59.0

Why does curl_multi time out inconsistently on most of the connections, while curl-threads never does?

I downloaded Wireshark and used it to capture the traffic while each of the 2 C programs was running. I filtered the traffic to the proxy list used by the 2 C programs and saved the files on GitHub.

The curl-threads program (the expected behavior)

63 successful connections and 158 timed-out connections out of 407 proxies.

  • this is the program output.
  • this is the Wireshark .pcapng raw file.

The curl_multi program (the unexpected behavior)

0 successful connections and 272 timed-out connections out of 407 proxies.

  • this is the program output.
  • this is the Wireshark .pcapng raw file.

You can open the .pcapng files in Wireshark and see the traffic recorded on my computer during both the expected and the unexpected behavior. I filtered the traffic to the 407 proxy IPs and left Wireshark open for a little while after curl's 30-second limit because I noticed some packets still showing up. I don't know Wireshark and this level of networking, but I thought this could be useful.


Note on the bandwidth:

Open the .pcapng file of the curl-threads program (the normal behavior) in Wireshark and go to Statistics > Conversations. You will see a window like this.

I have copied the data and saved it here on GitHub; now calculate the sum of the bytes sent from A->B and B->A.

The ENTIRE bandwidth needed to work normally is about 692.8 KB.

Answer 1:

I've gotten reproducible behavior and am waiting for bagder to reply on GitHub. Try running a program like Ettercap to get more information.



Answer 2:

To me it looks like the problem is not with curl itself but with making too many concurrent connections to the proxy servers, whose connections are then refused. You might be blacklisted permanently or for some period.

Check that by running your curl from your current IP and collecting statistics: how many connections were established, how many were refused, how many timed out. Do this several times and take an average. Then switch to a server with a different IP and compare the statistics there. The first run should show much better numbers, and repeating the test from the new IP will probably only make them worse. A good idea might be to test with only a slice of the proxy pool rather than the whole list: check it from your current IP, then repeat the check from a new IP. That way, if the real cause is you abusing the service, you don't blacklist yourself at all the proxies and you still have a group of 'untouched' proxies to test from the new IP. Be aware that even if the proxies' IPs are in different locations, they can belong to the same service provider, which probably keeps one abuse list for all of its proxy servers; if the volume of requests you make in one country gets you flagged, you can be blocked in another country as well, even before you connect to that country's proxy.
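The counting step can be sketched in shell. The loop that would fill codes.txt with real curl exit codes is only shown as a comment; sample values stand in for real runs here:

```shell
#!/bin/sh
# In a real run you would collect one curl exit code per proxy, e.g.:
#   curl -s -o /dev/null --proxy "$proxy" --max-time 30 https://example.com
#   echo $? >> codes.txt
# Sample values stand in for real runs:
printf '0\n28\n28\n7\n28\n0\n' > codes.txt

# Count how often each exit code occurred
# (0 = CURLE_OK, 28 = timed out, 7 = couldn't connect), most frequent first.
sort -n codes.txt | uniq -c | sort -rn
```

With the sample data the top line shows the code 28 (timeout) occurring three times; comparing such tallies between runs from different IPs is the statistic described above.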

If you still want to rule out curl, you can set up a test environment with multiple servers. This test environment you can pass to the curl maintainer so he can replicate the error. You can use Docker to create 10, 20, or 100 proxy servers and connect to them to see whether curl has a problem or not.

  • you will need Docker; it can be installed on Win/Mac/Linux
  • a proxy image to create the proxies
  • create a network for the containers (a bridge should be ok)
  • attach the containers to the network with --network
  • it is good to set each proxy container's --ip
  • let each proxy container read its config and write an error log (so you can read why it disconnected, if that happens) by mounting the error log/config files/directories with --volume
  • and all proxy containers should be running
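The steps above can be sketched as one script. Every concrete value is an assumption for illustration: the network name, subnet, fixed IPs, mount paths, and the proxy image name (a placeholder, not a real image). It defaults to a dry run that only prints the docker commands; set DOCKER=docker to actually execute them:

```shell
#!/bin/sh
# Dry run by default: prints the docker commands instead of executing them.
# Run as `DOCKER=docker sh replicate.sh` to really create the containers.
DOCKER="${DOCKER:-echo docker}"

# One user-defined bridge network with a fixed subnet (assumed values).
$DOCKER network create --driver bridge --subnet 172.20.0.0/24 proxynet

# A few proxy containers with fixed IPs; example/proxy-image is a placeholder.
for i in 2 3 4; do
  $DOCKER run -d --name "proxy$i" --network proxynet --ip "172.20.0.$i" \
    --volume "$PWD/proxy$i/conf:/etc/proxy" \
    --volume "$PWD/proxy$i/logs:/var/log/proxy" \
    example/proxy-image
done

# Show all containers and their status.
$DOCKER ps -a
```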

You can connect to a proxy that is running inside a container in two ways. If you want to run curl outside these containers, you need to publish the proxies' ports from the containers to the outside world (to curl, in your case) with -p.

or

Or you may use another container image that has Linux + curl, for example Alpine Linux + curl, and connect it to the same network the same way as you do with the proxies. If you do that, you don't need to publish (expose) the proxies' ports and don't need to think about which port to expose for each particular proxy.
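A sketch of this second option. curlimages/curl is the curl project's official container image; the network name, proxy IP, and port are example values, not taken from the question. The snippet defaults to a dry run that only prints the command; set DOCKER=docker to execute it:

```shell
#!/bin/sh
# Dry run by default; set DOCKER=docker to really run the container.
DOCKER="${DOCKER:-echo docker}"

# Run curl from a throwaway container attached to the proxies' network,
# so none of the proxies' ports need to be published with -p.
# curlimages/curl uses curl as its entrypoint, so the trailing
# arguments are curl arguments.
$DOCKER run --rm --network proxynet curlimages/curl \
  -s -o /dev/null -w '%{http_code}\n' --max-time 30 \
  --proxy http://172.20.0.2:3128 http://example.com
```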

At each step you can issue the command

docker ps -a

to see all containers and their status.

To stop and remove all containers (not the images they come from, just the containers), for example if some containers exited with errors:

docker stop $(docker ps -aq) && docker rm $(docker ps -aq)

Or, to stop and remove a particular container:

docker stop <container-id>
docker rm <container-id>

To see all containers that are connected to the bridge network (the default):

docker network inspect bridge

If you confirm that the problem really is with connecting to proxies running on your local machine, then it is something the curl maintainer can replicate.

Just put all the commands above (creating the proxies, connecting them to the network, etc.) into a file, for example a replicate.sh script starting with

#!/bin/sh

and your commands here

Save that file and then issue the command

chmod +x ./replicate.sh

to make it executable.

You can run it to double-check that everything is working as expected

./replicate.sh

and send it to the curl maintainer so he can replicate the environment in which you experienced the problem.
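End to end, the packaging step looks like this; the echo is a placeholder for the real docker network/run commands:

```shell
#!/bin/sh
# Write the script (put the real docker commands where the echo is),
# make it executable, and run it.
cat > replicate.sh <<'EOF'
#!/bin/sh
echo "creating proxy containers..."
EOF
chmod +x ./replicate.sh
./replicate.sh
```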

If you don't want to write a lot of docker run commands for the proxies, you can use Docker Compose instead, which lets you define the whole testing environment in one file.

If you run a lot of containers, you can limit the resources (for example the memory) each of them consumes; this may help with so many proxies.
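For example, using docker run's --memory and --cpus flags (the limit values and image name are guesses, not recommendations; the snippet prints the command by default, and DOCKER=docker executes it):

```shell
#!/bin/sh
# Dry run by default; set DOCKER=docker to really apply the limits.
DOCKER="${DOCKER:-echo docker}"

# Cap one proxy container's memory and CPU share (placeholder values).
$DOCKER run -d --name proxy2 --memory 64m --cpus 0.25 example/proxy-image
```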