rvest seems to leave TCP file descriptors in the CLOSE_WAIT state in some cases.
To reproduce: in an R terminal, run:
This will return.
Do NOT close the R prompt.
Then open a regular terminal and run:
lsof | grep materialresourcing
You will find that R holds a TCP socket in the CLOSE_WAIT state indefinitely (i.e. until the R session is closed).
The rvest API doesn't seem to offer any way to force the connection closed.
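Because `grep materialresourcing` matches every process that holds the socket, it can help to restrict lsof to the R process itself. The sketch below assumes `lsof` is on the PATH and supports the `-s TCP:state` filter (recent Linux and macOS builds do); `close_wait_cmd()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: build an lsof command that lists only TCP sockets in
# CLOSE_WAIT owned by a single process (by default, this R session).
#   -n / -P                : skip host-name and port-name lookups
#   -a                     : AND the selections below instead of ORing them
#   -p PID                 : restrict output to one process
#   -iTCP -sTCP:CLOSE_WAIT : only TCP sockets in the CLOSE_WAIT state
close_wait_cmd <- function(pid = Sys.getpid()) {
  paste("lsof -nP -a -p", pid, "-iTCP -sTCP:CLOSE_WAIT")
}

# Run it from the same R session; lsof exits non-zero if nothing matches.
status <- system(close_wait_cmd())
```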
Any thoughts? Thanks!
In fact, just running:
k<-curl_download("http://materialresourcing.com/tags/diy", "a.txt")
also causes it.
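One thing that may be worth trying, assuming the socket lingers because libcurl keeps the connection cached for reuse after the server has closed its end, is a handle with libcurl's `CURLOPT_FORBID_REUSE` set (exposed in the R curl package as the `forbid_reuse` option). This is an untested sketch; `download_no_reuse()` is a hypothetical name.

```r
library(curl)

# Hypothetical helper: download `url` to `dest` with a handle that asks libcurl
# not to keep the connection open for reuse (CURLOPT_FORBID_REUSE), so the
# socket should be closed once the transfer finishes.
download_no_reuse <- function(url, dest) {
  h <- new_handle(forbid_reuse = TRUE)
  curl_download(url, dest, handle = h)
}

# Usage (needs network access):
# download_no_reuse("http://materialresourcing.com/tags/diy", "a.txt")
```

If this does release the socket, the saved file could then be parsed with `xml2::read_html("a.txt")` (re-exported by current rvest as `read_html()`), so rvest never opens the connection itself. Whether connection caching is really what leaves the socket in CLOSE_WAIT here is an assumption the sketch does not verify.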
UPDATE IN RESPONSE TO @jeroen's questions:
> library(curl)
> h <- new_handle()
> l2<-"http://rediff.com"
> curl_download(l2,"b.txt",handle=h)
> system("lsof | grep rediff")
> l3<-"http://materialresourcing.com/tags/diy"
> curl_download(l3,"c.txt",handle=h)
> system("lsof | grep materialresourcing")
R 17936 xxxx 7u IPv4 3559514 0t0 TCP ip-10-0-xx:40844->vps.materialresourcing.com:http (CLOSE_WAIT)
sh 18257 xxxx 7u IPv4 3559514 0t0 TCP ip-10-0-xx:40844->vps.materialresourcing.com:http (CLOSE_WAIT)
grep 18259 xxxx 7u IPv4 3559514 0t0 TCP ip-10-0-xx:40844->vps.materialresourcing.com:http (CLOSE_WAIT)
As you can see, the CLOSE_WAIT socket shows up only for this link, not for rediff.com. The same thing happens with rvest, and calling gc() has no effect.
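The three rows above share the same DEVICE value (3559514) and fd (7u), which suggests a single socket inherited by the `sh` and `grep` children that `system()` spawns, rather than three separate leaks. A small sketch that counts distinct CLOSE_WAIT sockets in lsof output, assuming lsof's default column layout (DEVICE is the 6th field); `close_wait_sockets()` is a hypothetical helper.

```r
# lsof rows copied from the output above.
rows <- c(
  "R 17936 xxxx 7u IPv4 3559514 0t0 TCP ip-10-0-xx:40844->vps.materialresourcing.com:http (CLOSE_WAIT)",
  "sh 18257 xxxx 7u IPv4 3559514 0t0 TCP ip-10-0-xx:40844->vps.materialresourcing.com:http (CLOSE_WAIT)",
  "grep 18259 xxxx 7u IPv4 3559514 0t0 TCP ip-10-0-xx:40844->vps.materialresourcing.com:http (CLOSE_WAIT)"
)

# Hypothetical helper: keep rows containing "(CLOSE_WAIT)", split each on
# whitespace, and return the distinct DEVICE values (6th field).
close_wait_sockets <- function(lines) {
  cw <- lines[grepl("(CLOSE_WAIT)", lines, fixed = TRUE)]
  fields <- strsplit(trimws(cw), "[[:space:]]+")
  unique(vapply(fields, function(f) f[6], character(1)))
}

close_wait_sockets(rows)
```

For these rows it returns a single value, `"3559514"`, i.e. one leaked socket.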