I've got an AWS Lambda function running NodeJS code to stream files from S3 to ClamAV running on an EC2 instance.
Generally (about 75% of the time) the system works, but often (especially when multiple files are being scanned from different Lambda containers) clamd threads get stuck on INSTREAM.
Once a thread has been in INSTREAM for 25-30 seconds it does not seem to be able to recover. When it has been QUEUEDSINCE 350 seconds it is killed off. I can't figure out how either of these numbers relates to any value in my config.
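For reference, here is a minimal sketch of the INSTREAM framing as the clamd documentation describes it: the command, then each chunk prefixed with its length as a 4-byte big-endian integer, then a zero-length chunk to end the stream. This is not the exact Lambda code; the names frameChunk and buildInstreamPayload are hypothetical, and a real client would write each frame to the socket as data arrives from S3 rather than building one buffer.

```javascript
// Sketch of clamd INSTREAM framing. Helper names are hypothetical,
// not taken from the Lambda function described in this post.

// Null-terminated form of the command ("z" prefix).
const INSTREAM_COMMAND = Buffer.from('zINSTREAM\0');

// A zero-length chunk (four zero bytes) tells clamd the stream is finished.
const END_OF_STREAM = Buffer.alloc(4);

// Prefix one chunk of file data with its length as a 4-byte
// big-endian unsigned integer.
function frameChunk(chunk) {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(chunk.length, 0);
  return Buffer.concat([header, chunk]);
}

// Build the complete byte sequence for a list of chunks.
// Only suitable for small inputs; shown to make the layout explicit.
function buildInstreamPayload(chunks) {
  return Buffer.concat([
    INSTREAM_COMMAND,
    ...chunks.map(frameChunk),
    END_OF_STREAM,
  ]);
}
```

If the terminating zero-length chunk is never sent (for example because the S3 read stalls or the Lambda container is frozen mid-stream), clamd has no way to know the stream has ended, which may be worth ruling out when a thread sits in INSTREAM.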
I'm struggling to find any sign of an error in the logs - the number of INSTREAM requests matches the number of complete scans:
$ sudo grep -c "got command INSTREAM" /var/log/clamav/clamav.log
129
$ sudo grep -c "Chunks complete" /var/log/clamav/clamav.log
129
$ sudo grep -c "Scanthread: connection shut down" /var/log/clamav/clamav.log
129
...okay, now that I look a little more deeply into the logs, it just takes a lot longer for some files to be scanned. When I do a batch of 16 files with Lambda concurrency restricted to 7, the first 7 files are scanned within a few seconds. The next file begins scanning soon after and gets to "Chunks complete" within a second, but takes 23 seconds to reach "Scanthread: connection shut down". From here on it just gets worse (1:24, 1:45...), and then the 3rd batch of 7 files takes over 3 minutes to scan.
If I give the system a few minutes to settle down and let all the threads die off, the same files that took over 3 minutes now take about 5-7 seconds.
If I run the same test on a faster machine the performance improves, but the issue is still there:
When threads get stuck at INSTREAM I can see that the files are still there:
$ ls -al /tmp
drwx------ 2 clamav clamav 4096 Aug 29 16:52 clamav-493bdf893ce4d8d7763c00fee22d9d69.tmp
-rwx------ 1 clamav clamav 25683921 Aug 29 16:52 clamav-5cdefd83d5531a03c7cf22fda37d133f.tmp