Maven build download artefacts connection slow/res

Posted 2019-07-15 01:22

We have a Docker image that runs a git clone command for a particular repository and then runs a Maven build. When running this image locally, it works fine. When running this image in an AWS VM, it works fine.

The problem is that when we run this image inside ACI (Azure Container Instances) or on an Azure VM, the artefact download step of the Maven build has connection issues: the jar downloads sometimes slow down drastically and sometimes even time out.

We parameterize the repository that is built with this image - and the timeout issue only occurs on a few projects. As far as we can tell those projects do not have anything special.

For a particular configuration of the vm and the mvn commands we actually run - the connection issue occurs at the same set of artefacts.

If we change the mvn commands - the place where the connection issue occurs changes.

  1. Initially we had a single mvn clean package command executed after the git clone, which hit the issue on a particular set of jars. We then split the build into mvn dependency:resolve-plugins, then mvn compile dependency:resolve, and finally mvn clean package. We did this because we thought that some tests running early in the build might have caused the connection issues, so we moved the artefact download step first. This did not solve the issue; it just changed the place where the jar downloads freeze.

  2. Changed the mvn thread count configuration and also the VM core and memory sizes - but this did not help.

  3. We set TCP keepalive on the VM to avoid a possible Azure NAT/Load Balancer timeout that was killing our connections. This was suggested by Azure support, and we also found it in the question "Maven build gets connection reset when downloading artifacts". We configured it based on the Azure guide: https://github.com/wbuchwalter/azure-content/blob/master/includes/guidance-tcp-session-timeout-include.md

> sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 20
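
One thing worth checking here, as a sketch rather than a confirmed diagnosis: on Linux the net.ipv4.tcp_keepalive_* settings belong to a network namespace, so the values set on the VM are not necessarily the ones a container sees, and they only affect sockets that enable SO_KEEPALIVE. Assuming the image is run with plain Docker on the VM and that it provides the sysctl binary (the image name below is hypothetical), the values can be read and set per container:

```shell
# Assumption: plain Docker on the VM; "my-build-image" is a hypothetical image name.

# Read the value as seen from inside a container (it may differ from the VM's value).
docker run --rm my-build-image sysctl net.ipv4.tcp_keepalive_time

# Pass the keepalive settings to the container's own network namespace.
docker run --rm \
  --sysctl net.ipv4.tcp_keepalive_time=60 \
  --sysctl net.ipv4.tcp_keepalive_intvl=10 \
  --sysctl net.ipv4.tcp_keepalive_probes=20 \
  my-build-image
```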

This is a sample of the mvn log:

14:10:48,505 [BasicRepositoryConnector-repo.maven.apache.org-27-0] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/regexp/regexp/1.3/regexp-1.3.jar
14:10:48,506 [BasicRepositoryConnector-repo.maven.apache.org-27-2] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/scm/maven-scm-provider-cvs-commons/1.7/maven-scm-provider-cvs-commons-1.7.jar
14:10:48,505 [BasicRepositoryConnector-repo.maven.apache.org-27-1] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/scm/maven-scm-provider-git-commons/1.7/maven-scm-provider-git-commons-1.7.jar
14:10:48,521 [BasicRepositoryConnector-repo.maven.apache.org-27-3] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/tmatesoft/sqljet/sqljet/1.0.4/sqljet-1.0.4.jar
14:10:48,523 [BasicRepositoryConnector-repo.maven.apache.org-27-4] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/antlr/antlr-runtime/3.1.3/antlr-runtime-3.1.3.jar
14:10:48,540 [BasicRepositoryConnector-repo.maven.apache.org-27-0] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/regexp/regexp/1.3/regexp-1.3.jar (25 kB at 706 kB/s)
14:10:48,540 [BasicRepositoryConnector-repo.maven.apache.org-27-0] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/antlr/stringtemplate/3.2/stringtemplate-3.2.jar
14:10:48,564 [BasicRepositoryConnector-repo.maven.apache.org-27-0] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/org/antlr/stringtemplate/3.2/stringtemplate-3.2.jar (172 kB at 4.0 MB/s)
14:26:32,150 [BasicRepositoryConnector-repo.maven.apache.org-27-2] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/scm/maven-scm-provider-cvs-commons/1.7/maven-scm-provider-cvs-commons-1.7.jar (80 kB at 84 B/s)
14:26:32,157 [BasicRepositoryConnector-repo.maven.apache.org-27-4] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/org/antlr/antlr-runtime/3.1.3/antlr-runtime-3.1.3.jar (151 kB at 159 B/s)
14:26:32,199 [BasicRepositoryConnector-repo.maven.apache.org-27-3] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/org/tmatesoft/sqljet/sqljet/1.0.4/sqljet-1.0.4.jar (744 kB at 788 B/s)

Notice the suspicious transfer rates: (151 kB at 159 B/s), (80 kB at 84 B/s), (744 kB at 788 B/s)

We have examples of executions that run just fine, examples of executions that time out (after 1 hour), and examples of executions that take close to 1 hour.
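
The three slow transfers all started at 14:10:48 and all finished at 14:26:32. A rough check, assuming 1 kB means 1000 bytes and using the rounded sizes from the log, shows that each reported rate is simply the file size divided by that same elapsed time. That suggests the connections stalled together and completed together, rather than each one trickling data at its own pace:

```shell
#!/usr/bin/env bash
# Rough check of the transfer rates reported in the log above.
# Assumption: sizes are the rounded kB values from the log, with 1 kB = 1000 bytes.

# Elapsed seconds between 14:10:48 and 14:26:32 (milliseconds ignored).
elapsed=$(( (14*3600 + 26*60 + 32) - (14*3600 + 10*60 + 48) ))

# Integer bytes per second for a given size in kB.
rate() {
  echo $(( $1 * 1000 / elapsed ))
}

echo "elapsed: ${elapsed}s"          # elapsed: 944s
for kb in 80 151 744; do
  echo "${kb} kB -> $(rate "$kb") B/s"   # 84, 159 and 788 B/s
done
```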

Solutions:

  • We have the option to pre-cache some of the jars in the initial Docker image, so that Maven does not need to download them. But the Docker image that handles this build needs to run for any git repo (Java + Maven), and we cannot know what dependencies those projects have.

  • Similar to the previous point, we have the option to create an external volume that is shared between running containers and to cache the jars there.

  • We have options to restart the maven build once it fails - because part of the dependencies would have already been downloaded and it will not get stuck at the same place.
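
The restart option from the last bullet could look roughly like the sketch below. The retry helper is hypothetical (it is not part of Maven), and it only helps once a stalled download actually fails or is killed by a timeout, since a hanging build never returns a non-zero exit code on its own:

```shell
#!/usr/bin/env bash
# Retry a command up to a fixed number of attempts.
# "retry" is a hypothetical helper written for this example.
retry() {
  local max_attempts=$1
  shift
  local attempt=1
  # Run the command; on failure, try again until the attempt limit is reached.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying: $*" >&2
    attempt=$(( attempt + 1 ))
  done
}

# Usage (assumes mvn is on the PATH and the repository is already cloned):
# retry 3 mvn -B clean package
```

Because jars downloaded by a failed attempt stay in the local repository, each retry has less left to download.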

We reached out to Azure support and they recommended the TCP Keep-alive configuration - but that did not solve our problem.

We want to understand the root cause of the issue. Is it a Docker config? Is it a Maven bug? Is it an Azure-specific issue? The connection issue occurs in roughly 9 out of 10 executions. I have no idea why it works and no idea why it doesn't work :) The solutions I mentioned before are just work-arounds: they do not fix the issue, they just ignore it.

Found the problem

The issue is that Maven reuses the same HTTP connections for the download of the pom/jar files (see https://maven.apache.org/guides/mini/guide-http-settings.html#Maven_3.0.4). Our scenario is therefore along the following lines:

Project

-- module 1 - download some pom/jars - keep connection active

-- module 2 - runs some plugins / tests - lasts more than 5 minutes

-- module 3 - tries to download some pom/jars

The Azure NAT configuration (https://github.com/wbuchwalter/azure-content/blob/master/includes/guidance-tcp-session-timeout-include.md) kills any connection that has been idle for 4 minutes.

So during the execution of module 2, all of the connections initially opened and used by module 1 get closed, and module 3 does not know it: it tries to reuse the dead pooled connections, and the downloads stall.

Our solution, given that the 4 minute NAT timeout cannot be configured, is to use TCP keep-alive, or to force Maven to use a different connection pool implementation, or to use an eviction manager that would "nicely" close these idle connections before NAT can close them "forcefully".
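
As a sketch of the connection pool options, the Wagon HTTP transport used by default in Maven 3.x reads a few system properties that control connection reuse. The values below are assumptions chosen to stay under the 4 minute NAT limit, and a Maven version with a different transport may ignore these properties, so they should be checked against the HTTP settings guide linked above for the Maven version in use:

```shell
#!/usr/bin/env bash
# Assumption: Maven 3.x with the Wagon HTTP transport.

# Option A: expire pooled connections after 120 seconds,
# which is below the 4 minute NAT idle timeout.
export MAVEN_OPTS="-Dmaven.wagon.httpconnectionManager.ttlSeconds=120"

# Option B (more drastic): do not pool connections at all.
# export MAVEN_OPTS="-Dmaven.wagon.http.pool=false"

# Then run the build as before, for example:
# mvn -B clean package
```

Option A keeps the benefit of connection reuse within a module while making sure no connection survives a long idle phase; option B trades download speed for simplicity.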
