Java RabbitMQ client hangs on resend via thread of

Posted 2019-08-06 17:17

Question:

I am currently experimenting with failure scenarios that might happen when communicating via the message broker RabbitMQ. The goal is to evaluate how such communication can be made more resilient.

In particular, I want to trigger a nack (negative acknowledgement) confirm when sending messages with publisher confirms enabled. To do so, I send a message to a non-existent exchange via Spring AMQP's RabbitTemplate.send. In the callback registered via RabbitTemplate.setConfirmCallback, I then handle ack=false confirms by resending the message to an existing exchange (simulating that I have taken care of the cause of the nack).

A sample class and the related test are provided below; the complete sample project can be found in my GitHub repository. I use RabbitMQ 3.6 and Spring Boot/AMQP 2.0.2.

When running the test, the callback is called with ack=false as expected. However, resending the message hangs while a new channel is being created (and fails with a timeout exception after 10 minutes). A call stack snapshot and the logs are provided below.

A solution to the problem seems to be to send the message in a different thread as proposed here. If you uncomment the line service.runInSeparateThread = true; in the test, things work!

However, I neither truly understand why things do (or don't) work, nor have I read about this practice anywhere except in the above-mentioned post. Is this expected behavior or a bug? Can someone explain the details?

Thanks a lot for your advice!

A call stack snapshot:

 "AMQP Connection 127.0.0.1:5672@3968" prio=5 tid=0xe nid=NA waiting
 java.lang.Thread.State: WAITING
  at java.lang.Object.wait(Object.java:-1)
  at com.rabbitmq.utility.BlockingCell.get(BlockingCell.java:73)
  at com.rabbitmq.utility.BlockingCell.uninterruptibleGet(BlockingCell.java:120)
  at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
  at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:494)
  at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:288)
  at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:138)
  at com.rabbitmq.client.impl.ChannelN.open(ChannelN.java:133)
  at com.rabbitmq.client.impl.ChannelManager.createChannel(ChannelManager.java:176)
  at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:542)
  at org.springframework.amqp.rabbit.connection.SimpleConnection.createChannel(SimpleConnection.java:57)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createBareChannel(CachingConnectionFactory.java:1156)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.access$200(CachingConnectionFactory.java:1144)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.doCreateBareChannel(CachingConnectionFactory.java:585)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:568)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:538)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:520)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.access$1500(CachingConnectionFactory.java:94)
  at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createChannel(CachingConnectionFactory.java:1161)
  at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1803)
  at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1771)
  at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:859)
  ...

The logs:

...
10:21:24.613 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitAdmin - declaring Exchange 'ExistentExchange'
10:21:24.630 [main] INFO com.example.rabbitmq.ProducerService - sending `initial Message`
10:21:24.648 [main] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - Added listener org.springframework.amqp.rabbit.core.RabbitTemplate$MockitoMock$952329793@562c877a
10:21:24.648 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Added publisher confirm channel: Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@3013909b Shared Rabbit Connection: SimpleConnection@12db3386 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 1341] to map, size now 1
10:21:24.649 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Executing callback RabbitTemplate$$Lambda$175/1694519286 on RabbitMQ Channel: Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1), conn: Proxy@3013909b Shared Rabbit Connection: SimpleConnection@12db3386 [delegate=amqp://guest@127.0.0.1:5672/, localPort= 1341]
10:21:24.649 [main] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Publishing message (Body:'[B@67001148(byte[15])' MessageProperties [headers={}, contentType=application/octet-stream, contentLength=0, deliveryMode=PERSISTENT, priority=0, deliveryTag=0])on exchange [nonExistentExchange], routingKey = [nonExistentQueue]
10:21:24.659 [main] INFO com.example.rabbitmq.ProducerService - done with sending message
10:21:24.675 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) PC:Nack:(close):1
10:21:24.677 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - Sending confirm PendingConfirm [correlationData=null cause=channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'nonExistentExchange' in vhost '/', class-id=60, method-id=40)]
10:21:24.677 [AMQP Connection 127.0.0.1:5672] INFO com.example.rabbitmq.ProducerService - In confirm callback, ack=false, cause=channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'nonExistentExchange' in vhost '/', class-id=60, method-id=40), correlationData=null
10:21:24.677 [AMQP Connection 127.0.0.1:5672] INFO com.example.rabbitmq.ProducerService - sending `resend Message`
10:21:24.678 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) PC:Nack:(close):1
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - AMQChannel(amqp://guest@127.0.0.1:5672/,1) No listener for seq:1
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Removed publisher confirm channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) from map, size now 0
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.core.RabbitTemplate - Removed publisher confirm channel: PublisherCallbackChannelImpl: AMQChannel(amqp://guest@127.0.0.1:5672/,1) from map, size now 0
10:21:24.679 [AMQP Connection 127.0.0.1:5672] DEBUG org.springframework.amqp.rabbit.support.PublisherCallbackChannelImpl - PendingConfirms cleared 

ProducerService:

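import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageProperties;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.rabbit.support.CorrelationData;
import org.springframework.stereotype.Service;

@Service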
public class ProducerService {

    static final String EXISTENT_EXCHANGE = "ExistentExchange";
    private static final String NON_EXISTENT_EXCHANGE = "nonExistentExchange";
    private static final String QUEUE_NAME = "nonExistentQueue";
    private final Logger logger = LoggerFactory.getLogger(getClass());
    private final RabbitTemplate rabbitTemplate;
    private final Executor executor = Executors.newCachedThreadPool();
    boolean runInSeparateThread = false;

    public ProducerService(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
        rabbitTemplate.setConfirmCallback(this::confirmCallback);
    }

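    // Called on the client's connection thread (the log lines tagged [AMQP Connection 127.0.0.1:5672]).
    private void confirmCallback(CorrelationData correlationData, boolean ack, String cause) {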
        logger.info("In confirm callback, ack={}, cause={}, correlationData={}", ack, cause, correlationData);
        if (!ack) {
            if (runInSeparateThread) {
                executor.execute(() -> sendMessage("resend Message", EXISTENT_EXCHANGE));
            } else {
                sendMessage("resend Message", EXISTENT_EXCHANGE);
            }
        } else {
            logger.info("sending was acknowledged");
        }
    }

    public void produceMessage() {
        sendMessage("initial Message", NON_EXISTENT_EXCHANGE);
    }

    private void sendMessage(String messageBody, String exchangeName) {
        logger.info("sending `{}`", messageBody);
        rabbitTemplate.send(exchangeName, QUEUE_NAME, new Message(messageBody.getBytes(), new MessageProperties()));
        logger.info("done with sending message");
    }

}

ProducerServiceTest:

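import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.amqp.core.AmqpAdmin;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.amqp.RabbitAutoConfiguration;
import org.springframework.boot.test.mock.mockito.SpyBean;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringRunner;

import static java.util.stream.Collectors.toList;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.doAnswer;

@RunWith(SpringRunner.class)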
@ContextConfiguration(classes = {RabbitAutoConfiguration.class, ProducerService.class})
@DirtiesContext
public class ProducerServiceTest {

    @Autowired
    private ProducerService service;
    @SpyBean
    private RabbitTemplate rabbitTemplate;
    @Autowired
    private AmqpAdmin amqpAdmin;
    @Autowired
    private CachingConnectionFactory cachingConnectionFactory;

    @Before
    public void setup() {
        cachingConnectionFactory.setPublisherConfirms(true);
        amqpAdmin.declareExchange(new DirectExchange(ProducerService.EXISTENT_EXCHANGE));
    }

    @After
    public void cleanup() {
        amqpAdmin.deleteExchange(ProducerService.EXISTENT_EXCHANGE);
    }

    @Test
    public void sendMessageToNonexistentExchange() throws InterruptedException {
        final CountDownLatch sentMessagesLatch = new CountDownLatch(2);
        final List<Message> sentMessages = new ArrayList<>();
        doAnswer(invocation -> {
            invocation.callRealMethod();
            sentMessages.add(invocation.getArgument(2));
            sentMessagesLatch.countDown();
            return null;
        }).when(rabbitTemplate).send(anyString(), anyString(), any(Message.class));

//        service.runInSeparateThread = true;
        service.produceMessage();
        sentMessagesLatch.await();

        List<String> messageBodies = sentMessages.stream().map(message -> new String(message.getBody())).collect(toList());
        assertThat(messageBodies, equalTo(Arrays.asList("initial Message", "resend Message")));
    }

}

Answer 1:

It could be considered a bug, I suppose, but it's an artifact of the way we cache channels to improve performance. The problem is that attempting to publish on a channel on the same thread that's delivering an ack for the same channel causes a deadlock in the client library.

We have an open issue to look into a solution (for a different reason); we just haven't gotten around to it. AFAIK, you are only the second user to hit this in more than 6 years since we added support for confirms and returns.

EDIT

Actually, this is a different situation: it's not reusing the channel, since the channel is closed. It is trying to create a new channel, and that is what is deadlocked. I don't see how we (Spring AMQP) can do anything about it. It's a limitation of the Java client: you cannot perform operations on the ack thread.
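
To make the limitation easier to picture, here is a toy model in plain Java. It is only a sketch, not the RabbitMQ client's actual code. It assumes, in line with the stack trace in the question, that one connection thread delivers both the confirm callbacks and the replies to blocking calls such as opening a channel. All names in it (AckThreadDeadlockDemo, openChannel, resendInsideCallback, resendOnWorkerThread) are made up for illustration, and a 200 ms timeout stands in for the 10 minute one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model: a single I/O thread delivers both confirm callbacks and RPC replies. */
final class AckThreadDeadlockDemo implements AutoCloseable {

    // Stands in for the "AMQP Connection" thread from the stack trace (assumption).
    private final ExecutorService ioThread = Executors.newSingleThreadExecutor();
    // Stands in for the executor used when runInSeparateThread is true.
    private final ExecutorService workers = Executors.newCachedThreadPool();

    /** Simulates a blocking RPC whose reply can only be delivered by the I/O thread. */
    public boolean openChannel(long timeoutMillis) throws Exception {
        CompletableFuture<Boolean> reply = new CompletableFuture<>();
        ioThread.execute(() -> reply.complete(true)); // the reply is queued behind whatever the I/O thread is doing
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return false; // the reply never arrived in time
        }
    }

    /** The callback runs on the I/O thread and blocks there, so the reply cannot be delivered. */
    public boolean resendInsideCallback() throws Exception {
        return ioThread.submit(() -> openChannel(200)).get();
    }

    /** The callback only hands the work off, which frees the I/O thread to deliver the reply. */
    public boolean resendOnWorkerThread() throws Exception {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        ioThread.execute(() -> workers.execute(() -> {
            try {
                result.complete(openChannel(200));
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        }));
        return result.get(2, TimeUnit.SECONDS);
    }

    @Override
    public void close() {
        ioThread.shutdownNow();
        workers.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (AckThreadDeadlockDemo demo = new AckThreadDeadlockDemo()) {
            System.out.println("resend inside callback succeeded: " + demo.resendInsideCallback());
            System.out.println("resend on worker thread succeeded: " + demo.resendOnWorkerThread());
        }
    }
}
```

In this model the first call should report false, because the thread that would deliver the reply is the one waiting for it. The second should report true, which mirrors why executor.execute(...) in the confirm callback avoids the hang.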