Confluent librdkafka kafka.Consumer.ReadMessage timeout

I use the Go client for librdkafka (confluent-kafka-go) to consume messages from Kafka. The code looks like:

```go
msg, err := c.kafkaConsumer.ReadMessage(2 * time.Second)
```

The 2-second timeout is more a guess than a conscious decision.

I'd like to drop this to be much lower, e.g. ReadMessage(50 * time.Millisecond) if possible (for... reasons!).

Reading the comments on func (h *handle) eventPoll(...), which ReadMessage essentially wraps, it says // eventPoll polls an event from the handler's C rd_kafka_queue_t. So I don't believe lowering the timeout will spam the network, but are there other reasons not to make this read loop faster?

Answer

There are a few considerations when adjusting that timeout:

  1. First, the good news:

    • The timeout only controls how long ReadMessage blocks waiting for an event before returning; if a message arrives sooner, it returns immediately

    • No, it won't spam your network - you're right about that: broker fetch requests are handled by librdkafka's background threads, and ReadMessage only polls the local event queue

    • librdkafka is quite efficient underneath

  2. But some things to watch for:

    • You might see slightly higher CPU usage with more frequent polling

    • At 50ms, you'll get timeout errors much more often (these are normal and just mean no message was available in that interval)

    • Going lower mainly costs you a busier application loop; rebalance and other consumer events are serviced during each poll, so more frequent polling won't hurt group membership

  3. What I typically recommend:

```go
// This 100ms middle ground often works well
msg, err := c.kafkaConsumer.ReadMessage(100 * time.Millisecond)
```
  4. When 50ms makes sense:

    • If you're doing real-time processing

    • When you have a constant high message flow

    • If you've tested and can handle the timeout errors properly

To distinguish normal timeouts from real errors, check the error code:

```go
if err != nil {
    if kerr, ok := err.(kafka.Error); ok && kerr.Code() == kafka.ErrTimedOut {
        // Normal: no message arrived within the timeout; keep polling
        continue
    }
    // Handle real errors here
}
```

If you are still considering other approaches, you might also look at:

  • Using Poll() instead for more control (it surfaces rebalance and error events directly)

  • Implementing batching logic

  • Monitoring your consumer lag metrics

The key is to test with your actual workload. Try 50ms and watch your system metrics - if it works for your case, go for it!

These are just suggestions; your specific use case may need different tuning. Experiment with different timeout values while monitoring your system metrics - values between 50 and 200ms often work well for low-latency scenarios.

