Explicit offset committing

Sumit Rawal answered on June 1, 2023


You should commit offsets only after the messages have been processed. If you don't maintain any state within the poll loop and all processing happens inside it, you may use automatic offset committing. Note that increasing the commit frequency takes a toll on performance, because every commit carries some overhead. Also make sure to commit the offset of the last message that was actually processed, not the last offset returned by the most recent poll() invocation. Committing offsets for messages that were read but not processed can cause the consumer to miss messages.
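As a minimal, broker-free sketch of that rule: the function below processes a batch in order and reports only the positions that were actually finished. `Record`, `process_batch`, and `handle` are hypothetical names used for illustration and are not part of any Kafka client. The sketch assumes the usual Kafka convention that the committed offset is the offset of the next record to read, that is, the last processed offset plus one.

```python
from collections import namedtuple

# Hypothetical stand-in for a consumed record; not a Kafka client class.
Record = namedtuple("Record", ["partition", "offset", "value"])


def process_batch(records, handle):
    """Process records in order and return the offsets that are safe to commit.

    Returns {partition: next_offset}, where next_offset is one past the last
    record that handle() finished. Stops at the first failure so that nothing
    after an unprocessed record is ever reported as done.
    """
    safe = {}
    for rec in records:
        try:
            handle(rec)
        except Exception:
            break  # leave this record and the rest of the batch uncommitted
        safe[rec.partition] = rec.offset + 1
    return safe
```

In a real consumer, the returned map would be what gets passed to the client's explicit commit call, instead of committing whatever the last poll() happened to return.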

Be cognizant of rebalances; you should commit offsets before partitions are revoked and clean up any state you maintain before new partitions are assigned.
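The rebalance advice can be sketched without a broker as a small listener object. The method names `on_partitions_revoked` and `on_partitions_assigned` mirror the callbacks that Kafka clients expose (the Java client calls them `onPartitionsRevoked` and `onPartitionsAssigned`), but the class itself, `record_processed`, and the injected `commit` callable are assumptions made for this illustration.

```python
class CommitOnRevoke:
    """Broker-free model of a rebalance listener (all names hypothetical)."""

    def __init__(self, commit):
        self.commit = commit   # callable taking {partition: next_offset}
        self.progress = {}     # partition -> next offset to commit
        self.state = {}        # partition -> in-memory state built so far

    def record_processed(self, partition, offset, value):
        # Track progress and per-partition state after each processed record.
        self.progress[partition] = offset + 1
        self.state.setdefault(partition, []).append(value)

    def on_partitions_revoked(self, revoked):
        # Commit what was finished on the partitions being taken away ...
        to_commit = {p: self.progress[p] for p in revoked if p in self.progress}
        if to_commit:
            self.commit(to_commit)
        # ... then drop everything we were holding for them.
        for p in revoked:
            self.progress.pop(p, None)
            self.state.pop(p, None)

    def on_partitions_assigned(self, assigned):
        # Start clean for every newly assigned partition.
        for p in assigned:
            self.state[p] = []
```

Committing inside the revoke callback matters because it is the last point at which this consumer still owns the partitions.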

Consider the scenario of a consumer that polls for records and is unable to process some of them. For example, the consumer may write the records to a database that is temporarily unavailable. Unlike other messaging systems, Kafka doesn't have us acknowledge each individual message. Rather, when we commit the offset for, say, message #91, all the records up to offset #91 are considered read and processed, even though we might not have been able to write message #89 into our database. We can handle this situation in one of the following two ways:

Store the messages for which a retriable error was received in a buffer for later processing. Recall that we can't stop calling poll(); otherwise, the broker will think that the consumer is dead and trigger a rebalance. We can also use the pause() API to stop fetching messages from the assigned partitions and take our time to process the failed messages. After pause() is invoked, calls to poll() return no records from the paused partitions until resume() is invoked.

Another strategy is to write the message for which a retriable error was encountered to another topic and continue. A separate consumer group can handle messages written to the retry topic or the same consumer can also subscribe to the retry topic in addition to the main topic.
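The retry-topic strategy can be sketched as follows, again without a real broker. `RetriableError`, `process_with_retry_topic`, and `send_to_retry` are hypothetical names; `send_to_retry` stands in for producing the record to the retry topic. The sketch assumes that a record handed off successfully can be treated as dealt with, so the commit position may move past it.

```python
class RetriableError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a database outage."""


def process_with_retry_topic(records, handle, send_to_retry):
    """Process (offset, value) pairs from one partition, in order.

    Records that fail with RetriableError are handed to send_to_retry, a
    stand-in for producing to the retry topic. Returns the offset that is
    then safe to commit, or None for an empty batch.
    """
    next_offset = None
    for offset, value in records:
        try:
            handle(value)
        except RetriableError:
            # If this hand-off raises, we stop before advancing past the record.
            send_to_retry(offset, value)
        next_offset = offset + 1
    return next_offset
```

With a batch covering offsets 88 to 91 in which only #89 fails, the failed record ends up in the retry destination and the function returns 92, so committing no longer hides an unprocessed message.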

We can use Kafka for use cases that require maintaining a result or state computed so far, such as calculating moving averages. We can store the result of a computation in a results topic, but the consumer may crash after the result is stored and before the offset is committed, or vice versa. Making the two steps atomic isn't trivial with the plain consumer and producer APIs. Kafka has supported transactions since version 0.11, which let a producer write results and commit the consumed offsets atomically, but it is usually simpler to look at a library like Kafka Streams, which provides high-level DSL-like APIs for aggregations, joins, windows, and other complex analytics.
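To make the crash window concrete, here is a small sketch of one possible workaround, which the text above does not prescribe: keep the offset inside the stored result, so that a restart can recognize and skip records the result already reflects. `MovingAverage`, its snapshot format, and the method names are all assumptions for this illustration; a real application would more likely lean on Kafka Streams as suggested above.

```python
from collections import deque


class MovingAverage:
    """Windowed mean whose snapshot carries the offset it reflects (hypothetical)."""

    def __init__(self, window, snapshot=None):
        self.window = deque(maxlen=window)
        self.next_offset = 0
        if snapshot is not None:
            # Restore both the values and the position they correspond to.
            self.window.extend(snapshot["values"])
            self.next_offset = snapshot["next_offset"]

    def apply(self, offset, value):
        if offset < self.next_offset:
            return None  # already reflected in the snapshot: skip the replay
        self.window.append(value)
        self.next_offset = offset + 1
        return sum(self.window) / len(self.window)

    def snapshot(self):
        # Result and offset are stored together, so they cannot disagree.
        return {"values": list(self.window), "next_offset": self.next_offset}
```

If the consumer crashes after storing a snapshot but before committing, the replayed records arrive with offsets below `next_offset` and are ignored instead of being counted twice.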

In older versions of Kafka (before 0.10.1), heartbeats are sent to the broker only from within poll(). Newer clients send heartbeats from a background thread but still expect poll() to be called within max.poll.interval.ms. Either way, we can't take too long in the poll loop to process records. If we expect processing to take significant time, we can use a thread pool to hand off records for processing. While the worker thread completes the computation, we can pause the consumer and continue to poll, so that no data is fetched but the consumer stays alive in the group.
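The hand-off pattern can be sketched with a fake consumer and a standard thread pool. `FakeConsumer` and `run` are hypothetical; the fake only imitates the poll/pause/resume shape of a Kafka consumer, and its `pause()` and `resume()` take no arguments, whereas real clients pause and resume specific partitions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeConsumer:
    """Hypothetical stand-in with the poll/pause/resume shape of a Kafka consumer."""

    def __init__(self, batches):
        self.batches = list(batches)
        self.paused = False
        self.polls = 0

    def poll(self, timeout=0.001):
        self.polls += 1  # every poll keeps the consumer alive in the group
        if self.paused or not self.batches:
            time.sleep(timeout)  # imitate waiting out the poll timeout
            return []
        return self.batches.pop(0)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False


def run(consumer, work):
    """Poll continuously while a worker thread processes one batch at a time."""
    results = []
    in_flight = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        # A real loop runs forever; the fake lets us stop once it is drained.
        while in_flight is not None or consumer.batches:
            batch = consumer.poll()
            if batch:
                consumer.pause()  # stop fetching while the batch is processed
                in_flight = pool.submit(lambda b=batch: [work(r) for r in b])
            elif in_flight is not None and in_flight.done():
                results.extend(in_flight.result())
                in_flight = None
                consumer.resume()  # ready for the next batch
    return results
```

The poll loop never blocks on the work itself: it keeps calling poll(), receives nothing while paused, and resumes fetching only after the worker has finished. This would also be the natural point to commit the offsets of the finished batch.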

Source: Grepper