In Kafka releases through 0.8.1.1, consumers commit their offsets to ZooKeeper. ZooKeeper does not scale well (especially for writes) when there is a large number of offsets (i.e., consumer-count * partition-count).
Consumers can instead commit their offsets to Kafka itself by writing them to a durable (replicated) and highly available topic. Consumers can fetch offsets by reading from this topic (although we provide an in-memory offsets cache for faster access). In other words, offset commits are regular producer requests (which are inexpensive) and offset fetches are fast in-memory lookups.
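The commit and fetch paths described above can be pictured with a toy model. This is only a minimal sketch, not how the broker is implemented: `OffsetStore`, `OffsetKey`, and `rebuild_cache` are hypothetical names invented for illustration, a plain Python list stands in for the replicated offsets topic, and a dict stands in for the in-memory cache.

```python
from collections import namedtuple

# Hypothetical key type for this sketch: one committed offset
# per (consumer group, topic, partition).
OffsetKey = namedtuple("OffsetKey", ["group", "topic", "partition"])


class OffsetStore:
    """Toy model: an append-only log plus an in-memory cache."""

    def __init__(self):
        self.log = []    # stands in for the durable, replicated topic
        self.cache = {}  # latest committed offset per key

    def commit(self, group, topic, partition, offset):
        key = OffsetKey(group, topic, partition)
        self.log.append((key, offset))  # a commit is just an append
        self.cache[key] = offset        # keep the cache current

    def fetch(self, group, topic, partition):
        # A fetch is a dictionary lookup; None means nothing committed yet.
        return self.cache.get(OffsetKey(group, topic, partition))

    def rebuild_cache(self):
        # Assumption of this sketch: replaying the log in order
        # recovers the cache, e.g. after a restart.
        self.cache = {}
        for key, offset in self.log:
            self.cache[key] = offset


store = OffsetStore()
store.commit("consumer-x", "events", 0, 42)
store.commit("consumer-x", "events", 0, 57)
print(store.fetch("consumer-x", "events", 0))  # 57
```

Both commits stay in the log, but only the most recent one is visible to a fetch. This is why the cache can answer lookups without reading the topic.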
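For example, a kafka-python consumer that disables auto-commit and commits manually after each processed batch:

    import logging
    import traceback

    import kafka
    from google.protobuf.message import DecodeError

    import message_pb2  # generated protobuf module, assumed to exist in the project

    logger = logging.getLogger(__name__)


    def pbmsg_parser(value):
        """Deserialize raw message bytes into a protobuf Message."""
        msg = message_pb2.Message()
        try:
            # ParseFromString raises DecodeError on malformed input;
            # its return value is not a success flag.
            msg.ParseFromString(value)
        except DecodeError:
            logger.error('invalid message')
        return msg


    consumer = kafka.KafkaConsumer(
        topic,  # topic name, defined elsewhere
        group_id='consumer-x',
        bootstrap_servers='126.96.36.199',
        auto_offset_reset='earliest',
        enable_auto_commit=False,  # offsets are committed manually below
        value_deserializer=pbmsg_parser,
        reconnect_backoff_ms=100,
    )

    while _run:  # _run: shutdown flag, defined elsewhere
        try:
            batch = consumer.poll(timeout_ms=10000, max_records=10000)
            cnt = sum(map(len, batch.values()))
            if cnt == 0:
                continue
            process(batch)     # application-specific handler, defined elsewhere
            consumer.commit()  # commit only after the batch has been processed
        except Exception:
            traceback.print_exc()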
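The loop above calls `commit()` with no arguments, which leaves the choice of offsets to the client. To show what a committed offset means, here is a minimal sketch that derives the per-partition offsets from a poll result by hand. `Partition`, `Record`, and `next_offsets` are hypothetical stand-ins written for this example, not the client library's own types, and the sketch assumes a batch shaped like the `poll()` result: a dict mapping each partition to a list of records in offset order.

```python
from collections import namedtuple

# Stand-ins for the client's partition and record types
# (hypothetical, for illustration only).
Partition = namedtuple("Partition", ["topic", "partition"])
Record = namedtuple("Record", ["offset", "value"])


def next_offsets(batch):
    """Map each partition in a batch to the offset that should be committed."""
    return {
        tp: records[-1].offset + 1  # committed offset = next record to read
        for tp, records in batch.items()
        if records                  # skip partitions that returned nothing
    }


batch = {
    Partition("events", 0): [Record(10, b"a"), Record(11, b"b")],
    Partition("events", 1): [Record(7, b"c")],
    Partition("events", 2): [],
}
offsets = next_offsets(batch)
print(offsets)
```

Partition 0 maps to 12 and partition 1 to 8, because the committed offset is the position of the next record to consume rather than the last one processed. Passing such a mapping to a real client's commit call depends on the client version and is not shown here.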
- Kafka Connect: a tool for scalably and reliably streaming data between Apache Kafka and other data systems.
- Kafka Streams: a lightweight library for creating stream processing applications.