Introduction

Kafka Basics.

History

In Kafka releases through 0.8.1.1, consumers commit their offsets to ZooKeeper. ZooKeeper does not scale well (especially for writes) when there are a large number of offsets (i.e., consumer-count * partition-count).

In later releases, consumers can commit their offsets to Kafka itself by writing them to a durable (replicated) and highly available topic. Consumers can fetch offsets by reading from this topic, although the broker keeps an in-memory offsets cache for faster access. In other words, offset commits are regular producer requests (which are inexpensive) and offset fetches are fast in-memory lookups.
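The idea can be illustrated with a toy model: commits are appends to a log, and fetches are served from a dictionary holding the latest value per key. This is only a sketch of the concept, not the broker's implementation. The OffsetStore class, its methods, and the topic name 'events' are hypothetical names chosen for illustration.

```python
from collections import namedtuple

# One offset entry exists per (group, topic, partition) combination.
OffsetKey = namedtuple("OffsetKey", ["group", "topic", "partition"])


class OffsetStore:
    """Toy model of Kafka-based offset storage (illustrative only)."""

    def __init__(self):
        self.log = []    # append-only stand-in for the replicated offsets topic
        self.cache = {}  # in-memory view: latest committed offset per key

    def commit(self, group, topic, partition, offset):
        # A commit is an append to the log, like a regular produce request.
        key = OffsetKey(group, topic, partition)
        self.log.append((key, offset))
        self.cache[key] = offset

    def fetch(self, group, topic, partition):
        # A fetch is a dictionary lookup; the log is not scanned.
        return self.cache.get(OffsetKey(group, topic, partition))

    @classmethod
    def replay(cls, log):
        # Rebuild the cache from the log (e.g. after a restart); last write wins.
        store = cls()
        for key, offset in log:
            store.commit(*key, offset)
        return store


store = OffsetStore()
store.commit("consumer-x", "events", 0, 42)
store.commit("consumer-x", "events", 0, 57)
print(store.fetch("consumer-x", "events", 0))  # 57
```

Because the cache can always be rebuilt by replaying the log, the log remains the durable source of truth while reads stay cheap.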

Python

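import logging
import traceback

import kafka
from google.protobuf.message import DecodeError

import message_pb2  # generated protobuf module; must define Message

logger = logging.getLogger(__name__)

# topic, _run and process() are expected to be defined by the application.

def pbmsg_parser(value):
  # ParseFromString raises DecodeError on malformed input; its return
  # value is not a success flag, so catch the exception instead.
  msg = message_pb2.Message()
  try:
    msg.ParseFromString(value)
  except DecodeError:
    logger.error('invalid message')
  return msg

consumer = kafka.KafkaConsumer(topic,
    group_id='consumer-x',
    bootstrap_servers='1.2.3.4',
    auto_offset_reset='earliest',  # start from the oldest offset if none is committed
    enable_auto_commit=False,      # offsets are committed manually below
    value_deserializer=pbmsg_parser,
    reconnect_backoff_ms=100)

while _run:
  try:
    batch = consumer.poll(timeout_ms=10000, max_records=10000)
    cnt = sum(map(len, batch.values()))
    if cnt == 0:
      continue
    process(batch)
    # Commit only after processing succeeds (at-least-once delivery).
    consumer.commit()
  except Exception:
    traceback.print_exc()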
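
The offset committed for a partition is the offset of the next record to read, which is the last processed offset plus one. The sketch below computes that mapping from a poll()-style batch. It is a minimal illustration: TopicPartition and Record here are local stand-ins rather than the kafka-python classes, and next_offsets and the topic name 'events' are hypothetical.

```python
from collections import namedtuple

# Local stand-ins for the partition and record types returned by poll().
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])
Record = namedtuple("Record", ["offset", "value"])


def next_offsets(batch):
    """Map each partition in a batch to the offset that should be committed."""
    # The committed offset points at the next record to consume,
    # so it is the last processed offset plus one.
    return {tp: records[-1].offset + 1 for tp, records in batch.items() if records}


batch = {
    TopicPartition("events", 0): [Record(10, b"a"), Record(11, b"b")],
    TopicPartition("events", 1): [Record(7, b"c")],
}
offsets = next_offsets(batch)
```

With kafka-python, each value would typically be wrapped in an OffsetAndMetadata and the mapping passed to consumer.commit(offsets=...). The fields of OffsetAndMetadata differ between library versions, so check the installed version before relying on a particular constructor signature.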

Architecture

  • Kafka Connect
    • a tool for scalably and reliably streaming data between Apache Kafka and other data systems.
  • Kafka Streams
    • a lightweight library for creating stream processing applications.

References