Kafka, mail # user - Reading Kafka directly from Pig? - 2013-08-07, 14:42
Reading Kafka directly from Pig?
I've thrown together a Pig LoadFunc to read data from Kafka, so you
could load data like:

QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using

The path part of the URI is the Kafka topic, and the fragment is the
number of partitions. The implementation I have makes one input
split per partition. Offsets are not really dealt with at this point -
it's a rough prototype.
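For illustration, here is a minimal sketch (not the prototype from this thread) of what a LoadFunc along these lines might look like. KafkaLoader and KafkaInputFormat are hypothetical names, and the KafkaInputFormat that would emit one split per partition is assumed to exist separately; only the Pig LoadFunc API itself (setLocation, getInputFormat, prepareToRead, getNext, relativeToAbsolutePath) is real.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class KafkaLoader extends LoadFunc {

    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public String relativeToAbsolutePath(String location, Path curDir) {
        // Keep the kafka:// URI as-is rather than resolving it against HDFS.
        return location;
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // e.g. kafka://localhost:9092/logs.query#8
        URI uri = URI.create(location);
        String broker = uri.getHost() + ":" + uri.getPort();
        String topic = uri.getPath().substring(1);             // path = topic
        int partitions = Integer.parseInt(uri.getFragment());  // fragment = partition count

        // Stash connection details in the job conf so the InputFormat can build
        // one split per partition (property names are illustrative).
        job.getConfiguration().set("kafka.broker", broker);
        job.getConfiguration().set("kafka.topic", topic);
        job.getConfiguration().setInt("kafka.partitions", partitions);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        // Hypothetical InputFormat that emits one InputSplit per Kafka partition.
        return new KafkaInputFormat();
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null; // end of this partition's split
            }
            // Wrap the raw message value in a single-field tuple.
            return tupleFactory.newTuple(reader.getCurrentValue());
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}

With a loader like that on the classpath, the truncated load statement above would be completed along the lines of (class name hypothetical):

QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using KafkaLoader();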

Anyone have thoughts on whether or not this is a good idea? I know the
usual pattern is Kafka -> HDFS -> MapReduce. If I'm only reading
this data from Kafka once, is there any reason why I can't skip
writing to HDFS?


Replies:
Jun Rao 2013-08-07, 14:49
Russell Jurney 2013-08-07, 14:59
David Arthur 2013-08-07, 15:20
David Arthur 2013-08-07, 15:28