-Fwd: Reading Kafka directly from Pig?
Russell Jurney 2013-08-07, 14:57
Cool stuff, a Pig Kafka UDF.
Russell Jurney http://datasyndrome.com
Begin forwarded message:
*From:* David Arthur <[EMAIL PROTECTED]>
*Date:* August 7, 2013, 7:41:30 AM PDT
*To:* [EMAIL PROTECTED]
*Subject:* *Reading Kafka directly from Pig?*
*Reply-To:* [EMAIL PROTECTED]
I've thrown together a Pig LoadFunc to read data from Kafka, so you could
load data like:
QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using
The path part of the uri is the Kafka topic, and the fragment is the number
of partitions. In the implementation I have, it makes one input split per
partition. Offsets are not really dealt with at this point - it's a rough
Anyone have thoughts on whether or not this is a good idea? I know usually
the pattern is: kafka -> hdfs -> mapreduce. If I'm only reading from this
data from Kafka once, is there any reason why I can't skip writing to HDFS?