Pig, mail # user - Re: Reading Kafka directly from Pig?


Mohammad Tariq 2013-08-29, 11:59
Great job. +1

Warm Regards,
Tariq
cloudfront.blogspot.com
On Wed, Aug 7, 2013 at 8:27 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> Cool stuff, a Pig Kafka UDF.
>
> Russell Jurney http://datasyndrome.com
>
> Begin forwarded message:
>
> *From:* David Arthur <[EMAIL PROTECTED]>
> *Date:* August 7, 2013, 7:41:30 AM PDT
> *To:* [EMAIL PROTECTED]
> *Subject:* *Reading Kafka directly from Pig?*
> *Reply-To:* [EMAIL PROTECTED]
>
> I've thrown together a Pig LoadFunc to read data from Kafka, so you could
> load data like:
>
> QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using
> com.mycompany.pig.KafkaAvroLoader('com.mycompany.Query');
>
> The path part of the URI is the Kafka topic, and the fragment is the number
> of partitions. In my implementation, it creates one input split per
> partition. Offsets are not really dealt with at this point; it's a rough
> prototype.
>
> Does anyone have thoughts on whether this is a good idea? I know the usual
> pattern is Kafka -> HDFS -> MapReduce. If I'm only reading this data from
> Kafka once, is there any reason I can't skip writing to HDFS?
>
> Thanks!
> -David
>
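The location scheme David describes (topic in the URI path, partition count in the fragment, one input split per partition) can be illustrated with a minimal sketch. It assumes the `kafka://host:port/topic#partitions` form from his example and uses only `java.net.URI`. `KafkaLocationSketch`, `PartitionSplit` and `planSplits` are hypothetical names, not part of his loader or of Pig's `LoadFunc` API. The sketch also leaves out the Pig/Hadoop `InputFormat` plumbing and offset handling, which he notes are not dealt with yet.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class KafkaLocationSketch {

    /** Hypothetical split descriptor: one partition of one topic on one broker. */
    record PartitionSplit(String host, int port, String topic, int partition) {}

    /**
     * Parses a location like kafka://localhost:9092/logs.query#8 and returns
     * one split per partition (assumption: partitions are numbered 0..n-1).
     */
    static List<PartitionSplit> planSplits(String location) {
        URI uri = URI.create(location);
        if (!"kafka".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected kafka:// scheme: " + location);
        }
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("expected broker host:port: " + location);
        }
        String path = uri.getPath();
        if (path == null || path.length() <= 1) {
            throw new IllegalArgumentException("missing topic in path: " + location);
        }
        String topic = path.substring(1); // drop the leading '/'

        String fragment = uri.getFragment();
        if (fragment == null) {
            throw new IllegalArgumentException("missing #partitions fragment: " + location);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int numPartitions = Integer.parseInt(fragment);
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("partition count must be positive: " + location);
        }

        // One split per partition, mirroring the prototype's behavior.
        List<PartitionSplit> splits = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            splits.add(new PartitionSplit(uri.getHost(), uri.getPort(), topic, p));
        }
        return splits;
    }

    public static void main(String[] args) {
        for (PartitionSplit s : planSplits("kafka://localhost:9092/logs.query#8")) {
            System.out.println(s);
        }
    }
}
```

Running `main` prints eight descriptors for topic `logs.query`, one per partition. In a real loader, each descriptor would presumably back one Hadoop input split, so a map task reads exactly one partition. Passing the partition count in the URI keeps the sketch free of any broker metadata lookup, at the cost of trusting the user to supply the correct number.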