Re: Reading Kafka directly from Pig?
Great job. +1

Warm Regards,
Tariq
cloudfront.blogspot.com
On Wed, Aug 7, 2013 at 8:27 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> Cool stuff, a Pig Kafka UDF.
>
> Russell Jurney http://datasyndrome.com
>
> Begin forwarded message:
>
> *From:* David Arthur <[EMAIL PROTECTED]>
> *Date:* August 7, 2013, 7:41:30 AM PDT
> *To:* [EMAIL PROTECTED]
> *Subject:* *Reading Kafka directly from Pig?*
> *Reply-To:* [EMAIL PROTECTED]
>
> I've thrown together a Pig LoadFunc to read data from Kafka, so you could
> load data like:
>
> QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using
> com.mycompany.pig.KafkaAvroLoader('com.mycompany.Query');
>
> The path part of the uri is the Kafka topic, and the fragment is the number
> of partitions. In the implementation I have, it makes one input split per
> partition. Offsets are not really dealt with at this point - it's a rough
> prototype.
>
> Anyone have thoughts on whether or not this is a good idea? I know usually
> the pattern is: kafka -> hdfs -> mapreduce. If I'm only reading this data
> from Kafka once, is there any reason why I can't skip writing to HDFS?
>
> Thanks!
> -David
>
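
[Editor's note] For readers wondering what such a loader might look like, here is a minimal, hypothetical sketch of the URI handling David describes (topic taken from the path of the kafka:// URI, partition count from the fragment), written against Pig's LoadFunc API. The package and class names mirror the example load statement but are otherwise invented, and the Kafka-reading InputFormat is only stubbed out; this illustrates the approach, not David's actual code.

// Hypothetical sketch of a Kafka LoadFunc along the lines described above.
// Class and configuration key names are illustrative only.
package com.mycompany.pig;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;

public class KafkaAvroLoader extends LoadFunc {

    private final String recordClass;   // e.g. "com.mycompany.Query"
    private String topic;
    private int numPartitions;

    public KafkaAvroLoader(String recordClass) {
        this.recordClass = recordClass;
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // location looks like: kafka://localhost:9092/logs.query#8
        URI uri = URI.create(location);
        String broker = uri.getHost() + ":" + uri.getPort();
        topic = uri.getPath().replaceFirst("^/", "");          // "logs.query"
        numPartitions = Integer.parseInt(uri.getFragment());   // 8
        job.getConfiguration().set("kafka.broker", broker);
        job.getConfiguration().set("kafka.topic", topic);
        job.getConfiguration().setInt("kafka.partitions", numPartitions);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        // A custom InputFormat would return one input split per partition,
        // each split wrapping a Kafka consumer for that partition.
        throw new UnsupportedOperationException("KafkaInputFormat not shown in this sketch");
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        // Hold onto the reader; getNext() would pull messages from it.
    }

    @Override
    public Tuple getNext() throws IOException {
        // Decode the next Kafka message as an Avro record of type recordClass
        // and convert it to a Pig Tuple; return null when the partition is exhausted.
        return null;
    }
}

As in the prototype described above, offsets are not handled in this sketch; a real InputFormat would need to track a start/end offset per partition split.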