

Guy Doulberg 2013-01-06, 07:49
David Arthur 2013-01-06, 22:29
Russell Jurney 2013-01-07, 07:00
Guy Doulberg 2013-01-07, 07:12
Ken Krugler 2013-01-07, 17:57
Russell Jurney 2013-01-07, 20:48
Ken Krugler 2013-01-07, 21:51
Russell Jurney 2013-01-07, 22:06
Re: ETL with Kafka

On Jan 7, 2013, at 2:05pm, Russell Jurney wrote:

> I previously posted a link to contrib in this thread.

Thanks, I missed that - all I saw was the long URL to the Talend integration doc on Hortonworks.

> No, it's not a Cascading tap. It's a complete job: one to read Kafka
> events to HDFS, one to generate Kafka events from HDFS. ETL can happen
> in between.

Some Cascading integration notes, just for posterity:

Having a Kafka Tap/Scheme would make integration easy. I see there are KafkaInputFormat and KafkaOutputFormat classes in the contrib, which is great - though they would have to be back-ported to the older Hadoop API in order to work with Cascading. Also, Cascading passes all data around as the key (the value is always NullWritable), whereas the Kafka input/output formats do the opposite.
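
To make the key/value mismatch concrete, here is a rough sketch in plain Python. It is not Cascading or Kafka code: the function names are hypothetical, None stands in for NullWritable, and I'm assuming the key on the Kafka side can simply be dropped. Under those assumptions, a Scheme would only need to move the payload between the two slots:

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for Hadoop's NullWritable placeholder.
NULL = None

Record = Tuple[Optional[bytes], Optional[bytes]]


def kafka_to_cascading(records: Iterable[Record]) -> Iterator[Record]:
    """Move the payload from the value slot into the key slot.

    Assumption: whatever the Kafka-side format puts in the key is not
    needed downstream, so it is discarded here.
    """
    for _key, value in records:
        yield (value, NULL)


def cascading_to_kafka(records: Iterable[Record]) -> Iterator[Record]:
    """Move the payload from the key slot back into the value slot."""
    for key, _value in records:
        yield (NULL, key)


if __name__ == "__main__":
    kafka_side = [(NULL, b"event-1"), (NULL, b"event-2")]
    cascading_side = list(kafka_to_cascading(kafka_side))
    print(cascading_side)
    print(list(cascading_to_kafka(cascading_side)))
```

A real adapter would do this swap inside the Scheme's source and sink methods, on top of the back-ported input/output formats.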

-- Ken

> On Jan 7, 2013 1:51 PM, "Ken Krugler" <[EMAIL PROTECTED]> wrote:
>
>> Hi Russell,
>>
>> On Jan 7, 2013, at 12:48pm, Russell Jurney wrote:
>>
>>> Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans
>>> Hadoop records, which may be ETL'd first, and emits new Kafka events.
>>
>> Can you point me at the code?
>>
>> And just to confirm, you're talking about a Cascading Tap, right?
>>
>> -- Ken
>>
>>> On Mon, Jan 7, 2013 at 9:57 AM, Ken Krugler <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Guy,
>>>>
>>>> On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote:
>>>>
>>>>> Hi,
>>>>> Thanks David,
>>>>>
>>>>> I am looking for a product (open source or not), something like
>>>>> Talend or Pentaho, in which I can design the ETL (from and to Kafka)
>>>>> and run the ETL in Storm/IronCount, or maybe even in Hadoop
>>>>> Map/Reduce.
>>>>
>>>> Interesting - we build ETLs on top of Hadoop using Cascading (open
>>>> source workflow API), which has a lot of what it calls "Taps" for
>>>> connecting to data sources and sinks.
>>>>
>>>> But I haven't heard of a Kafka Tap. Should be possible to implement,
>>>> though.
>>>>
>>>> One issue is that Hadoop is batch oriented, so there's a bit of an
>>>> impedance mismatch when you've got a streaming data source, but from
>>>> experience it's possible to get that to work.
>>>>
>>>> -- Ken
>>>>
>>>>> The product should be complete and support connections to many data
>>>>> sources and targets. In that sense, if you know of a Kafka connector
>>>>> for Talend or Pentaho, that would be great.
>>>>>
>>>>> Thanks again.
>>>>>
>>>>> On 01/07/2013 12:28 AM, David Arthur wrote:
>>>>>> Storm has support for Kafka, if that's the sort of thing you're
>>>>>> looking for. Maybe you could describe your use case a bit more?
>>>>>>
>>>>>> On Sunday, January 6, 2013, Guy Doulberg wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I am looking for an ETL tool that can connect to Kafka, as a
>>>>>>> consumer and as a producer.
>>>>>>>
>>>>>>> Have you heard of such a tool?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Guy
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>> --------------------------
>>>> Ken Krugler
>>>> +1 530-210-6378
>>>> http://www.scaleunlimited.com
>>>> custom big data solutions & training
>>>> Hadoop, Cascading, Cassandra & Solr
>>>
>>>
>>> --
>>> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>>
>> --------------------------------------------
>> http://about.me/kkrugler
>> +1 530-210-6378

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr