-
ETL with Kafka
Guy Doulberg 2013-01-06, 07:49

Hi,

I am looking for an ETL tool that can connect to Kafka as both a consumer and a producer.

Have you heard of such a tool?

Thanks,
Guy
-
Re: ETL with Kafka
David Arthur 2013-01-06, 22:29

Storm has support for Kafka, if that's the sort of thing you're looking for. Maybe you could describe your use case a bit more?

--
David Arthur
-
Re: ETL with Kafka
Russell Jurney 2013-01-07, 07:00

You can use Kafka to store data on Hadoop via the Hadoop consumer in contrib, then use Talend or Pig to ETL it, and finally emit the ETL'd records via the Hadoop producer in contrib.

https://github.com/kafka-dev/kafka/tree/master/contrib
http://docs.hortonworks.com/CURRENT/index.htm#Data_Integration_Services_With_HDP/Using_Data_Integration_Services_Powered_By_Talend/Using_Talend.htm

Russell Jurney
http://datasyndrome.com
-
Re: ETL with Kafka
Guy Doulberg 2013-01-07, 07:12

Hi,

Thanks David,

I am looking for a product (open source or not), something like Talend or Pentaho, in which I can design the ETL (from and to Kafka) and run it in Storm/IronCount, or maybe even in Hadoop Map/Reduce.

The product should be complete and support many connections to many data sources and targets. In that sense, if you know of a connection to Talend or Pentaho, that would be great.

Thanks again.
-
Re: ETL with Kafka
Ken Krugler 2013-01-07, 17:57

Hi Guy,

On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote:

> I am looking for a product (open source or not), something like Talend or Pentaho, in which I can design the ETL (from and to Kafka) and run it in Storm/IronCount, or maybe even in Hadoop Map/Reduce.

Interesting - we build ETLs on top of Hadoop using Cascading (an open source workflow API), which has a lot of what it calls "Taps" for connecting to data sources and sinks.

But I haven't heard of a Kafka Tap. Should be possible to implement, though.

One issue is that Hadoop is batch oriented, so there's a bit of an impedance mismatch when you've got a streaming data source, but from experience it's possible to get that to work.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
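The batch/streaming mismatch Ken mentions can be pictured as a batch job reading a bounded slice of an unbounded log and remembering where it stopped, so the next run can resume from there. The following is only a rough sketch under stated assumptions: a Python list stands in for one partition of a Kafka topic, and `read_batch` and the offset bookkeeping are hypothetical helpers, not part of Kafka, Hadoop, or Cascading.

```python
from typing import List, Tuple


def read_batch(log: List[str], start_offset: int, max_records: int) -> Tuple[List[str], int]:
    """Return up to max_records events starting at start_offset, plus the next offset to read."""
    end = min(start_offset + max_records, len(log))
    return log[start_offset:end], end


# In-memory stand-in for one partition of a topic (hypothetical data).
log = ["e0", "e1", "e2", "e3", "e4"]

# Each loop iteration plays the role of one batch run: it processes a
# bounded slice and carries the offset forward to the next run.
offset = 0
batches = []
while offset < len(log):
    batch, offset = read_batch(log, offset, max_records=2)
    batches.append(batch)
```

In a real deployment the log keeps growing, so each run would simply pick up whatever arrived since the stored offset.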
-
Re: ETL with Kafka
Russell Jurney 2013-01-07, 20:48

Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans Hadoop records, which may be ETL'd first, and emits new Kafka events.

--
Russell Jurney
twitter.com/rjurney
datasyndrome.com
-
Re: ETL with Kafka
Ken Krugler 2013-01-07, 21:51

Hi Russell,

On Jan 7, 2013, at 12:48pm, Russell Jurney wrote:

> Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans Hadoop records, which may be ETL'd first, and emits new Kafka events.

Can you point me at the code?

And just to confirm, you're talking about a Cascading Tap, right?

-- Ken

--------------------------------------------
http://about.me/kkrugler
-
Re: ETL with Kafka
Russell Jurney 2013-01-07, 22:06

I previously posted a link to contrib in this thread. No, it's not a Cascading Tap. It's a complete job. One to read Kafka events to HDFS, one to generate Kafka events from HDFS. ETL can happen in between.
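The shape Russell describes (land Kafka events on HDFS, ETL them, emit the results as new Kafka events) could be sketched roughly as three stages. This is a minimal illustration, not the contrib code: plain Python lists stand in for the topic and for HDFS, and the function names `kafka_to_hdfs`, `etl`, and `hdfs_to_kafka` are hypothetical.

```python
from typing import Callable, Iterable, List


def kafka_to_hdfs(events: Iterable[str]) -> List[str]:
    """Stage 1: land raw events as records in batch storage (a list stands in for HDFS)."""
    return list(events)


def etl(records: List[str], transform: Callable[[str], str]) -> List[str]:
    """Stage 2: the ETL step; here it drops blank records and applies a transform."""
    return [transform(r) for r in records if r.strip()]


def hdfs_to_kafka(records: List[str], send: Callable[[str], None]) -> int:
    """Stage 3: emit each ETL'd record as a new event and return how many were sent."""
    count = 0
    for record in records:
        send(record)
        count += 1
    return count


incoming = ["click a", "", "click b"]   # hypothetical input events
outgoing: List[str] = []                # stand-in for the output topic

landed = kafka_to_hdfs(incoming)
cleaned = etl(landed, str.upper)
sent = hdfs_to_kafka(cleaned, outgoing.append)
```

Keeping the ETL step separate from the two transport stages mirrors the point that any tool (Talend, Pig, or something else) can sit in the middle.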
-
Re: ETL with Kafka
Ken Krugler 2013-01-07, 22:21

On Jan 7, 2013, at 2:05pm, Russell Jurney wrote:

> I previously posted a link to contrib in this thread.

Thanks, I missed that - all I saw was the long URL to the Talend integration doc on Hortonworks.

> No, its not a cascading tap. Its a complete job. One to read kafka events to hdfs, one to generate kafka events from hdfs. ETL can happen in between.

Some Cascading integration notes, just for posterity:

Having a Kafka Tap/Scheme would make integration easy. I see there are KafkaInputFormat and KafkaOutputFormat classes in the contrib, which is great - though these would have to be back-ported to the older Hadoop APIs in order to work with Cascading.

Also, Cascading sends all data around as the key (the value is always NullWritable), whereas the Kafka input/output formats do the opposite.

-- Ken
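The key/value mismatch in the last note could be bridged by a small adapter that moves the payload between the two slots. The sketch below is only an illustration under assumptions: Python tuples stand in for Hadoop key/value pairs, `None` stands in for NullWritable, and it assumes the key slot on the Kafka side carries nothing that needs preserving. The function names are hypothetical and are not part of Cascading or the Kafka contrib.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (key, value) pair; None plays the role of NullWritable in this sketch.
Pair = Tuple[Optional[bytes], Optional[bytes]]


def kafka_to_cascading(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Move the payload from the value slot to the key slot (the original key is discarded)."""
    for _key, value in pairs:
        yield (value, None)


def cascading_to_kafka(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Move the payload from the key slot back to the value slot."""
    for key, _value in pairs:
        yield (None, key)


kafka_side = [(None, b"event-1"), (None, b"event-2")]   # hypothetical records
cascading_side = list(kafka_to_cascading(kafka_side))
round_trip = list(cascading_to_kafka(cascading_side))
```

A real Tap/Scheme would do this swap inside its source and sink methods; the point here is only that the conversion itself is mechanical.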