I am looking for a product (open source or not), something like Talend or Pentaho, in which I can design the ETL (from and to Kafka) and run it in Storm/IronCount, or maybe even in Hadoop Map/Reduce. The product should be complete and support connections to many data sources and targets. In that sense, if you know of a connection to Talend or Pentaho, that would be great.
Thanks again.
On 01/07/2013 12:28 AM, David Arthur wrote:
On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote:
Interesting - we build ETLs on top of Hadoop using Cascading (open source workflow API), which has a lot of what it calls "Taps" for connecting to data sources and sinks.
But I haven't heard of a Kafka Tap. Should be possible to implement, though.
One issue is that Hadoop is batch oriented, so there's a bit of an impedance mismatch when you've got a streaming data source, but from experience it's possible to get that to work.
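One common way to bridge that mismatch is to cut the stream into bounded chunks and hand each chunk to a batch job. The following is only a minimal sketch of that idea in plain Python, not anything from Hadoop, Cascading, or Kafka; the function name `micro_batches` and the `batch_size` parameter are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a (possibly unbounded) stream into lists of at most batch_size items.

    Hypothetical helper: each yielded list stands in for the input of one
    batch run. The final list may be shorter if the stream ends.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        # Pull up to batch_size records without consuming more than needed.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A finite range stands in for a stream of events.
    for batch in micro_batches(range(5), 2):
        print(batch)
```

A real pipeline would more likely cut batches by time window or by recorded offsets than by a fixed count; the count-based cut is used here only because it is the simplest to show.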
Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans Hadoop records, which may be ETL'd first, and emits new Kafka events.
On Mon, Jan 7, 2013 at 9:57 AM, Ken Krugler <[EMAIL PROTECTED]> wrote:
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
I previously posted a link to contrib in this thread. No, it's not a Cascading tap. It's a complete job: one to read Kafka events to HDFS, one to generate Kafka events from HDFS. ETL can happen in between.
On Jan 7, 2013 1:51 PM, "Ken Krugler" <[EMAIL PROTECTED]> wrote:
On Jan 7, 2013, at 2:05pm, Russell Jurney wrote:
Thanks, I missed that - all I saw was the long URL to the Talend integration doc on Hortonworks. Some Cascading integration notes, just for posterity:
Having a Kafka Tap/Scheme would make integration easy. I see there are KafkaInputFormat and KafkaOutputFormat classes in the contrib, which is great - though someone would have to back-port these to the older Hadoop APIs for them to work with Cascading. Also, Cascading sends all data around as the key (the value is always NullWritable), whereas the Kafka input/output formats do the opposite.
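To make the key/value mismatch concrete, here is a rough sketch of the swap an adapter would need to perform. It is plain Python with no Hadoop or Cascading dependency; `NULL` is a stand-in for NullWritable, and both function names are hypothetical. It also assumes the key on the Kafka side carries nothing the pipeline needs, which may not hold for the real contrib formats.

```python
from typing import Any, Tuple

# Stand-in for Hadoop's NullWritable (assumption for illustration only).
NULL = None

Pair = Tuple[Any, Any]


def kafka_to_cascading(pair: Pair) -> Pair:
    """Move the payload from the value slot into the key slot.

    Assumes the incoming key can be discarded; a real adapter may need
    to preserve it.
    """
    _key, payload = pair
    return (payload, NULL)


def cascading_to_kafka(pair: Pair) -> Pair:
    """Move the payload from the key slot back into the value slot."""
    payload, _value = pair
    return (NULL, payload)


if __name__ == "__main__":
    record = (NULL, b"event-bytes")          # payload in the value slot
    as_cascading = kafka_to_cascading(record)
    print(as_cascading)                      # payload now in the key slot
    print(cascading_to_kafka(as_cascading))  # back to the original layout
```

In an actual Cascading Scheme this swap would happen inside the source and sink methods; the sketch only shows the direction of the conversion.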