Re: Kafka/Hadoop consumers and producers
I think the answer is that there is currently no strong community-backed
solution to consume non-Avro data from Kafka to HDFS.

A lot of people do it, but I think most people adapted and expanded the
contrib code to fit their needs.

--
Felix
On Fri, Aug 9, 2013 at 1:27 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Yes, I am definitely interested in such capabilities. We are also using
> Kafka 0.7.
>    Guys, I already asked, but nobody answered: what is the community using to
> consume from Kafka to HDFS?
> My assumption was that if Camus supports only Avro it will not be suitable
> for everyone, but people transfer from Kafka to Hadoop somehow. So the question
> is: what are the alternatives to Camus for transferring messages from Kafka to
> HDFS?
> Thanks
> Oleg.
>
>
> On Fri, Aug 9, 2013 at 6:21 AM, Andrew Psaltis <[EMAIL PROTECTED]> wrote:
>
> > Felix,
> > The Camus route is the direction I have headed, for a lot of the reasons
> > that you described. The only wrinkle is that we are still on Kafka 0.7.3,
> > so I am in the process of backporting this patch:
> > https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8
> > that is described here:
> > https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that
> > we can handle reading and writing non-avro'ized (if that is a word) data.
> >
> > I hope to have that done sometime in the morning and would be happy to
> > share it if others can benefit from it.
> >
> > Thanks,
> > Andrew
> >
> >
> > On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
> >
> >> The contrib code is simple and probably wouldn't require too much work to
> >> fix, but it's a lot less robust than Camus, so you would ideally need to do
> >> some work to make it solid against all edge cases, failure scenarios and
> >> performance bottlenecks...
> >>
> >> I would definitely recommend investing in Camus instead, since it already
> >> covers a lot of the challenges I'm mentioning above, and also has more
> >> community support behind it at the moment (as far as I can tell, anyway),
> >> so it is more likely to keep getting improvements than the contrib code.
> >>
> >> --
> >> Felix
> >>
> >>
> >> On Thu, Aug 8, 2013 at 9:28 AM, <[EMAIL PROTECTED]> wrote:
> >>
> >>> We also have a need today to ETL from Kafka into Hadoop, and we do not
> >>> currently use Avro, nor do we have any plans to.
> >>>
> >>> So is the official direction, based on this discussion, to ditch the Kafka
> >>> contrib code and direct people to use Camus without Avro as Ken described,
> >>> or are both solutions going to survive?
> >>>
> >>> I can put time into the contrib code and/or work on documenting the
> >>> tutorial on how to make Camus work without Avro.
> >>>
> >>> Which is the preferred route, for the long term?
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> >>> > Hi Andrew,
> >>> >
> >>> > Camus can be made to work without Avro. You will need to implement a
> >>> > message decoder and a data writer. We need to add a better tutorial
> >>> > on how to do this, but it isn't that difficult. If you decide to go down
> >>> > this path, you can always ask questions on this list. I try to make sure
> >>> > each email gets answered, but it can take me a day or two.
> >>> >
> >>> > -Ken
> >>> >
> >>> > On Aug 7, 2013, at 9:33 AM, [EMAIL PROTECTED] wrote:
> >>> >
> >>> > > Hi all,
> >>> > >
> >>> > > Over at the Wikimedia Foundation, we're trying to figure out the
> >>> > > best way to do our ETL from Kafka into Hadoop. We don't currently use
> >>> > > Avro and I'm not sure if we are going to. I came across this post.
> >>> > >
> >>> > > If the plan is to remove the hadoop-consumer from Kafka contrib, do
> >>> > > you think we should not consider it as one of our viable options?
> >>> > >
> >>> > > Thanks!
> >>> >
> >>>
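
For readers wondering what Ken's "message decoder" plus "data writer" split might look like for non-Avro data, here is a rough, language-neutral sketch in Python. It is not Camus code and does not use Camus's Java interfaces: every name in it (decode_message, partition_path, TextLineWriter, etl) is hypothetical, the messages are assumed to be UTF-8 JSON objects carrying a "timestamp" field in epoch seconds, and an in-memory dict stands in for files on HDFS.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def decode_message(payload: bytes) -> dict:
    """Hypothetical decoder: raw Kafka message bytes -> record dict.

    Assumption: each message is a UTF-8 JSON object with a numeric
    'timestamp' field (epoch seconds). Raises ValueError otherwise.
    """
    record = json.loads(payload.decode("utf-8"))
    if not isinstance(record, dict) or "timestamp" not in record:
        raise ValueError("message is not a JSON object with a 'timestamp'")
    return record


def partition_path(topic: str, record: dict) -> str:
    """Build an hourly output directory (UTC) from the record timestamp."""
    ts = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    return f"{topic}/{ts:%Y/%m/%d/%H}"


class TextLineWriter:
    """Hypothetical writer: one JSON document per line, grouped by partition.

    A real job would append to files on HDFS; this keeps lines in memory
    so the sketch stays self-contained.
    """

    def __init__(self):
        self.partitions = defaultdict(list)

    def write(self, topic: str, record: dict) -> str:
        path = partition_path(topic, record)
        self.partitions[path].append(json.dumps(record, sort_keys=True))
        return path


def etl(topic: str, payloads, writer: TextLineWriter) -> int:
    """Decode each payload and hand it to the writer.

    Returns the number of messages skipped because they failed to decode.
    """
    skipped = 0
    for payload in payloads:
        try:
            record = decode_message(payload)
        except ValueError:  # also covers bad UTF-8 and malformed JSON
            skipped += 1
            continue
        writer.write(topic, record)
    return skipped
```

The point of the split is that the decoder is the only piece that knows the wire format, and the writer is the only piece that knows the output layout, so swapping JSON for another non-Avro format should only touch decode_message. How undecodable messages ought to be handled (skipped, as here, or routed elsewhere) is a choice this sketch makes arbitrarily.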