Kafka, mail # user - How to use the hadoop consumer in distributed mode?


Re: How to use the hadoop consumer in distributed mode?
Richard Park 2011-10-26, 20:33
Jay and I had a talk about this and we would like to release it as soon as
we can. There is some LinkedIn-specific code that needs to be abstracted
out first.

We also use Avro heavily, and so much of our code is written with that in
mind. It should be easy enough to abstract the Avro out, but we may release
that part of the code as is.

Anyway, we're evaluating what can be released and what needs to be
cleaned up, but we hope to get something out there soon.

On Wed, Oct 26, 2011 at 1:10 PM, Felix GV <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I wanted to give a little update on this topic.
>
> I was able to make hadoop-consumer work with a kafka cluster.
>
> What I did is:
>
>   1. I generated a .properties file for one of the kafka brokers I wanted
>   to connect to.
>   2. I ran the DataGenerator program by passing the .properties file as a
>   parameter.
>   3. I moved the 1.dat offset file generated in HDFS so that it has another
>   name (so that it's not overwritten the next time I run the DataGenerator).
>   4. I changed the broker's address in the .properties file to the next
>   server I wanted to connect to.
>   5. I repeated steps 2 to 4 for every kafka server in the cluster.
>   6. I then ran SimpleKafkaETLJob and it was able to spawn one map task per
>   broker and pull all the data from each.
>
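The per-broker steps above could be scripted along these lines. This is only a rough sketch, not the tooling discussed in this thread: the property key `kafka.server.uri`, the `generator_cmd` used to launch the DataGenerator, the `offsets_dir` location, and the `broker-N.dat` naming are all assumptions to adapt to your own setup. Only `hadoop fs -mv` is a standard Hadoop shell command.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Assumed name of the property that holds the broker address;
# check the .properties file you actually use.
BROKER_KEY = "kafka.server.uri"


def set_broker(properties_text: str, broker_uri: str, key: str = BROKER_KEY) -> str:
    """Return the properties text with `key` pointing at `broker_uri`."""
    lines = []
    found = False
    for line in properties_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            lines.append(f"{key}={broker_uri}")
            found = True
        else:
            lines.append(line)
    if not found:
        # The key was absent, so append it.
        lines.append(f"{key}={broker_uri}")
    return "\n".join(lines) + "\n"


def commands_for_broker(
    index: int,
    props_path: str,
    offsets_dir: str,
    generator_cmd: Sequence[str],
) -> List[List[str]]:
    """Build the two commands for steps 2 and 3 for one broker."""
    return [
        # Step 2: run the generator with the .properties file as its parameter.
        [*generator_cmd, props_path],
        # Step 3: rename 1.dat in HDFS so the next run does not overwrite it.
        # The broker-N.dat name is a hypothetical convention.
        ["hadoop", "fs", "-mv",
         f"{offsets_dir}/1.dat", f"{offsets_dir}/broker-{index}.dat"],
    ]


def generate_all(
    brokers: Sequence[str],
    props_path: str,
    offsets_dir: str,
    generator_cmd: Sequence[str],
) -> None:
    """Repeat steps 1 to 4 for every broker (step 5)."""
    template = Path(props_path).read_text()
    for index, broker in enumerate(brokers):
        # Steps 1 and 4: point the .properties file at this broker.
        Path(props_path).write_text(set_broker(template, broker))
        for cmd in commands_for_broker(index, props_path, offsets_dir, generator_cmd):
            subprocess.run(cmd, check=True)
```

Calling `generate_all` with the list of broker addresses leaves one renamed offset file per broker in `offsets_dir`, after which SimpleKafkaETLJob can be run as in step 6. The broker list is passed in by hand here; pulling it from ZK is not covered.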
> This is almost exactly what I was trying before, except that before, I had
> manually modified the .dat offset files instead of generating each one with
> the DataGenerator, and I think vim didn't play nice with the SEQ files or
> something like that... I don't know.
>
> Anyhow, what I'm doing now is a little convoluted but at least it works... I
> will create a script that does all this repetitive stuff for me. Ideally, I
> would also like to pull the brokers list from ZK, like you guys do.
>
> The Kafka/Hadoop ETL tools you mentioned are no doubt more mature and
> complete than the stuff I will create, so it would be really nice if you
> could release it.
>
> I think releasing those tools would help drive the adoption of Kafka,
> because in the state it's in now, Kafka is not really plug and play. That
> is, it works (which is already better than a lot of open source projects out
> there ;) !) but it seems a rather important part is missing.
>
> --
> Felix
>
>
>
> On Tue, Oct 18, 2011 at 7:31 PM, Hisham Mardam-Bey <[EMAIL PROTECTED]> wrote:
>
> > Hi folks, been following this thread, Felix and I are working together
> > on this project, we really like Kafka and are moving it into
> > production very soon.
> >
> > Jay, question, would you guys consider releasing the code in a "not so
> > clean state" and have the community (we would like to help) shore it
> > up so it becomes usable by the masses or are there other issues
> > (legal?) you have to sort out first?
> >
> > Thanks!
> >
> > hisham.
> >
> > On Tue, Oct 18, 2011 at 6:28 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > > I would actually love for us to release the full ETL system we have for
> > > Kafka/Hadoop, it is just a matter of finding the time to get this code into
> > > that shape.
> > >
> > > The hadoop team that maintains that code is pretty busy right now, but I am
> > > hoping we can find a way.
> > >
> > > -Jay
> > >
> > > On Tue, Oct 18, 2011 at 3:18 PM, Felix Giguere Villegas <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > >> Thanks for your replies guys :)
> > >>
> > >> @Jay: I thought about the Hadoop version mismatch too, because I've had the
> > >> same problem before. I'll double check again to make sure I have the same
> > >> versions of hadoop everywhere, as the Kafka distributed cluster I was
> > >> testing on is a new setup and I might have forgotten to put the hadoop jars
> > >> we use in it... I'm working part-time for now so I probably won't touch this
> > >> again until next week but I'll keep you guys posted ASAP :)
> > >>
> > >> @Richard: Thanks a lot for your description. That clears out the