Kafka >> mail # user >> How to use the hadoop consumer in distributed mode?


Re: How to use the hadoop consumer in distributed mode?
Awesome :) !

Do you have any timeline as to when you think you could release this?

This is not to rush you, but we are going to switch some of our priorities
around based on whether the answer is next week or next month or 3 months
from now or more ;)

Thanks for your great work! I think I speak for everyone when I say it's
really appreciated :)

--
Felix

On Wed, Oct 26, 2011 at 4:33 PM, Richard Park <[EMAIL PROTECTED]> wrote:

> Jay and I had a talk about this and we would like to release it as soon as
> we can. There is some LinkedIn-specific code that needs to be abstracted
> out first.
>
> We also use Avro heavily, and so much of our code is written with that in
> mind. It should be easy enough to abstract the Avro out, but we may release
> that part of the code as is.
>
> Anyways, we're evaluating what can be released and what needs to be
> cleaned-up but we hope to get something out there soon.
>
> On Wed, Oct 26, 2011 at 1:10 PM, Felix GV <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I wanted to give a little update on this topic.
> >
> > I was able to make hadoop-consumer work with a kafka cluster.
> >
> > What I did is:
> >
> >   1. I generated a .properties file for one of the kafka brokers I wanted
> >   to connect to.
> >   2. I ran the DataGenerator program by passing the .properties file as a
> >   parameter.
> >   3. I moved the 1.dat offset file generated in HDFS so that it has
> >   another name (so that it's not overwritten the next time I run the
> >   DataGenerator).
> >   4. I changed the broker's address in the .properties file to the next
> >   server I wanted to connect to.
> >   5. I repeated steps 2 to 4 for every kafka server in the cluster.
> >   6. I then ran SimpleKafkaETLJob and it was able to spawn one map task
> >   per broker and pull all the data from each.
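
The repetitive part of the steps above (rewrite the broker address, run the DataGenerator, rename the offset file) could be scripted roughly as follows. This is only a sketch in Python, not the script mentioned in the thread. The property key `kafka.server.uri`, the launcher command, the HDFS directory, and the `broker-N.dat` naming are all assumptions made for illustration; only the `1.dat` file name and the order of operations come from the message. The functions just build text and command lines, and nothing is executed.

```python
from typing import List


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return the .properties text with `key` set to `value`.

    Comment lines are left alone; the key is appended if it is absent.
    """
    out = []
    found = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


def plan_for_broker(index: int, properties_path: str, offsets_dir: str,
                    generator_cmd: List[str]) -> List[List[str]]:
    """Build the two commands for one broker (steps 2 and 3 in the list).

    `generator_cmd` is whatever launches the DataGenerator locally
    (hypothetical here); `offsets_dir` is the HDFS directory where it
    writes 1.dat (also an assumption).
    """
    return [
        # Step 2: run the DataGenerator with this broker's .properties file.
        generator_cmd + [properties_path],
        # Step 3: rename 1.dat so the next run does not overwrite it.
        ["hadoop", "fs", "-mv",
         f"{offsets_dir}/1.dat", f"{offsets_dir}/broker-{index}.dat"],
    ]
```

A driver loop would call `set_property` once per broker, write the result to a file, and pass each planned command to something like `subprocess.run(cmd, check=True)` before starting SimpleKafkaETLJob.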
> >
> > This is almost exactly what I was trying before, except that before, I
> > had manually modified the .dat offset files instead of generating each
> > one with the DataGenerator, and I think vim didn't play nice with the
> > SEQ files or something like that... I don't know.
> >
> > Anyhow, what I'm doing now is a little convoluted but at least it
> > works... I will create a script that does all this repetitive stuff for
> > me. Ideally, I would also like to pull the brokers list from ZK, like
> > you guys do.
> >
> > The Kafka/Hadoop ETL tools you mentioned are no doubt more mature and
> > complete than the stuff I will create, so it would be really nice if you
> > could release it.
> >
> > I think releasing those tools would help drive the adoption of Kafka,
> > because in the state it's in now, Kafka is not really plug and play. That
> > is, it works (which is already better than a lot of open source projects
> > out there ;) !) but it seems a rather important part is missing.
> >
> > --
> > Felix
> >
> >
> >
> > On Tue, Oct 18, 2011 at 7:31 PM, Hisham Mardam-Bey <[EMAIL PROTECTED]> wrote:
> >
> > > Hi folks, been following this thread, Felix and I are working together
> > > on this project, we really like Kafka and are moving it into
> > > production very soon.
> > >
> > > Jay, question, would you guys consider releasing the code in a "not so
> > > clean state" and have the community (we would like to help) shore it
> > > up so it becomes usable by the masses or are there other issues
> > > (legal?) you have to sort out first?
> > >
> > > Thanks!
> > >
> > > hisham.
> > >
> > > On Tue, Oct 18, 2011 at 6:28 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > > > I would actually love for us to release the full ETL system we have
> > > > for Kafka/Hadoop, it is just a matter of finding the time to get this
> > > > code into that shape.
> > > >
> > > > The hadoop team that maintains that code is pretty busy right now,
> > > > but I am hoping we can find a way.
> > > >
> > > > -Jay
> > > >
> > > > On Tue, Oct 18, 2011 at 3:18 PM, Felix Giguere Villegas <[EMAIL PROTECTED]> wrote: