Kafka user mailing list: hadoop-consumer code in contrib package


Thread:
  navneet sharma  2013-01-14, 18:35
  Felix GV        2013-01-14, 22:43
  navneet sharma  2013-01-15, 17:06
  Felix GV        2013-01-15, 18:17
  navneet sharma  2013-01-17, 00:41
  Jun Rao         2013-01-17, 05:12
Re: hadoop-consumer code in contrib package
That makes sense.

I tried an alternative approach: I am using the high-level consumer and going
through the Hadoop HDFS APIs to push data into HDFS.

I am not creating any Hadoop jobs for that.

The only problem I am seeing here is that the consumer is designed to run
forever, which means I need to find out how to close the HDFS file and shut
the consumer down.

Is there any way to kill or close the high-level consumer gracefully?

I am running v0.7.0. I don't mind upgrading to a higher version if that
allows this kind of consumer handling.
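
A minimal sketch of this pattern, written against the 0.8 high-level consumer
API since an upgrade is on the table (the ZK address, group id, topic, and
HDFS path are placeholders): setting consumer.timeout.ms makes the stream
iterator throw ConsumerTimeoutException once no message has arrived for that
long, which gives the loop a graceful exit, and ConsumerConnector.shutdown()
then commits offsets and disconnects cleanly.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.ConsumerTimeoutException;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "hdfs-loader");             // placeholder
            // Stop waiting after 10s of silence instead of blocking forever.
            props.put("consumer.timeout.ms", "10000");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("mytopic", 1));
            ConsumerIterator<byte[], byte[]> it =
                streams.get("mytopic").get(0).iterator();

            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/data/mytopic.bin")); // placeholder
            try {
                while (it.hasNext()) {
                    out.write(it.next().message()); // raw payload bytes
                }
            } catch (ConsumerTimeoutException e) {
                // No messages for consumer.timeout.ms: treat the topic as drained.
            } finally {
                out.close();          // flush and close the HDFS file
                connector.shutdown(); // commit offsets to ZK and disconnect
            }
        }
    }

In 0.7 the property names differ (zk.connect, groupid), but consumer.timeout.ms
and ConsumerConnector.shutdown() exist there as well, so the same pattern should
apply without upgrading.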

Thanks,
Navneet
On Thu, Jan 17, 2013 at 10:41 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> I think the main reason for using SimpleConsumer is to manage offsets
> explicitly. For example, this is useful when Hadoop retries failed tasks.
> Another reason is that Hadoop already does load balancing. So, there is not
> much need to balance the load again using the high level consumer.
>
> Thanks,
>
> Jun
>
> On Wed, Jan 16, 2013 at 4:40 PM, navneet sharma <[EMAIL PROTECTED]> wrote:
>
> > Thanks Felix.
> >
> > One question still remains: why SimpleConsumer and not the high-level
> > consumer? If I change the code to the high-level consumer, will it create
> > any challenges?
> >
> >
> > Navneet
> >
> >
> > On Tue, Jan 15, 2013 at 11:46 PM, Felix GV <[EMAIL PROTECTED]> wrote:
> >
> > > Please read the Kafka design paper
> > > <http://kafka.apache.org/design.html>.
> > >
> > > It may look a little long, but it's as short as it can be. Kafka differs
> > > from other messaging systems in a couple of ways, and it's important to
> > > understand the fundamental design choices that were made in order to
> > > understand the way Kafka works.
> > >
> > > I believe my previous email already answers both your offset tracking and
> > > retention questions, but if my explanations are not clear enough, then the
> > > next best thing is probably to read the design paper :)
> > >
> > > --
> > > Felix
> > >
> > >
> > > On Tue, Jan 15, 2013 at 12:01 PM, navneet sharma <[EMAIL PROTECTED]> wrote:
> > >
> > > > Thanks, Felix, for sharing your work. The contrib hadoop-consumer looks
> > > > like it works the same way.
> > > >
> > > > I think I really need to understand this offset stuff. So far I have
> > > > used only the high-level consumer. When the consumer is done reading all
> > > > the messages, I used to kill the process (because it won't stop on its
> > > > own).
> > > >
> > > > Then I used a producer to pump in more messages and a consumer to read
> > > > the new messages (a new process, as I had killed the last consumer).
> > > >
> > > > But I never saw messages getting duplicated.
> > > >
> > > > Now it's not clear to me how offsets are tracked, specifically when I
> > > > relaunch the consumer. And why does the retention policy not seem to
> > > > work when used with the SimpleConsumer? For my experiment I set it to
> > > > 4 hours.
> > > >
> > > > Please help me understand.
> > > >
> > > > Thanks,
> > > > Navneet
> > > >
> > > >
> > > > On Tue, Jan 15, 2013 at 4:12 AM, Felix GV <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I think you may be misunderstanding the way Kafka works.
> > > > >
> > > > > A Kafka broker is never supposed to clear messages just because a
> > > > > consumer read them.
> > > > >
> > > > > The Kafka broker will instead clear messages after their retention
> > > > > period ends, though it will not delete the messages at the exact time
> > > > > when they expire. Instead, a background process will periodically
> > > > > delete a batch of expired messages. The retention policies guarantee a
> > > > > minimum retention time, not an exact retention time.
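
The retention Felix describes is broker-side configuration; in an 0.8-era
server.properties it looks roughly like this (the 4-hour value mirrors the
experiment mentioned above, and expired segments only disappear when the
periodic cleaner runs):

    # Minimum time a log segment is retained before it is eligible for deletion.
    log.retention.hours=4
    # How often the background cleaner checks for segments to delete.
    log.cleanup.interval.mins=1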
> > > > >
> > > > > It is the responsibility of each consumer to keep track of which
> > > > > messages they have consumed already (by recording an offset for each
> > > > > consumed partition). The high-level consumer stores these offsets in
> > > > > ZK. The simple consumer has no built-in capability to store and manage
> > > > > offsets, so the application using it must track them itself.
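
To make the offset-management point concrete, here is a sketch of what explicit
offset tracking looks like with the 0.7 SimpleConsumer (broker host, topic, and
partition are placeholders, and where the offset gets persisted is entirely up
to the application, which is exactly what lets a retried Hadoop task re-read
from a known position):

    import kafka.api.FetchRequest;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    public class ExplicitOffsets {
        public static void main(String[] args) {
            // Restored from wherever the application persisted it (file, ZK, HDFS...).
            long nextOffset = 0L;

            SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 30000, 64 * 1024); // placeholders

            FetchRequest req = new FetchRequest("mytopic", 0, nextOffset, 1024 * 1024);
            ByteBufferMessageSet messages = consumer.fetch(req);
            for (MessageAndOffset mo : messages) {
                // ... process mo.message() ...
                nextOffset = mo.offset(); // in 0.7 this is the offset to fetch next
            }
            consumer.close();
            // Persist nextOffset; the next run (or a retried task) resumes here.
            System.out.println("next fetch offset: " + nextOffset);
        }
    }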

 
Next in thread:
  Jun Rao         2013-01-17, 15:29