Kafka, mail # user - hadoop-consumer never finishing


Re: hadoop-consumer never finishing
Felix GV 2011-11-07, 16:41
I think I've had the same bug. It's a known issue that is fixed in the
trunk.

You should check out Kafka from the (Apache) trunk and use the hadoop
consumer provided there in the contrib directory. If I'm not mistaken, that
version is more up to date than the one you mentioned on GitHub...
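
For reference, checking out and building trunk would look roughly like this
(the SVN URL and the sbt targets below are my assumptions based on the
Incubator-era layout, so double-check them against the README):

svn checkout http://svn.apache.org/repos/asf/incubator/kafka/trunk kafka-trunk
cd kafka-trunk
./sbt update package          # build Kafka along with the contrib modules
cd contrib/hadoop-consumer    # the hadoop consumer mentioned above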

--
Felix

On Monday, November 7, 2011, Raimon Bosch <[EMAIL PROTECTED]> wrote:
> Problem solved! It was a configuration issue.
>
> Trying with:
> event.count=1000
> kafka.request.limit=1000
>
> The mapper has stopped and it has generated a file with 1000 events. But if
> we use kafka.request.limit=-1, it sends the same events over and over
> again; that's why my hadoop-consumer couldn't stop.
>
> 2011/11/7 Raimon Bosch <[EMAIL PROTECTED]>
>
>>
>> Hi,
>>
>> I have just compiled kafka from https://github.com/kafka-dev/kafka and
>> executed the DataGenerator:
>>
>> ./run-class.sh kafka.etl.impl.DataGenerator test/test.properties
>>
>> After that I have executed the hadoop consumer:
>>
>> ./run-class.sh kafka.etl.impl.SimpleKafkaETLJob test/test.properties
>>
>>
>> The hadoop-consumer is generating a file at the specified output, but it is
>> never finishing, even if I try to generate only 1 event
>> at test/test.properties. So this file is growing and growing; my guess
>> is that maybe it is always reading from offset 0?
>>
>> That is my test.properties:
>>
>> # name of test topic
>> kafka.etl.topic=SimpleTestEvent5
>>
>> # hdfs location of jars
>> hdfs.default.classpath.dir=/tmp/kafka/lib
>>
>> # number of test events to be generated
>> event.count=1
>>
>> # hadoop id and group
>> hadoop.job.ugi=kafka,hadoop
>>
>> # kafka server uri
>> kafka.server.uri=tcp://localhost:9092
>>
>> # hdfs location of input directory
>> input=/tmp/kafka/data
>>
>> # hdfs location of output directory
>> output=/tmp/kafka/output
>>
>> # limit the number of events to be fetched;
>> # value -1 means no limitation
>> kafka.request.limit=-1
>>
>> # kafka parameters
>> client.buffer.size=1048576
>> client.so.timeout=60000
>>
>>
>> Any ideas where the problem could be?
>>
>

--
Felix
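
The behavior Raimon describes is what an unbounded fetch loop produces. Below
is a minimal sketch of the idea, loosely modeled on the Kafka 0.7-era
SimpleConsumer API; the class names and signatures here are assumptions from
that era, and this is not the actual SimpleKafkaETLJob code:

import kafka.api.FetchRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class BoundedFetchSketch {
    public static void main(String[] args) {
        // timeout and buffer size match client.so.timeout and
        // client.buffer.size from test.properties above
        SimpleConsumer consumer =
            new SimpleConsumer("localhost", 9092, 60000, 1048576);
        long offset = 0;    // where the consumer starts reading
        long fetched = 0;
        long limit = 1000;  // kafka.request.limit; -1 means "no bound"

        while (limit < 0 || fetched < limit) {
            ByteBufferMessageSet messages = consumer.fetch(
                new FetchRequest("SimpleTestEvent5", 0, offset, 1048576));
            for (MessageAndOffset mo : messages) {
                offset = mo.offset();  // advance to the next fetch offset
                fetched++;
            }
            // with limit=-1 this loop never exits; and if the offset is
            // never advanced (e.g. always reset to 0), every fetch returns
            // the same events again, which is exactly the "file growing
            // and growing" symptom described above
        }
        consumer.close();
    }
}

With kafka.request.limit=1000 the loop terminates after 1000 events, which is
the behavior Raimon saw once he changed the configuration.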