Flume >> mail # user >> Question about gzip compression when using Flume Ng


Re: Question about gzip compression when using Flume Ng
Is there any incompatibility in trying to write to a different version of
Hadoop then?

- Connor
On Mon, Jan 14, 2013 at 5:25 PM, Bhaskar V. Karambelkar <[EMAIL PROTECTED]
> wrote:

> Sagar,
> You're better off downloading and unzipping CDH3u5 or CDH4 somewhere, and
> pointing the HADOOP_HOME env. variable to the base directory.
> That way you won't have to worry about which jar files are needed and
> which are not.
> Flume will automatically add all the JARs it needs from the Hadoop installation.
>
> regards
> Bhaskar
>
>
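A minimal sketch of the setup Bhaskar suggests, assuming a CDH3u5 tarball unpacked under /opt (the exact version and paths are illustrative):

```shell
# One-time step (path and version illustrative):
#   tar xzf hadoop-0.20.2-cdh3u5.tar.gz -C /opt
# Point Flume at the unpacked distribution; bin/flume-ng picks up
# HADOOP_HOME and appends the Hadoop jars from that installation to its
# own classpath, so nothing needs to be copied into flume/lib.
export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u5
echo "HADOOP_HOME is $HADOOP_HOME"
```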
> On Mon, Jan 14, 2013 at 7:43 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>
>> OK, so I dropped the new hadoop-core jar into /opt/flume/lib [I got some
>> errors about the guava dependency, so I put in that jar too]
>>
>> smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e
>> "guava"
>> -rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
>> -rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50
>> hadoop-core-0.20.2-cdh3u5.jar
>>
>> Now I don't even see the file being created in HDFS, and the Flume log is
>> happily talking about housekeeping for some file channel checkpoints,
>> updating pointers, and so on.
>>
>> Below is a tail of the Flume log:
>>
>> *hadoop@collector102:/data/flume_log$ tail -10 flume.log*
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO
>>  org.apache.flume.channel.file.Log - Updated checkpoint for file:
>> /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID:
>> 1358209947324
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO
>>  org.apache.flume.channel.file.LogFile - Closing RandomReader
>> /data/flume_data/channel2/data/log-34
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO
>>  org.apache.flume.channel.file.Log - Updated checkpoint for file:
>> /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID:
>> 1358209947323
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO
>>  org.apache.flume.channel.file.LogFile - Closing RandomReader
>> /data/flume_data/channel1/data/log-34
>> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO
>>  org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta
>> currentPosition = 18577138, logWriteOrderID = 1358209947324
>> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO
>>  org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta
>> currentPosition = 18577138, logWriteOrderID = 1358209947323
>> 2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO
>>  org.apache.flume.channel.file.LogFile - Closing RandomReader
>> /data/flume_data/channel1/data/log-35
>> 2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO
>>  org.apache.flume.channel.file.LogFile - Closing RandomReader
>> /data/flume_data/channel2/data/log-35
>> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO
>>  org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta
>> currentPosition = 217919486, logWriteOrderID = 1358209947323
>> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO
>>  org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta
>> currentPosition = 217919486, logWriteOrderID = 1358209947324
>>
>> Sagar
>>
>>
>> On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>
>>> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old;
>>> I would upgrade to CDH3u5 or CDH 4.1.2.
>>>
>>> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <[EMAIL PROTECTED]>
>>> wrote:
>>> > About the bz2 suggestion: we have a ton of downstream jobs that assume
>>> > gzip-compressed files - so it is better to stick to gzip.
>>> >
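For reference, gzip output from the HDFS sink is configured through the sink's compression properties (property names per the Flume NG user guide; the agent and sink names here are placeholders):

```
# flume.conf fragment - agent1/sink1 are illustrative names
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.codeC = gzip
agent1.sinks.sink1.hdfs.fileType = CompressedStream
```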
>>> > The plan B for us is to have an Oozie step to gzip-compress the logs
>>> > before proceeding with downstream Hadoop jobs - but that looks like a
>>> > hack to me!!
>>> >
>>> > Sagar
>>> >
>>> >
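The "plan B" above could be approximated by a shell step. The real version would run `hadoop fs` commands against HDFS; this sketch, using an assumed local directory, only shows the gzip pass itself:

```shell
# Compress any uncompressed Flume output files before downstream jobs run.
# /tmp/flume_out stands in for the real HDFS output directory.
mkdir -p /tmp/flume_out
for f in /tmp/flume_out/*.log; do
  [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
  gzip -f "$f"              # writes $f.gz and removes the original
done
```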
>>> > On Mon, Jan 14, 2013 at 3:24 PM, Sagar Mehta <[EMAIL PROTECTED]>
>>> wrote:
>>> >>
>>> >> hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat
>>> >> collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l