Re: Can I write to a compressed file which is located in hdfs?
Raj Vishwanathan 2012-02-07, 18:06
Hi

Here is a piece of code that does the reverse of what you want; it takes a bunch of compressed files (gzip, in this case) and converts them to text.

You can tweak the code to do the reverse.

http://pastebin.com/mBHVHtrm 
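
In case the pastebin link goes stale, here is a minimal sketch of that kind of loop, reading gzipped files out of hdfs and writing them back as plain text. It is a from-scratch sketch, not the exact pastebin code; the class name and the two directory arguments are just examples.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.CompressionInputStream;

    public class DecompressToText {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);

        Path inDir = new Path(args[0]);   // e.g. /logs/compressed
        Path outDir = new Path(args[1]);  // e.g. /logs/text

        for (FileStatus status : fs.listStatus(inDir)) {
          Path in = status.getPath();
          // Pick the codec from the file extension (.gz, .bz2, ...); skip plain files.
          CompressionCodec codec = factory.getCodec(in);
          if (codec == null) continue;
          Path out = new Path(outDir, CompressionCodecFactory.removeSuffix(
              in.getName(), codec.getDefaultExtension()));
          CompressionInputStream cin = codec.createInputStream(fs.open(in));
          // copyBytes(..., true) closes both streams when it is done.
          IOUtils.copyBytes(cin, fs.create(out), conf, true);
        }
      }
    }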

Raj

>________________________________
> From: Xiaobin She <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Cc: [EMAIL PROTECTED]; David Sinclair <[EMAIL PROTECTED]>
>Sent: Tuesday, February 7, 2012 1:11 AM
>Subject: Re: Can I write to a compressed file which is located in hdfs?
>
>thank you Bejoy, I will look at that book.
>
>Thanks again!
>
>
>
>2012/2/7 <[EMAIL PROTECTED]>
>
>>
>> Hi
>> AFAIK I don't think it is possible to append to a compressed file.
>>
>> If you have files in a dir in hdfs and you need to compress them (like
>> the files for an hour), you can use MapReduce to do that by setting
>> mapred.output.compress = true and
>> mapred.output.compression.codec='theCodecYouPrefer'.
>> You'd get the compressed output in the output dir.
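>>
>> For illustration, a bare-bones sketch of such a map-only job using the old
>> mapred API; the class names and the two path arguments below are made-up
>> examples:
>>
>>     import java.io.IOException;
>>
>>     import org.apache.hadoop.fs.Path;
>>     import org.apache.hadoop.io.LongWritable;
>>     import org.apache.hadoop.io.NullWritable;
>>     import org.apache.hadoop.io.Text;
>>     import org.apache.hadoop.io.compress.CompressionCodec;
>>     import org.apache.hadoop.io.compress.GzipCodec;
>>     import org.apache.hadoop.mapred.FileInputFormat;
>>     import org.apache.hadoop.mapred.FileOutputFormat;
>>     import org.apache.hadoop.mapred.JobClient;
>>     import org.apache.hadoop.mapred.JobConf;
>>     import org.apache.hadoop.mapred.MapReduceBase;
>>     import org.apache.hadoop.mapred.Mapper;
>>     import org.apache.hadoop.mapred.OutputCollector;
>>     import org.apache.hadoop.mapred.Reporter;
>>     import org.apache.hadoop.mapred.TextInputFormat;
>>     import org.apache.hadoop.mapred.TextOutputFormat;
>>
>>     public class CompressDir {
>>
>>       // Re-emit each line unchanged, dropping the byte-offset key so the
>>       // output is the same text as the input, just compressed.
>>       public static class LineMapper extends MapReduceBase
>>           implements Mapper<LongWritable, Text, Text, NullWritable> {
>>         public void map(LongWritable offset, Text line,
>>             OutputCollector<Text, NullWritable> out, Reporter reporter)
>>             throws IOException {
>>           out.collect(line, NullWritable.get());
>>         }
>>       }
>>
>>       public static void main(String[] args) throws IOException {
>>         JobConf conf = new JobConf(CompressDir.class);
>>         conf.setJobName("compress-dir");
>>
>>         conf.setMapperClass(LineMapper.class);
>>         conf.setNumReduceTasks(0);               // map-only job
>>         conf.setInputFormat(TextInputFormat.class);
>>         conf.setOutputFormat(TextOutputFormat.class);
>>         conf.setOutputKeyClass(Text.class);
>>         conf.setOutputValueClass(NullWritable.class);
>>
>>         // The two settings mentioned above:
>>         conf.setBoolean("mapred.output.compress", true);
>>         conf.setClass("mapred.output.compression.codec",
>>             GzipCodec.class, CompressionCodec.class);
>>
>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));   // uncompressed dir
>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // compressed copy
>>
>>         JobClient.runJob(conf);
>>       }
>>     }
>>
>> With zero reduces the map output goes straight through TextOutputFormat, so
>> those compression settings apply to the part files in the output dir.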
>>
>> You can use the API to read from standard input and write compressed, like:
>> - get the hadoop conf
>> - register the required compression codec
>> - write to a CompressionOutputStream (roughly as sketched below).
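>>
>> A minimal sketch of those three steps, assuming gzip and taking the target
>> hdfs path as an argument (both are just example choices):
>>
>>     import java.io.IOException;
>>
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.fs.FileSystem;
>>     import org.apache.hadoop.fs.Path;
>>     import org.apache.hadoop.io.compress.CompressionCodec;
>>     import org.apache.hadoop.io.compress.CompressionOutputStream;
>>     import org.apache.hadoop.io.compress.GzipCodec;
>>     import org.apache.hadoop.util.ReflectionUtils;
>>
>>     public class StdinToCompressedHdfs {
>>       public static void main(String[] args) throws IOException {
>>         Configuration conf = new Configuration();               // 1. get hadoop conf
>>         FileSystem fs = FileSystem.get(conf);
>>
>>         CompressionCodec codec =
>>             ReflectionUtils.newInstance(GzipCodec.class, conf); // 2. the codec you prefer
>>
>>         Path out = new Path(args[0]);                           // e.g. /logs/app.log.gz
>>         CompressionOutputStream cos =
>>             codec.createOutputStream(fs.create(out));           // 3. compressed hdfs stream
>>
>>         byte[] buf = new byte[4096];
>>         int n;
>>         while ((n = System.in.read(buf)) != -1) {
>>           cos.write(buf, 0, n);
>>         }
>>         cos.finish();   // write the compression trailer
>>         cos.close();    // also closes the underlying hdfs stream
>>       }
>>     }
>>
>> Note this only writes a new compressed file in one go; it does not append to
>> an existing compressed file, which, as said above, is not really possible.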
>>
>> You should get a detailed explanation of this in the book
>> 'Hadoop: The Definitive Guide' by Tom White.
>> Regards
>> Bejoy K S
>>
>> From handheld, Please excuse typos.
>> ------------------------------
>> From: Xiaobin She <[EMAIL PROTECTED]>
>> Date: Tue, 7 Feb 2012 14:24:01 +0800
>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; David
>> Sinclair <[EMAIL PROTECTED]>
>> Subject: Re: Can I write to a compressed file which is located in hdfs?
>>
>> hi Bejoy and David,
>>
>> thank you for you help.
>>
>> So I can't directly write logs or append logs to a compressed file in
>> hdfs, right?
>>
>> Can I compress a file which is already in hdfs and has not been
>> compressed?
>>
>> If I can , how can I do that?
>>
>> Thanks!
>>
>>
>>
>> 2012/2/6 <[EMAIL PROTECTED]>
>>
>>> Hi
>>>       I agree with David on that point; you can achieve step 1 of my
>>> previous response with Flume, i.e. load the real-time inflow of data in
>>> compressed format into hdfs. You can specify a time interval or data size
>>> in the Flume collector that determines when to flush data onto hdfs.
>>>
>>> Regards
>>> Bejoy K S
>>>
>>> From handheld, Please excuse typos.
>>>
>>> -----Original Message-----
>>> From: David Sinclair <[EMAIL PROTECTED]>
>>> Date: Mon, 6 Feb 2012 09:06:00
>>> To: <[EMAIL PROTECTED]>
>>> Cc: <[EMAIL PROTECTED]>
>>> Subject: Re: Can I write to a compressed file which is located in hdfs?
>>>
>>> Hi,
>>>
>>> You may want to have a look at the Flume project from Cloudera. I use it
>>> for writing data into HDFS.
>>>
>>> https://ccp.cloudera.com/display/SUPPORT/Downloads
>>>
>>> dave
>>>
>>> 2012/2/6 Xiaobin She <[EMAIL PROTECTED]>
>>>
>>> > hi Bejoy,
>>> >
>>> > thank you for your reply.
>>> >
>>> > actually I have set up a test cluster which has one namenode/jobtracker
>>> > and two datanodes/tasktrackers, and I have run a test on this cluster.
>>> >
>>> > I fetch the log file of one of our modules from the log collector
>>> > machines with rsync, and then I use the hive command line tool to load
>>> > this log file into the hive warehouse, which simply copies the file
>>> > from the local filesystem to hdfs.
>>> >
>>> > And I have run some analyses on this data with hive; all of this runs well.
>>> >
>>> > But now I want to avoid the fetch step which uses rsync, and write the
>>> > logs into hdfs files directly from the servers which generate these
>>> > logs.
>>> >
>>> > And it seems easy to do this job if the file located in hdfs is not
>>> > compressed.
>>> >
>>> > But how can I write or append logs to a file that is compressed and
>>> > located in hdfs?
>>> >
>>> > Is this possible?
>>> >
>>> > Or is this a bad practice?
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > 2012/2/6 <[EMAIL PROTECTED]>