Re: Can I write to a compressed file which is located in HDFS?
Hi

Here is a piece of code that does the reverse of what you want; it takes a bunch of compressed files (gzip, in this case) and converts them to text.

You can tweak the code to do the reverse.

http://pastebin.com/mBHVHtrm 
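
In outline, it looks something like this (a rough sketch, not necessarily
the pastebin's exact code; the class name is made up):

import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class DecompressToText {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path in = new Path(args[0]);                    // e.g. /logs/app.log.gz
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(in);  // inferred from extension
    if (codec == null) {
      System.err.println("No codec found for " + in);
      System.exit(1);
    }
    // Drop the compression suffix (e.g. .gz) to name the plain-text output
    Path out = new Path(CompressionCodecFactory.removeSuffix(
        in.toString(), codec.getDefaultExtension()));
    InputStream is = codec.createInputStream(fs.open(in));
    OutputStream os = fs.create(out);
    IOUtils.copyBytes(is, os, 4096, true);          // true = close both streams
  }
}

Going the other way is the same skeleton with codec.createOutputStream()
around fs.create() instead of createInputStream() around fs.open().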

Raj

>________________________________
> From: Xiaobin She <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Cc: [EMAIL PROTECTED]; David Sinclair <[EMAIL PROTECTED]>
>Sent: Tuesday, February 7, 2012 1:11 AM
>Subject: Re: Can I write to a compressed file which is located in HDFS?
>
>Thank you, Bejoy. I will look at that book.
>
>Thanks again!
>
>
>
>2012/2/7 <[EMAIL PROTECTED]>
>
>> Hi
>> AFAIK it is not possible to append to a compressed file.
>>
>> If you have files in an HDFS directory and you need to compress them
>> (like the files for an hour), you can use MapReduce to do that by
>> setting mapred.output.compress=true and
>> mapred.output.compression.codec='theCodecYouPrefer'.
>> You'd get compressed files in the output dir; a sketch follows below.
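>>
>> A minimal sketch of such a job (old mapred API of that era; gzip and the
>> argument paths are just examples):
>>
>> import java.io.IOException;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.NullWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.io.compress.GzipCodec;
>> import org.apache.hadoop.mapred.*;
>>
>> public class CompressDir {
>>   // Pass each line through unchanged; drop TextInputFormat's offset key.
>>   // TextOutputFormat omits NullWritable values, so only the line is written.
>>   public static class LineMapper extends MapReduceBase
>>       implements Mapper<LongWritable, Text, Text, NullWritable> {
>>     public void map(LongWritable offset, Text line,
>>         OutputCollector<Text, NullWritable> out, Reporter reporter)
>>         throws IOException {
>>       out.collect(line, NullWritable.get());
>>     }
>>   }
>>
>>   public static void main(String[] args) throws Exception {
>>     JobConf conf = new JobConf(CompressDir.class);
>>     conf.setMapperClass(LineMapper.class);
>>     conf.setNumReduceTasks(0);                  // map-only identity job
>>     conf.setOutputKeyClass(Text.class);
>>     conf.setOutputValueClass(NullWritable.class);
>>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>     // Same effect as setting mapred.output.compress and
>>     // mapred.output.compression.codec on the command line.
>>     FileOutputFormat.setCompressOutput(conf, true);
>>     FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
>>     JobClient.runJob(conf);
>>   }
>> }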
>>
>> You can use the API to read from standard input and write compressed
>> output to HDFS, as in the sketch below:
>> - get the Hadoop conf
>> - instantiate the required compression codec
>> - write to a CompressionOutputStream.
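>>
>> A minimal sketch of those three steps (the GzipCodec choice and output
>> path handling are my assumptions):
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.IOUtils;
>> import org.apache.hadoop.io.compress.CompressionCodec;
>> import org.apache.hadoop.io.compress.CompressionOutputStream;
>> import org.apache.hadoop.io.compress.GzipCodec;
>> import org.apache.hadoop.util.ReflectionUtils;
>>
>> public class StdinToCompressedHdfs {
>>   public static void main(String[] args) throws Exception {
>>     Configuration conf = new Configuration();        // 1. get the Hadoop conf
>>     FileSystem fs = FileSystem.get(conf);
>>     CompressionCodec codec =                         // 2. get the codec
>>         ReflectionUtils.newInstance(GzipCodec.class, conf);
>>     Path out = new Path(args[0] + codec.getDefaultExtension());
>>     CompressionOutputStream cos =                    // 3. compressed stream
>>         codec.createOutputStream(fs.create(out));
>>     IOUtils.copyBytes(System.in, cos, 4096, false);  // stream stdin through
>>     cos.finish();                                    // flush the codec trailer
>>     cos.close();
>>   }
>> }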
>>
>> You should find a well-detailed explanation of this in the book
>> 'Hadoop: The Definitive Guide' by Tom White.
>> Regards
>> Bejoy K S
>>
>> From handheld, Please excuse typos.
>> ------------------------------
>> From: Xiaobin She <[EMAIL PROTECTED]>
>> Date: Tue, 7 Feb 2012 14:24:01 +0800
>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; David Sinclair <[EMAIL PROTECTED]>
>> Subject: Re: Can I write to a compressed file which is located in HDFS?
>>
>> Hi Bejoy and David,
>>
>> Thank you for your help.
>>
>> So I can't directly write or append logs to a compressed file in HDFS,
>> right?
>>
>> Can I compress a file which is already in HDFS and has not been
>> compressed?
>>
>> If I can, how can I do that?
>>
>> Thanks!
>>
>>
>>
>> 2012/2/6 <[EMAIL PROTECTED]>
>>
>>> Hi
>>>       I agree with David on that point; you can achieve step 1 of my
>>> previous response with Flume, i.e. load a real-time inflow of data into
>>> HDFS in compressed format. You can specify a time interval or data size
>>> in the Flume collector that determines when to flush data onto HDFS;
>>> see the sketch below.
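>>>
>>> A hedged sketch of the relevant Flume (OG) settings in flume-site.xml
>>> (property names from the Flume OG guide; the values are just examples):
>>>
>>> <property>
>>>   <name>flume.collector.roll.millis</name>
>>>   <!-- roll/flush the collector's HDFS files every 5 minutes -->
>>>   <value>300000</value>
>>> </property>
>>> <property>
>>>   <name>flume.collector.dfs.compress.codec</name>
>>>   <!-- write collector output compressed -->
>>>   <value>GzipCodec</value>
>>> </property>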
>>>
>>> Regards
>>> Bejoy K S
>>>
>>> From handheld, Please excuse typos.
>>>
>>> -----Original Message-----
>>> From: David Sinclair <[EMAIL PROTECTED]>
>>> Date: Mon, 6 Feb 2012 09:06:00
>>> To: <[EMAIL PROTECTED]>
>>> Cc: <[EMAIL PROTECTED]>
>>> Subject: Re: Can I write to a compressed file which is located in HDFS?
>>>
>>> Hi,
>>>
>>> You may want to have a look at the Flume project from Cloudera. I use it
>>> for writing data into HDFS.
>>>
>>> https://ccp.cloudera.com/display/SUPPORT/Downloads
>>>
>>> dave
>>>
>>> 2012/2/6 Xiaobin She <[EMAIL PROTECTED]>
>>>
>>> > Hi Bejoy,
>>> >
>>> > Thank you for your reply.
>>> >
>>> > Actually I have set up a test cluster which has one namenode/jobtracker
>>> > and two datanode/tasktrackers, and I have run a test on this cluster.
>>> >
>>> > I fetch the log file of one of our modules from the log collector
>>> > machines by rsync, and then I use the Hive command-line tool to load
>>> > this log file into the Hive warehouse, which simply copies the file
>>> > from the local filesystem to HDFS (see the example below).
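>>> >
>>> > For example (the table name and path here are made up):
>>> >
>>> >   hive> LOAD DATA LOCAL INPATH '/var/log/mymodule/2012-02-06.log'
>>> >       > INTO TABLE mymodule_logs;
>>> >
>>> > LOAD DATA with LOCAL copies the file into the table's warehouse
>>> > directory on HDFS.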
>>> >
>>> > And I have run some analysis on these data with Hive; all of this ran
>>> > well.
>>> >
>>> > But now I want to avoid the fetch step, which uses rsync, and write the
>>> > logs into HDFS files directly from the servers which generate them.
>>> >
>>> > And it seems easy to do this if the file located in HDFS is not
>>> > compressed.
>>> >
>>> > But how do I write or append logs to a file that is compressed and
>>> > located in HDFS?
>>> >
>>> > Is this possible?
>>> >
>>> > Or is this a bad practice?
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > 2012/2/6 <[EMAIL PROTECTED]>