Hadoop >> mail # user >> Upload, then decompress archive on HDFS?


Re: Upload, then decompress archive on HDFS?
I can envision an M/R job for the purpose of manipulating HDFS, such as (de)compressing files and resaving them back to HDFS.  I just didn't think it should be necessary to *write a program* to do something so seemingly minimal.  Tarring or compressing files seems like an obvious method for moving data back and forth; I would expect the tools to support it.

I'll read up on "-text".  Maybe that really is what I wanted, although I'm dubious since this has nothing to do with textual data at all.  Anyway, I'll see what I can find on that.
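For reference, here is a rough sketch of the kind of small program I was hoping to avoid, assuming a gzipped file already sitting on HDFS. The function names (gunzip_stream, decompress_on_hdfs) are made up for illustration, and it leans only on "hadoop fs -cat" and "hadoop fs -put -" (which reads from stdin). It runs on the client, so the bytes round-trip through the local machine rather than being decompressed by the cluster, and it treats the data as raw bytes, not text.

```python
import gzip
import io
import shutil
import subprocess


def gunzip_stream(src, dst, chunk_size=1 << 20):
    """Decompress gzip data from file-like `src` into file-like `dst`, chunk by chunk."""
    with gzip.GzipFile(fileobj=src, mode="rb") as unzipped:
        shutil.copyfileobj(unzipped, dst, chunk_size)


def decompress_on_hdfs(src_path, dst_path):
    """Hypothetical helper: stream an HDFS .gz file through the client and back to HDFS.

    Assumes the `hadoop` command is on PATH. Not distributed: all data
    passes through the machine running this script.
    """
    reader = subprocess.Popen(["hadoop", "fs", "-cat", src_path],
                              stdout=subprocess.PIPE)
    writer = subprocess.Popen(["hadoop", "fs", "-put", "-", dst_path],
                              stdin=subprocess.PIPE)
    gunzip_stream(reader.stdout, writer.stdin)
    writer.stdin.close()   # signal end of input to 'hadoop fs -put -'
    reader.stdout.close()
    if reader.wait() != 0 or writer.wait() != 0:
        raise RuntimeError("a hadoop fs command failed")


if __name__ == "__main__":
    # Local check of the decompression step with in-memory binary data.
    payload = bytes(range(256)) * 4
    out = io.BytesIO()
    gunzip_stream(io.BytesIO(gzip.compress(payload)), out)
    print(out.getvalue() == payload)
```

Usage would be something like decompress_on_hdfs("/data/archive.gz", "/data/archive"), with both paths being placeholders. This only handles a single gzip stream; a tar or zip archive holding many files would need extra unpacking logic.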

Thanks.

On Aug 4, 2011, at 9:04 PM, Harsh J wrote:

> Keith,
>
> The 'hadoop fs -text' tool does decompress a file given to it if
> needed/able, but you could also run a distributed MapReduce job
> that converts the files from compressed to decompressed; that would
> be much faster.
>
> On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>> Instead of running "hd fs -put" on hundreds of files of X megs each, I want to do it once on a single gzipped (or zipped) archive with a much smaller total size.  Then I want to decompress the archive on HDFS.  I can't figure out what "hd fs" type command would do such a thing.
>>
>> Thanks.
________________________________________________________________________________
Keith Wiley     [EMAIL PROTECTED]     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley
________________________________________________________________________________