Hadoop, mail # user - hadoop streaming and a directory containing large number of .tgz files


Sunil S Nandihalli 2012-04-24, 13:42
RE: hadoop streaming and a directory containing large number of .tgz files
Devaraj k 2012-04-24, 17:37
Hi Sunil,

    Please check HarFileSystem (Hadoop Archive Filesystem); it should be useful for solving your problem.

Thanks
Devaraj
________________________________________
From: Sunil S Nandihalli [[EMAIL PROTECTED]]
Sent: Tuesday, April 24, 2012 7:12 PM
To: [EMAIL PROTECTED]
Subject: hadoop streaming and a directory containing large number of .tgz files

Hi Everybody,
 I am a newbie to Hadoop. I have about 40K .tgz files, each approximately
3 MB. I would like to process them as if they were a single large file formed
by
"cat list-of-files | gnuparallel 'tar -Oxvf {} | sed 1d' > output.txt"
How can I achieve this using hadoop-streaming or some other similar
library?
Thanks,
Sunil.
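
For reference, the per-archive step of the pipeline above (`tar -Oxvf {} | sed 1d`) can be sketched in Python, which is one way to write a hadoop-streaming mapper that receives one archive path per input line. This is only a rough sketch under assumptions: the function names `archive_body` and `concat_archives` are made up for illustration, the paths are assumed to be readable as ordinary local files on the node running the mapper, and fetching archives from HDFS is not shown.

```python
import io
import sys
import tarfile


def archive_body(fileobj):
    """Concatenate all regular-file members of a .tgz stream, minus the first line.

    Mirrors `tar -Oxf ARCHIVE | sed 1d`: the members are written back to back,
    then the first line of the combined output is dropped.
    """
    chunks = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                chunks.append(tar.extractfile(member).read())
    data = b"".join(chunks)
    # Equivalent of `sed 1d`: discard everything up to and including the first newline.
    _, sep, rest = data.partition(b"\n")
    return rest if sep else b""


def concat_archives(path_lines, out):
    """Read one archive path per line and write each archive's body to `out`."""
    for line in path_lines:
        path = line.strip()
        if not path:
            continue
        # Assumption: `path` is a locally readable file, not an HDFS URI.
        with open(path, "rb") as f:
            out.write(archive_body(f))
```

A driver script would call `concat_archives(sys.stdin, sys.stdout.buffer)`, so that feeding it the contents of list-of-files on standard input reproduces output.txt on standard output. Each archive is read fully into memory here, which seems acceptable for files of about 3 MB but would need streaming for much larger ones.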
Sunil S Nandihalli 2012-04-24, 14:01
Raj Vishwanathan 2012-04-24, 14:29