-
How to make zip files as Hadoop input
Mark Kerzner 2011-03-02, 05:45
Hi,
I have a bunch of zip files that I want to serve as input to a MapReduce job. My initial design was to list them in a text file and then give this list file as input. The list file would be read, and each line would be handed off to a node to process, which would pick up the corresponding zip file and work on it.
But I feel that a better design is possible, and that my way is redundant. Can I just give the input directory as input? How do I make sure each node gets a file to process?
Thank you, Mark
-
Re: How to make zip files as Hadoop input
Nitin Khandelwal 2011-03-02, 05:52
Hi, you can write your own input format and record reader that read one whole file from the input directory and give it to a node. If you are using Hadoop 0.19, then extending MultiFileInputFormat can do this task for you. If you are using Hadoop 0.20 or greater, then your input format can extend FileInputFormat and your reader can extend RecordReader. Thanks and regards, Nitin
-- Nitin Khandelwal
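To make the suggestion above a little more concrete, here is a rough sketch of the part a custom record reader would probably have to do: take one zip file as a single unit and turn its entries into (name, bytes) records. It uses only java.util.zip, so it runs without Hadoop. The class name ZipRecords and the method readEntries are made up for illustration. In a real job the InputStream would come from the file that the split points to, and the input format would most likely need to keep each zip in one piece (for example, by overriding isSplitable in a FileInputFormat subclass to return false). Those parts are assumed, not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: not part of Hadoop, only a stand-in for the logic
// that a custom RecordReader could run on one whole zip file.
public class ZipRecords {

    // Reads every non-directory entry of one zip stream into a map of
    // entry name -> entry contents, preserving the order of the archive.
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> records = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(buffer, 0, buffer.length)) != -1) {
                    out.write(buffer, 0, n);
                }
                records.put(entry.getName(), out.toByteArray());
            }
        }
        return records;
    }

    // Builds a tiny in-memory zip so the example needs no files on disk.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("hadoop".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> records = readEntries(new ByteArrayInputStream(sampleZip()));
        for (Map.Entry<String, byte[]> e : records.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

With something like this inside the reader, the input directory itself can be given as the job input, and the separate list file should no longer be needed, since each zip becomes one split handled by one map task.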