Re: produce a large sequencefile (1TB)
Bing Jiang 2013-08-20, 02:55
I wonder whether it would be acceptable to use multiple reducers and generate
more than one output file. I would like to know what makes post-processing the
output of multiple reducers difficult. Maybe there are some constraints in your application?
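Multiple reducers do not have to rule out random access. If each reducer writes its own sorted, indexed file (Hadoop's `MapFileOutputFormat` writes one MapFile per reduce task), a reader can apply the same partitioner to a key to find the only file that can contain it, and then consult that file's sparse index. The Python below is a toy, in-memory model of that idea, not Hadoop code. `partition`, `ToyMapFile`, `write_partitions` and `lookup` are hypothetical names. crc32 stands in for Hadoop's `HashPartitioner`. The index interval of 128 is an assumption meant to mirror MapFile's default.

```python
import bisect
import zlib


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for Hadoop's HashPartitioner, which computes
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. crc32 is used
    # because Python's built-in str hash is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


class ToyMapFile:
    """Sorted (key, value) records plus a sparse index holding every
    `interval`-th key and its position, loosely modelled on a MapFile's
    data/index pair. Assumes keys are unique."""

    def __init__(self, records, interval=128):
        self.records = sorted(records)
        self.index_pos = list(range(0, len(self.records), interval))
        self.index_keys = [self.records[p][0] for p in self.index_pos]

    def get(self, key):
        # Find the last indexed key <= key. If the target is present, it lies
        # between that position and the next indexed key.
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None
        for pos in range(self.index_pos[i], len(self.records)):
            k, v = self.records[pos]
            if k == key:
                return v
            if k > key:
                break
        return None


def write_partitions(records, num_reducers, interval=128):
    # Simulate a job with num_reducers reducers. Each record is routed by the
    # partitioner, and each "reducer" writes one sorted, indexed file.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))
    return [ToyMapFile(bucket, interval) for bucket in buckets]


def lookup(files, key):
    # The partitioner identifies the only file that can hold the key,
    # so the reader consults just one file's index.
    return files[partition(key, len(files))].get(key)


rows = [(f"row{i:06d}", i) for i in range(10_000)]
files = write_partitions(rows, num_reducers=8)
print(lookup(files, "row004242"))  # 4242
print(lookup(files, "missing"))    # None
```

In Hadoop itself, `MapFileOutputFormat.getReaders` and `MapFileOutputFormat.getEntry` appear to provide this partition-then-lookup step over a job's output directory. Check the Javadoc for your Hadoop version before relying on them. If a consumer also needs all rows in one global key order, `TotalOrderPartitioner` assigns contiguous key ranges to reducers instead of hashing, so the part files can be read one after another.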
2013/8/20 Jerry Lam <[EMAIL PROTECTED]>
> Hi Bing,
> you are correct. The local storage does not have enough capacity to hold
> the temporary files generated by the mappers. Since we want a single
> sequence file at the end, we are forced to use 1 reducer.
> The use case is that we want to generate an index for the 1 TB sequence
> file so that we can randomly access each row in the sequence file. In
> practice, this is simply a MapFile.
> Any ideas on how to resolve this dilemma would be greatly appreciated.
> On Mon, Aug 19, 2013 at 8:14 PM, Bing Jiang <[EMAIL PROTECTED]> wrote:
>> I think you are worried about the volume of MapReduce local files, but
>> could you give us more details about your application?
>> On Aug 20, 2013 6:09 AM, "Jerry Lam" <[EMAIL PROTECTED]> wrote:
>>> Hi Hadoop users and developers,
>>> I have a use case in which I need to produce a large sequence file, 1 TB
>>> in size, but each datanode has only 200 GB of storage (I have 30 datanodes).
>>> The problem is that no single reducer can hold 1 TB of data during the
>>> reduce phase to generate a single sequence file, even if I use aggressive
>>> compression. Whichever datanode runs the reducer will run out of space,
>>> since this is a single-reducer job.
>>> Any comments or help would be appreciated.
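A little arithmetic makes the numbers in the original question concrete. The cluster as a whole has room for the file, but a single node does not. The sketch below makes three assumptions. It treats 1 TB as 1000 GB. It uses HDFS's default replication factor of 3. It introduces a hypothetical `headroom` fraction for how much of one node's 200 GB a reducer can actually use, because that node also holds shuffle data, merge spills, and, under default block placement, the first replica of every block the reducer writes. It also assumes the partitioner spreads the output evenly across reducers.

```python
import math

# Figures from the thread; 1 TB is treated as 1000 GB for simplicity.
OUTPUT_GB = 1000
NODE_GB = 200
NUM_NODES = 30
REPLICATION = 3  # assumption: HDFS default dfs.replication


def min_reducers(output_gb, node_gb, headroom):
    # Smallest reducer count such that one reducer's (evenly split) share of
    # the output fits in `headroom`, a hypothetical fraction, of one node's disk.
    return math.ceil(output_gb / (node_gb * headroom))


cluster_gb = NODE_GB * NUM_NODES              # raw capacity of the whole cluster
hdfs_footprint_gb = OUTPUT_GB * REPLICATION   # space the finished file occupies in HDFS
print(cluster_gb, hdfs_footprint_gb)          # 6000 3000
print(min_reducers(OUTPUT_GB, NODE_GB, 1.0))  # 5
print(min_reducers(OUTPUT_GB, NODE_GB, 0.25)) # 20
```

Even with the entire node available, one reducer cannot fit. Five or more can, at least in principle. The real minimum depends on the headroom, which has to be measured on the actual cluster.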
Focus on distributed computing, HDFS/HBase