MapReduce >> mail # user >> Re: produce a large sequencefile (1TB)

Re: produce a large sequencefile (1TB)
Hi Jerry,

I wonder whether it would be acceptable to run multiple reducers and
generate several MapFiles (each a pair of IndexFile and DataFile).
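For context, with multiple reducers it is Hadoop's partitioner (by default,
hash partitioning) that decides which reducer, and hence which MapFile shard,
a given key lands in, so the routing stays deterministic. Below is a minimal,
dependency-free sketch of that default routing logic; the class and method
names are illustrative, not from this thread:

```java
// Illustrative sketch of Hadoop's default hash-partitioning logic:
// with R reducers, each key maps to exactly one reducer, so each
// reducer can safely write its own sorted MapFile shard.
public class HashPartitionSketch {

    // Mirrors HashPartitioner: mask off the sign bit, then modulo.
    static int getPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String k : new String[]{"row-1", "row-2", "row-3"}) {
            // Each key is deterministically routed to one shard.
            System.out.println(k + " -> part-" + getPartition(k, reducers));
        }
    }
}
```

Because the same key always hashes to the same shard, a later lookup only
needs to recompute the partition to know which MapFile to open.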

I would like to understand the real difficulty of post-processing the
output of multiple reducers. Could you share more details about the app?
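On the post-processing point: the classic mapred API has
MapFileOutputFormat.getReaders(...) and getEntry(...), which look a key up
across all reducer outputs, so having several MapFiles need not block random
access. Here is a dependency-free sketch of the same idea, assuming String
keys and values; the TreeMap stands in for one sorted MapFile shard, and the
class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of key lookup across several sorted shards: route the key with
// the same partitioning logic the job used, then search only that shard.
public class ShardedLookup {
    // Each TreeMap stands in for one sorted MapFile (index + data).
    private final List<TreeMap<String, String>> shards;

    ShardedLookup(int numShards) {
        shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) shards.add(new TreeMap<>());
    }

    // Same routing rule as the job's partitioner.
    private int partition(String key) {
        return (key.hashCode() & Integer.MAX_VALUE) % shards.size();
    }

    void put(String key, String value) {
        shards.get(partition(key)).put(key, value);
    }

    // Only the owning shard is consulted; returns null if absent.
    String get(String key) {
        return shards.get(partition(key)).get(key);
    }
}
```

As long as reads reuse the job's partitioner, a lookup touches exactly one
shard, so splitting the output across reducers costs nothing at query time.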

2013/8/20 Jerry Lam <[EMAIL PROTECTED]>

> Hi Bing,
> you are correct. The local storage does not have enough capacity to hold
> the temporary files generated by the mappers. Since we want a single
> sequence file at the end, we are forced to use 1 reducer.
> The use case is that we want to generate an index for the 1TB sequence
> file so that we can randomly access each row in it. In practice, this is
> simply a MapFile.
> Any idea how to resolve this dilemma is greatly appreciated.
> Jerry
> On Mon, Aug 19, 2013 at 8:14 PM, Bing Jiang <[EMAIL PROTECTED]>wrote:
>> Hi Jerry,
>> I think you are worried about the volume of MapReduce local files, but
>> would you give us more details about your app?
>>  On Aug 20, 2013 6:09 AM, "Jerry Lam" <[EMAIL PROTECTED]> wrote:
>>> Hi Hadoop users and developers,
>>> I have a use case where I need to produce a large sequence file of 1 TB
>>> in size, when each datanode has 200GB of storage but I have 30 datanodes.
>>> The problem is that no single reducer can hold 1TB of data during the
>>> reduce phase to generate a single sequence file, even if I use aggressive
>>> compression. Any datanode will run out of space, since this is a
>>> single-reducer job.
>>> Any comment and help is appreciated.
>>> Jerry
Bing Jiang
weibo: http://weibo.com/jiangbinglover
BLOG: www.binospace.com
BLOG: http://blog.sina.com.cn/jiangbinglover
Focus on distributed computing, HDFS/HBase