Re: produce a large sequencefile (1TB)
Bing Jiang 2013-08-20, 02:55
Hi Jerry,

I wonder whether it would be acceptable to use multiple reducers and
generate several MapFiles (IndexFile/DataFile pairs).

I would like to understand the real difficulty with post-processing the
output of multiple reducers. Maybe there are some constraints in your
application?
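Something like this driver sketch is what I have in mind (minimal and
untested, assuming the Hadoop 2.x "new" API and that the input already
exists as a SequenceFile<Text, Text>; class and job names are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class BuildMapFileIndex {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "1TB SequenceFile -> 30 MapFiles");
    job.setJarByClass(BuildMapFileIndex.class);

    // Identity map and reduce are enough: the shuffle already sorts keys
    // within each reducer, which is exactly the ordering MapFile requires.
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(MapFileOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // One MapFile (index + data pair) per reducer: 30 parts of roughly
    // 33GB each instead of one 1TB file no single datanode can hold.
    job.setNumReduceTasks(30);

    // Block-compress the data files to shrink each reducer's output.
    FileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressionType(
        job, SequenceFile.CompressionType.BLOCK);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Looking a row key back up across the 30 parts then needs no
post-processing at all; see the lookup sketch after the quoted thread
below.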

2013/8/20 Jerry Lam <[EMAIL PROTECTED]>

> Hi Bing,
>
> you are correct. The local storage does not have enough capacity to hold
> the temporary files generated by the mappers. Since we want a single
> sequence file at the end, we are forced to use 1 reducer.
>
> The use case is that we want to generate an index for the 1TB sequence
> file so that we can randomly access each row of the sequence file. In
> practice, this is simply a MapFile.
>
> Any idea how to resolve this dilemma is greatly appreciated.
>
> Jerry
>
>
>
> On Mon, Aug 19, 2013 at 8:14 PM, Bing Jiang <[EMAIL PROTECTED]> wrote:
>
>> Hi, Jerry.
>> I think you are worried about the volume of the MapReduce local
>> (intermediate) files, but would you give us more details about your app?
>>  On Aug 20, 2013 6:09 AM, "Jerry Lam" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Hadoop users and developers,
>>>
>>> I have a use case where I need to produce a large sequence file, 1TB in
>>> size, when each datanode has only 200GB of storage, but I have 30
>>> datanodes.
>>>
>>> The problem is that no single reducer can hold 1TB of data during the
>>> reduce phase to generate a single sequence file, even if I use
>>> aggressive compression. Any datanode will run out of local space since
>>> this is a single-reducer job.
>>>
>>> Any comment and help is appreciated.
>>>
>>> Jerry
>>>
>>
>
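On the random-access requirement quoted above: a reader never needs the
single 1TB file, because the same partitioner the job used can route each
lookup to the right part. A minimal sketch (directory and row key are
hypothetical; it assumes the default HashPartitioner, as in the driver
sketch above):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class MapFileLookup {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Directory containing the part-r-* MapFiles written by the job.
    Path dir = new Path("/data/mapfile-index");        // hypothetical path
    MapFile.Reader[] readers = MapFileOutputFormat.getReaders(dir, conf);

    Text key = new Text("row-000123");                 // hypothetical key
    Text value = new Text();
    // getEntry re-applies the job's partitioner to pick the right part,
    // then uses that part's index file to locate the key.
    Writable hit = MapFileOutputFormat.getEntry(
        readers, new HashPartitioner<Text, Text>(), key, value);
    System.out.println(hit == null ? "not found" : key + " -> " + value);

    for (MapFile.Reader r : readers) {
      r.close();
    }
  }
}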
--
Bing Jiang
Tel:(86)134-2619-1361
weibo: http://weibo.com/jiangbinglover
BLOG: www.binospace.com
BLOG: http://blog.sina.com.cn/jiangbinglover
Focus on distributed computing, HDFS/HBase