HDFS, mail # user - Re: partition as block?


Jay Vyas 2013-04-30, 19:04
Mohammad Tariq 2013-04-30, 19:38
Hmmm. I was actually thinking about the very first step: how are you going
to create the maps? Suppose you are on a block-less filesystem and you have
a custom InputFormat that gives you the splits dynamically. That means you
are going to store the file as a whole and create the splits as you read
through it. Wouldn't that be a bottleneck from the disk's point of view?
Aren't you moving away from the distributed paradigm?

Am I understanding this correctly? Please correct me if I am getting it
wrong.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
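[Editor's note: to make the split question above concrete, here is a sketch of the split arithmetic under discussion. It mimics how FileInputFormat-style split computation carves a file into fixed-size logical splits (including Hadoop's 1.1 slack factor so a tiny tail is not its own split), with splitSize standing in for the HDFS block size. The class and method names are hypothetical and the code deliberately avoids Hadoop dependencies; it is an illustration, not the actual API.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: FileInputFormat-style split computation for a whole,
// block-less file. splitSize plays the role the HDFS block size
// normally plays; shrinking it yields more map tasks.
public class VirtualSplits {
    /** Returns {offset, length} pairs covering a file of fileLen bytes. */
    public static List<long[]> computeSplits(long fileLen, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLen;
        long offset = 0;
        // 1.1 slack factor, as in Hadoop's FileInputFormat (SPLIT_SLOP):
        // avoid emitting a near-empty trailing split.
        while ((double) remaining / splitSize > 1.1) {
            splits.add(new long[] { offset, splitSize });
            offset += splitSize;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits.add(new long[] { offset, remaining });
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 1 GB file with a 64 MB "virtual block" yields 16 splits.
        List<long[]> splits = computeSplits(1L << 30, 64L << 20);
        System.out.println(splits.size()); // 16
    }
}
```

[The point of the sketch: on a block-less store, nothing physical forces the split boundaries, so the InputFormat alone decides the map granularity - which is exactly the knob being debated in this thread.]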
On Wed, May 1, 2013 at 12:34 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> Well, to be more clear, I'm wondering how Hadoop MapReduce can be
> optimized on a block-less filesystem... and am thinking about
> application-tier ways to simulate blocks - i.e., by making the granularity
> of partitions smaller.
>
> I'm wondering if there is a way to hack an increased number of partitions
> as a mechanism to simulate blocks - or whether this is just a bad idea
> altogether :)
>
>
>
>
> On Tue, Apr 30, 2013 at 2:56 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Hello Jay,
>>
>>     What are you going to do in your custom InputFormat and
>> partitioner? Is your InputFormat going to create larger splits that
>> overlap with larger blocks? If that is the case, IMHO, you are going to
>> reduce the number of mappers, thus reducing the parallelism. Also, a much
>> larger block size will add extra overhead when it comes to disk I/O.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Wed, May 1, 2013 at 12:16 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>>
>>> Hi guys:
>>>
>>> I'm wondering - if I'm running MapReduce jobs on a cluster with large
>>> block sizes - can I increase performance with either:
>>>
>>> 1) A custom FileInputFormat
>>>
>>> 2) A custom partitioner
>>>
>>> 3) -DnumReducers
>>>
>>> Clearly, (3) will be an issue, since it might overload tasks and
>>> network traffic... but maybe (1) or (2) would be a precise way to
>>> "use" partitions as a "poor man's" block.
>>>
>>> Just a thought - not sure if anyone has tried (1) or (2) before in order
>>> to simulate blocks and increase locality by utilizing the partition API.
>>>
>>> --
>>> Jay Vyas
>>> http://jayunit100.blogspot.com
>>>
>>
>>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
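[Editor's note: the "partitions as a poor man's block" idea in the original message boils down to the Partitioner contract. The sketch below mirrors the hash-mod-buckets formula that Hadoop's HashPartitioner uses, but is written without Hadoop dependencies; the class name is hypothetical. Raising numPartitions is the knob Jay proposes: more, smaller partitions in place of physical blocks.]

```java
// Sketch only: a Partitioner-style bucket assignment. Hadoop's
// HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % n;
// this standalone class reproduces that arithmetic for illustration.
public class GranularPartitioner {
    /** Nonnegative hash of the key, modulo the partition count. */
    public static int getPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key lands in a deterministic bucket; increasing the
        // partition count spreads keys over more, smaller buckets.
        System.out.println(getPartition("user-42", 4));
        System.out.println(getPartition("user-42", 64));
    }
}
```

[Whether this actually buys locality, as Tariq's reply points out, depends on the disk-I/O side: partitioning only reshapes the reduce-side granularity, not where the input bytes physically live.]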
Jay Vyas 2013-04-30, 19:59
Mohammad Tariq 2013-04-30, 20:09