Re: partition as block?
Hmmm. I was actually thinking about the very first step: how are you going
to create the maps? Suppose you are on a block-less filesystem and you have
a custom Format that is going to give you the splits dynamically. This means
that you are going to store the file as a whole and create the splits as
you continue reading the file. Wouldn't that be a bottleneck from the disk's
point of view? Aren't you moving away from the distributed paradigm?

Am I understanding this correctly? Please correct me if I am getting it
wrong.
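
For concreteness, here is roughly what I imagine such a Format would have to
do - a minimal, untested sketch (the class name and the 64 MB "block" size
are made up):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class SyntheticBlockInputFormat extends FileInputFormat<LongWritable, Text> {

  // Pretend "block" size; the filesystem itself has no notion of blocks.
  private static final long SYNTHETIC_SPLIT_SIZE = 64L * 1024 * 1024;

  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (FileStatus file : listStatus(job)) {
      long offset = 0;
      long remaining = file.getLen();
      // Carve each whole file into fixed-size logical splits. Note the
      // empty host list: with no block locations, the scheduler cannot
      // place a map task near its data, which is exactly my locality worry.
      while (remaining > 0) {
        long length = Math.min(SYNTHETIC_SPLIT_SIZE, remaining);
        splits.add(new FileSplit(file.getPath(), offset, length, new String[0]));
        offset += length;
        remaining -= length;
      }
    }
    return splits;
  }

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    // LineRecordReader already copes with splits that start or end
    // mid-line, so plain text input works with these synthetic splits.
    return new LineRecordReader();
  }
}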

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 12:34 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> Well, to be more clear, I'm wondering how hadoop-mapreduce can be
> optimized on a block-less filesystem... and I am thinking about
> application-tier ways to simulate blocks - i.e., by making the granularity
> of partitions smaller.
>
> I'm wondering if there is a way to hack an increased number of partitions
> as a mechanism to simulate blocks - or whether this is just a bad idea
> altogether :)
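>
> One way I can imagine hacking it without a custom Format: just cap the
> split size, so one big un-blocked file still fans out into many maps. A
> minimal, untested sketch (the driver class name and the 32 MB value are
> made up):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>
> public class SmallSplitsDriver {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Job job = Job.getInstance(conf, "partition-as-block");
>     // Cap the split size so each map task reads at most 32 MB, even if
>     // the underlying filesystem reports one giant "block" per file.
>     FileInputFormat.setMaxInputSplitSize(job, 32L * 1024 * 1024);
>     // ... the usual mapper/reducer/input/output wiring goes here ...
>   }
> }
>
> (Depending on the version, setMaxInputSplitSize sets
> mapreduce.input.fileinputformat.split.maxsize or the older
> mapred.max.split.size, so the same thing could be passed with -D.)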
>
>
>
>
> On Tue, Apr 30, 2013 at 2:56 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Hello Jay,
>>
>>     What are you going to do in your custom InputFormat and
>> partitioner? Is your InputFormat going to create larger splits that will
>> overlap with larger blocks? If that is the case, then IMHO you are going
>> to reduce the number of mappers, thus reducing the parallelism. Also, a
>> much larger block size will add extra overhead when it comes to disk I/O.
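>>
>> Just to put rough numbers on it: a 1 GB input at a 64 MB split size gives
>> you 16 map tasks, while one 512 MB "block" per split leaves you with only 2.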
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Wed, May 1, 2013 at 12:16 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>>
>>> Hi guys:
>>>
>>> I'm wondering - if I'm running mapreduce jobs on a cluster with large
>>> block sizes - can I increase performance with any of the following:
>>>
>>> 1) A custom FileInputFormat
>>>
>>> 2) A custom partitioner
>>>
>>> 3) -DnumReducers
>>>
>>> Clearly, (3) will be an issue, since it might overload tasks and
>>> network traffic... but maybe (1) or (2) would be a precise way to
>>> "use" partitions as a "poor man's" block.
>>>
>>> Just a thought - not sure if anyone has tried (1) or (2) before in order
>>> to simulate blocks and increase locality by utilizing the partition API.
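>>>
>>> For (2), the only hook would be Partitioner#getPartition - something
>>> like this untested sketch (it is really just HashPartitioner; the
>>> "block" effect would come entirely from running many reducers):
>>>
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapreduce.Partitioner;
>>>
>>> public class BlockishPartitioner extends Partitioner<Text, Text> {
>>>   @Override
>>>   public int getPartition(Text key, Text value, int numPartitions) {
>>>     // Spread keys evenly; each of the many partitions then plays
>>>     // the role of one small "block" of output.
>>>     return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
>>>   }
>>> }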
>>>
>>> --
>>> Jay Vyas
>>> http://jayunit100.blogspot.com
>>>
>>
>>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>