Issue: Max block location exceeded for split error when running hive


Re: Issue: Max block location exceeded for split error when running hive
Are you using a CombineFileInputFormat or similar input format then, perhaps?
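(For context, Hive typically uses a combining input format such as CombineHiveInputFormat by default, which is likely why the question is asked. The sketch below is illustrative only and not from the thread; it assumes CombineTextInputFormat from the newer MapReduce API, and the paths and split size are hypothetical. Because a combined split packs blocks from many files, its host list is the union of those blocks' locations and can exceed the default cap of 10.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinedSplitJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical cap on how much data one combined split may pack together;
    // without a cap, a single split can absorb blocks from a great many files.
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 256L << 20); // 256 MB

    Job job = Job.getInstance(conf, "combined-split-example");
    job.setJarByClass(CombinedSplitJob.class);
    // Each combined split references blocks from several files, so its host
    // list is the union of those blocks' locations -- this is how a split can
    // end up reporting more than the default 10 locations.
    job.setInputFormatClass(CombineTextInputFormat.class);
    job.setMapperClass(Mapper.class); // identity mapper, map-only job
    job.setNumReduceTasks(0);

    FileInputFormat.addInputPath(job, new Path("/foo/bar"));   // hypothetical input
    FileOutputFormat.setOutputPath(job, new Path("/tmp/out")); // hypothetical output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}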

On Thu, Sep 19, 2013 at 1:29 PM, Murtaza Doctor <[EMAIL PROTECTED]> wrote:
> We are using the default replication factor of 3. When new files are put on
> HDFS we never override the replication factor. When more data is involved,
> it fails with a larger split size.
>
>
> On Wed, Sep 18, 2013 at 6:34 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Do your input files carry a replication factor of 10+? That could be
>> one cause behind this.
>>
>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <[EMAIL PROTECTED]>
>> wrote:
>> > Folks,
>> >
>> > Anyone run into this issue before:
>> > java.io.IOException: Max block location exceeded for split: Paths:
>> > "/foo/bar...."
>> > ....
>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>> > splitsize: 15 maxsize: 10
>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>> >   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>> >   at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>> >
>> > When we set the property higher, as suggested (mapreduce.job.max.split.locations
>> > set to more than the value it failed on), the job runs successfully.
>> >
>> > I am trying to dig up additional documentation on this, since the default
>> > seems to be 10 and I am not sure how that limit was set. Additionally,
>> > what is the recommended value, and what factors does it depend on?
>> >
>> > We are running YARN; the actual query is Hive on CDH 4.3, with Hive
>> > version 0.10.
>> >
>> > Any pointers in this direction will be helpful.
>> >
>> > Regards,
>> > md
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J
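The workaround described in the thread, raising mapreduce.job.max.split.locations past the split's reported location count, would look roughly like the following sketch on a plain MapReduce job. The value 30 is illustrative only, and the Hive "set" line in the comment is an assumption rather than advice from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RaiseSplitLocationLimit {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    // The exception above reported "splitsize: 15 maxsize: 10", i.e. the split
    // carried 15 locations against a limit of 10. Raise the limit above the
    // reported count; 30 is an illustrative value, not a recommendation.
    conf.setInt("mapreduce.job.max.split.locations", 30);
    // From a Hive session the equivalent would presumably be:
    //   set mapreduce.job.max.split.locations=30;
    return Job.getInstance(conf, "raise-split-location-limit");
  }
}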