Re: map tasks are taking forever when running job on 24 TB
Viral Bajaria 2013-04-25, 22:34
How about running it as sub-queries, where each query runs over a subset of
the data and has a better chance of finishing? I fear that the amount of
data to shuffle might be too big and that you might be running out of
scratch/temp space. Did you verify that the job is not failing for lack of
disk space before the shuffle/reduce can kick in?
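
To make the sub-query idea concrete, here is a rough sketch in Python that splits a list of partition values into batches and builds one HiveQL statement per batch. Everything named here is an assumption for illustration only: the tables `events` and `events_summary`, the partition column `dt`, the grouped column `k`, and the batch size of 1000 are not from the original job.

```python
from typing import Iterator, List, Sequence


def chunk(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_batch_queries(
    partition_values: Sequence[str],
    batch_size: int,
    source_table: str = "events",          # hypothetical table name
    target_table: str = "events_summary",  # hypothetical table name
    partition_column: str = "dt",          # hypothetical partition column
) -> List[str]:
    """Return one HiveQL statement per batch of partition values.

    Assumes partition values are plain strings without single quotes.
    """
    queries = []
    for batch in chunk(partition_values, batch_size):
        in_list = ", ".join("'{}'".format(value) for value in batch)
        queries.append(
            "INSERT INTO TABLE {target} "
            "SELECT k, COUNT(*) FROM {source} "
            "WHERE {col} IN ({values}) GROUP BY k".format(
                target=target_table,
                source=source_table,
                col=partition_column,
                values=in_list,
            )
        )
    return queries


if __name__ == "__main__":
    # Placeholder partition values; a real job would list its own.
    partitions = ["p{:05d}".format(i) for i in range(34560)]
    statements = build_batch_queries(partitions, batch_size=1000)
    print(len(statements), "statements to run one after another")
```

Each statement can then be submitted on its own, so a failure only costs one batch. Note that if the grouping key spans partitions, each batch yields partial aggregates, and a final pass over the target table is needed to combine them.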
On Thu, Apr 25, 2013 at 3:10 PM, Sanjay Subramanian <
[EMAIL PROTECTED]> wrote:
> That’s a lot of partitions for one Hive job! Not sure if that itself is
> the root of the issue. There have been quite a few discussions suggesting
> that a maximum of about 1000 partitions is good.
> Is your use case conducive to using Combiners (though they cannot be
> guaranteed to be called)?
> From: Srinivas Surasani <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Thursday, April 25, 2013 2:33 PM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: map tasks are taking forever when running job on 24 TB
> I'm running a Hive job on a 24 TB dataset (34560 partitions). About
> 500 to 1000 mappers succeed (out of a total of 80000) and the rest of the
> mappers take forever (their status stays at 0% the whole time). Are there
> any limitations on the number of partitions/dataset? Are there any
> parameters to set here?
> The same job succeeds on 18 TB (25920 partitions).
> I have already set the following in my Hive query:
> set mapreduce.jobtracker.split.metainfo.maxsize=-1;