Hive, mail # user - only one mapper


Re: only one mapper
Sanjay Subramanian 2013-08-21, 19:13
Hi

Try this setting in your Hive query:

SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;

If you set this value low, the MR job will use this size to split the input LZO files and you will get multiple mappers. Make sure the input LZO files are indexed, i.e. that the .lzo.index files have been created.
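
To pick a value for the setting above, it can help to estimate how many splits a given maximum split size would produce. The following is a minimal sketch in Python, not part of the original advice: the helper name estimate_splits is hypothetical, and it assumes the input is splittable (indexed LZO) and that splits are cut purely by size. The real count depends on the input format Hive uses and on where the LZO index allows a split, so treat the result as a rough guide only.

```python
def estimate_splits(input_bytes: int, max_split_bytes: int) -> int:
    """Rough estimate of the number of input splits (hypothetical helper).

    Assumes a splittable input cut purely by size: ceil(input / max_split).
    Actual split boundaries depend on the input format and the LZO index.
    """
    if input_bytes <= 0 or max_split_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-input_bytes // max_split_bytes)


MIB = 1024 * 1024
INPUT_BYTES = 2304560827  # "Input Size" reported in the job log of this thread

if __name__ == "__main__":
    for split_mib in (64, 128, 256):
        count = estimate_splits(INPUT_BYTES, split_mib * MIB)
        print(f"max split {split_mib} MiB -> about {count} mappers")
```

With a 128 MiB maximum split size this estimate matches the 18 blocks reported in the original question; a smaller value would give proportionally more mappers.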

sanjay
From: Edward Capriolo <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Wednesday, August 21, 2013 10:43 AM
To: [EMAIL PROTECTED]
Subject: Re: only one mapper

LZO files are only splittable if you index them. Sequence files compressed with LZO are splittable without being indexed.

Snappy + SequenceFile is a better option than LZO.
On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:
LZO files are combinable so check your max split setting.
http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%[EMAIL PROTECTED]%3E

igor
decide.com

On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <[EMAIL PROTECTED]> wrote:
Hi all,
When I run a Hive job, it launches only one mapper, even though my file spans 18 blocks (block size 128 MB, data size about 2 GB).
I use LZO compression: I created file.lzo and built the index file.lzo.index.
I am using Hive 0.10.0.

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 2304560827) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1377071515613_0003, Tracking URL = http://hydra0001:8088/proxy/application_1377071515613_0003/
Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job  -kill job_1377071515613_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95 sec
2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95 sec
2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 13.0 sec

--

In the Hadoop world I am just a novice exploring the entire Hadoop ecosystem. I hope one day I can contribute my own code.

YanBit
[EMAIL PROTECTED]