Hive, mail # user - only one mapper


Re: only one mapper
Rajesh Balamohan 2013-08-22, 03:42
Good to hear that.
On Thu, Aug 22, 2013 at 9:02 AM, 闫昆 <[EMAIL PROTECTED]> wrote:

> Thanks all. Moving the LZO index to the Hive directory works fine.
> Thanks.
>
>
> 2013/8/22 Rajesh Balamohan <[EMAIL PROTECTED]>
>
>> Create the LZO index after moving the file to the Hive directory (i.e.
>> after executing your LOAD DATA statement). The index file is needed only
>> during job execution, and if it is not present in the same directory as
>> the data, the large file will not be split.
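>>
>> A minimal sketch of that order of operations, assuming the hadoop-lzo
>> library is installed (the jar path and warehouse location below are
>> assumptions; check your table's actual location):
>>
>>   hive -e "LOAD DATA INPATH '/data_split/data_rowkey.lzo' OVERWRITE INTO TABLE data_zh;"
>>   hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer \
>>       /user/hive/warehouse/data_zh/data_rowkey.lzo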
>>
>>
>> On Thu, Aug 22, 2013 at 7:11 AM, 闫昆 <[EMAIL PROTECTED]> wrote:
>>
>>> In Hive I used SET
>>> mapreduce.input.fileinputformat.split.maxsize=134217728; but it had no
>>> effect. I found that when I use
>>>
>>> LOAD DATA INPATH '/data_split/data_rowkey.lzo'
>>>
>>> OVERWRITE INTO TABLE data_zh
>>>
>>> the data is moved from HDFS into the Hive directory (the table was
>>> created with CREATE EXTERNAL TABLE), but data_rowkey.lzo.index still
>>> exists in the HDFS /data_split/ directory. So the data file ends up in
>>> the Hive directory while the index file stays in the original HDFS
>>> directory; they are not in the same directory.
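>>>
>>> A minimal sketch of keeping the two files together (the warehouse path
>>> is an assumption; check the table's actual location):
>>>
>>>   hadoop fs -mv /data_split/data_rowkey.lzo.index /user/hive/warehouse/data_zh/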
>>>
>>>
>>> 2013/8/22 Sanjay Subramanian <[EMAIL PROTECTED]>
>>>
>>>>  Hi
>>>>
>>>>  Try this setting in your Hive query:
>>>>
>>>>  SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;
>>>>
>>>>  If you set this value "low", then the MR job will use this size to
>>>> split the input LZO files and you will get multiple mappers (and make
>>>> sure the input LZO files are indexed, i.e. .lzo.index files are created).
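>>>>
>>>>  For example, a minimal sketch that caps splits at 64 MB (the value is
>>>> illustrative; with indexed LZO the actual split points fall on index
>>>> entries):
>>>>
>>>>   SET mapreduce.input.fileinputformat.split.maxsize=67108864;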
>>>>
>>>>  sanjay
>>>>
>>>>
>>>>   From: Edward Capriolo <[EMAIL PROTECTED]>
>>>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>>> Date: Wednesday, August 21, 2013 10:43 AM
>>>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>>> Subject: Re: only one mapper
>>>>
>>>>   LZO files are only splittable if you index them. Sequence files
>>>> compressed with LZO are splittable without being indexed.
>>>>
>>>>  Snappy + SequenceFile is a better option than LZO.
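>>>>
>>>>  A minimal sketch of rewriting the data as a Snappy-compressed
>>>> SequenceFile table (the target table name is hypothetical; the codec is
>>>> the standard Hadoop SnappyCodec, and the mapred.* names are the older
>>>> property names still honored on this Hadoop version):
>>>>
>>>>   SET hive.exec.compress.output=true;
>>>>   SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
>>>>   SET mapred.output.compression.type=BLOCK;
>>>>   CREATE TABLE data_zh_seq STORED AS SEQUENCEFILE AS SELECT * FROM data_zh;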
>>>>
>>>>
>>>> On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:
>>>>
>>>>>  LZO files are combinable, so check your max split setting.
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%[EMAIL PROTECTED]%3E
>>>>>
>>>>>  igor
>>>>> decide.com
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>>  Hi all. When I run a Hive query, the job launches only one mapper,
>>>>>> even though my file is split into 18 blocks; my block size is 128 MB
>>>>>> and the data size is about 2 GB.
>>>>>> I use LZO compression: I created file.lzo and built the index
>>>>>> file.lzo.index.
>>>>>> I am using Hive 0.10.0.
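>>>>>>
>>>>>>  As a minimal check of the block layout (the path is the one reported
>>>>>> elsewhere in this thread):
>>>>>>
>>>>>>   hdfs fsck /data_split/data_rowkey.lzo -files -blocks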
>>>>>>
>>>>>>  Total MapReduce jobs = 1
>>>>>> Launching Job 1 out of 1
>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>>> Cannot run job locally: Input Size (= 2304560827) is larger than
>>>>>> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
>>>>>> Starting Job = job_1377071515613_0003, Tracking URL =
>>>>>> http://hydra0001:8088/proxy/application_1377071515613_0003/
>>>>>> Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job
>>>>>>  -kill job_1377071515613_0003
>>>>>> Hadoop job information for Stage-1: number of mappers: 1; number of
>>>>>> reducers: 0
>>>>>> 2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
>>>>>> 2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative
>>>>>> CPU 6.81 sec
>>>>>> 2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative
>>>>>> CPU 6.81 sec
>>>>>> 2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative
>>>>>> CPU 6.81 sec
>>>>>> 2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative
>>>>>> CPU 9.95 sec
>>>>>> 2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative
>>>>>> CPU 9.95 sec
>>>>>> 2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative
>>>>>> CPU 13.0 sec
>>>>>>
>>>>>>  --
>>>>>>
>>>>>> In the Hadoop world I am just a novice exploring the entire Hadoop
>>>>>> ecosystem; I hope one day I can contribute my own code.
~Rajesh.B