Hive >> mail # user >> only one mapper


Re: only one mapper
Good to hear that.
On Thu, Aug 22, 2013 at 9:02 AM, 闫昆 <[EMAIL PROTECTED]> wrote:

> Thanks all. I moved the LZO index file into the Hive directory and it works
> fine now. Thanks.
>
>
> 2013/8/22 Rajesh Balamohan <[EMAIL PROTECTED]>
>
>> Create the LZO index after moving the file to the Hive directory (i.e., after
>> executing your LOAD DATA* statement). The index file is needed only during job
>> execution, and if it is not present in the same directory as the data file,
>> the large file will not be split.
>>
>>
>> On Thu, Aug 22, 2013 at 7:11 AM, 闫昆 <[EMAIL PROTECTED]> wrote:
>>
>>> In Hive I used SET
>>> mapreduce.input.fileinputformat.split.maxsize=134217728; but it had no
>>> effect. I found that when I use
>>>
>>> LOAD DATA INPATH  '/data_split/data_rowkey.lzo'
>>>
>>> OVERWRITE INTO TABLE data_zh
>>>
>>> the HDFS data is moved to the Hive directory (I used CREATE EXTERNAL TABLE),
>>> but data_rowkey.lzo.index still exists in the HDFS /data_split/ directory.
>>> So the data has moved to the Hive directory while the index file stayed in
>>> the HDFS directory; they are not in the same directory.
>>>
>>>
>>> 2013/8/22 Sanjay Subramanian <[EMAIL PROTECTED]>
>>>
>>>>  Hi
>>>>
>>>>  Try this setting in your hive query
>>>>
>>>>  SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;
>>>>
>>>>  If you set this value low, the MR job will use this size to split
>>>> the input LZO files and you will get multiple mappers (make sure the
>>>> input LZO files are indexed, i.e., the .lzo.index files have been created).
>>>>
>>>>  sanjay
>>>>
>>>>
>>>>   From: Edward Capriolo <[EMAIL PROTECTED]>
>>>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>>> Date: Wednesday, August 21, 2013 10:43 AM
>>>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>>> Subject: Re: only one mapper
>>>>
>>>>   LZO files are only splittable if you index them. Sequence files
>>>> compressed with LZO are splittable without being indexed.
>>>>
>>>>  Snappy + SequenceFile is a better option than LZO.
>>>>
>>>>
>>>> On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:
>>>>
>>>>>  LZO files are combinable so check your max split setting.
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%[EMAIL PROTECTED]%3E
>>>>>
>>>>>  igor
>>>>> decide.com
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>>  Hi all. When I run a query in Hive, the job launches only one mapper,
>>>>>> even though my file spans 18 blocks (my block size is 128 MB and the
>>>>>> data size is 2 GB).
>>>>>> I use LZO compression: I created file.lzo and built the index
>>>>>> file.lzo.index.
>>>>>> I use Hive 0.10.0.
>>>>>>
>>>>>>  Total MapReduce jobs = 1
>>>>>> Launching Job 1 out of 1
>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>>> Cannot run job locally: Input Size (= 2304560827) is larger than
>>>>>> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
>>>>>> Starting Job = job_1377071515613_0003, Tracking URL =
>>>>>> http://hydra0001:8088/proxy/application_1377071515613_0003/
>>>>>> Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job
>>>>>>  -kill job_1377071515613_0003
>>>>>> Hadoop job information for Stage-1: number of mappers: 1; number of
>>>>>> reducers: 0
>>>>>> 2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
>>>>>> 2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative
>>>>>> CPU 6.81 sec
>>>>>> 2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative
>>>>>> CPU 6.81 sec
>>>>>> 2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative
>>>>>> CPU 6.81 sec
>>>>>> 2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative
>>>>>> CPU 9.95 sec
>>>>>> 2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative
>>>>>> CPU 9.95 sec
>>>>>> 2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative
>>>>>> CPU 13.0 sec
>>>>>>
>>>>>>  --
>>>>>>
>>>>>> In the Hadoop world I am just a novice exploring the entire Hadoop
>>>>>> ecosystem. I hope one day I can contribute my own code.
~Rajesh.B
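
For readers who land on this thread later, the fix discussed above (load the file first, then build the index next to its new location, then cap the split size) might look roughly like the sketch below. It is not taken from any of the posters. The table directory, the hadoop-lzo jar path and the SELECT query are hypothetical placeholders, and it assumes the hadoop-lzo package with its DistributedLzoIndexer class is installed and that the table is read through an LZO-aware input format. Commands are only printed by default, so nothing touches the cluster until RUN is set to empty.

```shell
# Sketch only: TABLE_DIR, LZO_JAR and the SELECT below are assumptions,
# not values confirmed anywhere in the thread.
SRC=/data_split/data_rowkey.lzo                        # source file named in the thread
TABLE=data_zh                                          # table named in the thread
TABLE_DIR=${TABLE_DIR:-/user/hive/warehouse/data_zh}   # hypothetical table location
LZO_JAR=${LZO_JAR:-/path/to/hadoop-lzo.jar}            # hypothetical jar location
MAX_SPLIT=134217728                                    # 128 MB, the value tried in the thread

# Dry run by default: each command is echoed instead of executed.
# Run the script with RUN= (empty) to execute on a real cluster.
RUN=${RUN-echo}

# Ceiling division: upper bound on the number of splits for a file of
# $1 bytes when each split holds at most $2 bytes.
expected_splits() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 1. Move the data file into the table directory.
$RUN hive -e "LOAD DATA INPATH '$SRC' OVERWRITE INTO TABLE $TABLE"

# 2. Build the index where the file now lives, so that the .lzo and
#    .lzo.index files end up in the same directory.
$RUN hadoop jar "$LZO_JAR" com.hadoop.compression.lzo.DistributedLzoIndexer \
    "$TABLE_DIR/data_rowkey.lzo"

# 3. Check that both files sit side by side.
$RUN hadoop fs -ls "$TABLE_DIR"

# 4. Query with the capped split size (the query itself is illustrative).
$RUN hive -e "SET mapreduce.input.fileinputformat.split.maxsize=$MAX_SPLIT; SELECT COUNT(*) FROM $TABLE"

# Input size reported in the job log above, split at 128 MB.
echo "expected mappers: about $(expected_splits 2304560827 "$MAX_SPLIT")"
```

The last line is only an estimate. Real split boundaries follow the LZO block offsets recorded in the index, so the mapper count can differ slightly. Moving the existing data_rowkey.lzo.index into the table directory with hadoop fs -mv, as the original poster did, is an alternative to step 2.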