NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hive >> mail # user >> only one mapper


Re: only one mapper
Create the LZO index after moving the file to the Hive directory (i.e. after
executing your LOAD DATA statement). The index file is needed only during job
execution, and if it is not present in the same directory as the .lzo file,
the large file will not be split.
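A minimal Python model of the behaviour described above can make the "one mapper" result concrete. It is a sketch under stated assumptions, not Hadoop's or hadoop-lzo's actual code. It assumes (1) the index is looked up at the data file's path plus ".index" in the same directory, matching the data_rowkey.lzo.index naming in this thread, and (2) the split arithmetic of Hadoop's FileInputFormat: split size = max(minSize, min(maxSize, blockSize)), with a 1.1 slop factor on the last split. LZO block alignment and Hive's CombineHiveInputFormat are ignored, so counts are approximate. The destination path /user/hive/warehouse/data_zh is a hypothetical LOAD DATA target. The file size 2304560827 is the "Input Size" from the job log in this thread.

```python
# Simplified model, not Hadoop's actual code: it mimics FileInputFormat's
# split arithmetic and the "index must sit beside the data file" lookup.
# Assumptions: SPLIT_SLOP = 1.1 and split size = max(min, min(max, block));
# LZO block alignment and Hive's CombineHiveInputFormat are ignored.

SPLIT_SLOP = 1.1
BLOCK_SIZE = 134217728  # 128 MB HDFS block size, as in the thread


def compute_split_size(block_size, min_size, max_size):
    """Split size as FileInputFormat computes it."""
    return max(min_size, min(max_size, block_size))


def index_path_for(data_path):
    """Index expected next to the data file: foo.lzo -> foo.lzo.index."""
    return data_path + ".index"


def count_splits(data_path, file_size, existing_paths,
                 block_size=BLOCK_SIZE, min_size=1, max_size=2**63 - 1):
    """Approximate number of input splits (mappers) for one .lzo file.

    max_size stands in for mapreduce.input.fileinputformat.split.maxsize.
    """
    if index_path_for(data_path) not in existing_paths:
        # No index in the same directory: the whole file is one split.
        return 1
    split_size = compute_split_size(block_size, min_size, max_size)
    splits, remaining = 0, file_size
    # Cut full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly larger) split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    size = 2304560827  # "Input Size" from the job log in this thread
    loaded = "/user/hive/warehouse/data_zh/data_rowkey.lzo"  # hypothetical
    # After LOAD DATA INPATH: data moved, index left behind in /data_split/.
    paths = {loaded, "/data_split/data_rowkey.lzo.index"}
    print(count_splits(loaded, size, paths))  # 1
    # After re-creating the index next to the moved file.
    paths.add(index_path_for(loaded))
    print(count_splits(loaded, size, paths))  # 18
```

With the index left in /data_split/, the model reports a single split, matching "number of mappers: 1" in the log. Indexing the file in its new location gives 18 splits of about 128 MB each, the same as the 18 blocks the original poster expected.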
On Thu, Aug 22, 2013 at 7:11 AM, 闫昆 <[EMAIL PROTECTED]> wrote:

> In Hive I used SET mapreduce.input.fileinputformat.split.maxsize=134217728;
> but it had no effect. Then I found that when I run
>
> LOAD DATA INPATH '/data_split/data_rowkey.lzo' OVERWRITE INTO TABLE data_zh
>
> the HDFS data is moved to the Hive directory (I created the table with
> CREATE EXTERNAL TABLE), but data_rowkey.lzo.index stays in the HDFS
> /data_split/ directory. So the data file ends up in the Hive directory and
> the index file in the original HDFS directory; they are not in the same
> directory.
>
>
> 2013/8/22 Sanjay Subramanian <[EMAIL PROTECTED]>
>
>>  Hi
>>
>>  Try this setting in your Hive query:
>>
>>  SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;
>>
>>  If you set this value "low", the MR job will use this size to split
>> the input LZO files and you will get multiple mappers (make sure the
>> input LZO files are indexed, i.e. that .lzo.index files are created).
>>
>>  sanjay
>>
>>
>>   From: Edward Capriolo <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Date: Wednesday, August 21, 2013 10:43 AM
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: Re: only one mapper
>>
>>   LZO files are only splittable if you index them. Sequence files
>> compressed with LZO are splittable without being indexed.
>>
>>  Snappy + SequenceFile is a better option than LZO.
>>
>>
>> On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:
>>
>>>  LZO files are combinable so check your max split setting.
>>>
>>> http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%[EMAIL PROTECTED]%3E
>>>
>>>  igor
>>> decide.com
>>>
>>>
>>>
>>> On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <[EMAIL PROTECTED]> wrote:
>>>
>>>>  Hi all,
>>>> When I run a Hive query, the job launches only one mapper, although my
>>>> file spans 18 blocks (block size 128 MB, data size about 2 GB).
>>>> I use LZO compression: I created file.lzo and generated the index
>>>> file.lzo.index. I am using Hive 0.10.0.
>>>>
>>>>  Total MapReduce jobs = 1
>>>> Launching Job 1 out of 1
>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>> Cannot run job locally: Input Size (= 2304560827) is larger than
>>>> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
>>>> Starting Job = job_1377071515613_0003, Tracking URL =
>>>> http://hydra0001:8088/proxy/application_1377071515613_0003/
>>>> Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job  -kill
>>>> job_1377071515613_0003
>>>> Hadoop job information for Stage-1: number of mappers: 1; number of
>>>> reducers: 0
>>>> 2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
>>>> 2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
>>>> 6.81 sec
>>>> 2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
>>>> 6.81 sec
>>>> 2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
>>>> 6.81 sec
>>>> 2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU
>>>> 9.95 sec
>>>> 2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU
>>>> 9.95 sec
>>>> 2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU
>>>> 13.0 sec
>>>>
>>>>  --
>>>>
>>>> In the Hadoop world I am just a novice exploring the entire Hadoop
>>>> ecosystem; I hope one day I can contribute my own code.
>>>>
>>>> YanBit
>>>> [EMAIL PROTECTED]
>>>>
>>>>
>>>
>>
~Rajesh.B