Hive >> mail # dev >> hive.merge properties with RCFile


Re: hive.merge properties with RCFile
This will not work:

set hive.merge.size.per.task=28*1024*1024;

The value has to be a plain number; Hive will not evaluate the arithmetic expression.
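A minimal sketch (not from the original thread) of working around this: compute the byte count up front and paste the resulting literal into the Hive script.

```python
# Hive's SET command expects a plain numeric literal, so do the
# arithmetic outside Hive and emit the literal value instead.
size_per_task = 28 * 1024 * 1024  # 28 MB in bytes
print(f"set hive.merge.size.per.task={size_per_task};")
# → set hive.merge.size.per.task=29360128;
```

So the corrected line in the script would read `set hive.merge.size.per.task=29360128;`.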

On Mon, Jun 18, 2012 at 2:46 PM, Benyi Wang <[EMAIL PROTECTED]> wrote:
> I am trying to use the Hive merge options to merge small files into larger
> files using the following query. It works well except that I cannot control
> the output file size: I cannot explain why the output files are always
> 256MB with the hive.merge.size.per.task and hive.merge.smallfiles.avgsize
> settings below. I also tried 56MB for hive.merge.size.per.task, and the
> size is still 256MB.
>
> "omniture_hit" is an uncompressed CSV-format Hive table. I want to
> convert it into RCFile format. The problem is that a lot of small RCFiles
> are created, much smaller than our default block size of 128MB, if I
> simply select * and insert into the new table.
>
> Another problem is that I want to change hive.io.rcfile.record.size to 8MB
> to see if I get a better compression ratio for my data, but the result seems
> similar to 4MB. The data pattern could be as the RCFile paper describes.
> But how can I verify that my 8MB setting takes effect?
>
> Thanks.
>
> Ben
>
> SET hive.exec.compress.output=true;
> SET hive.exec.compress.intermediate=true;
>
> set hive.merge.size.per.task=28*1024*1024;
> set hive.merge.smallfiles.avgsize=100000000;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.max.dynamic.partitions.pernode=10000;
> SET hive.exec.max.dynamic.partitions=10000;
> SET hive.exec.max.created.files=150000;
>
> create table omniture_hit_rc like omniture_hit;
>
> insert overwrite table omniture_hit_rc partition (local_dt) select *
> from omniture_hit where local_dt>='2012-06-01';