Hive >> mail # user >> Re: split into less files


Re: split into less files
It sounds like you want to look at setting hive.merge.mapredfiles to true in your hive-site.xml.

Just be aware that it will likely add another map step to your jobs to consolidate the files.
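
For reference, a minimal sketch of what that could look like in hive-site.xml. Only hive.merge.mapredfiles comes from the suggestion above; the other two properties and all of the values are assumptions that you should check against the defaults for your Hive version. In particular, my understanding is that the merge step only kicks in when the average output file size is below hive.merge.smallfiles.avgsize (about 16 MB by default), so 20 MB files may not get merged unless that threshold is raised.

```xml
<!-- hive-site.xml fragment; these go inside the existing <configuration> element -->

<!-- Merge small output files at the end of map-reduce jobs -->
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
</property>

<!-- Assumed value: run the merge when the average output file is smaller than ~128 MB -->
<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>128000000</value>
</property>

<!-- Assumed value: target size of the merged files, ~256 MB -->
<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
</property>
```

If you would rather not change the site-wide config, the same properties can be set for a single session, e.g. SET hive.merge.mapredfiles=true; before running the INSERT OVERWRITE.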

Matt Tucker

On Nov 8, 2011, at 6:19 PM, Shouguo Li <[EMAIL PROTECTED]> wrote:

> I think that has to do with your configured block size. Check your value for dfs.block.size in /hdfs-site.xml.
> But just curious, why would the number of files matter for your use case?
>
>
> On Fri, Oct 21, 2011 at 1:18 AM, Vikas Srivastava <[EMAIL PROTECTED]> wrote:
> Hey All,
>
>
> I have a table with a single partition, and that partition holds around 100 files of 200 MB each. When I INSERT OVERWRITE this into another table, it creates 100 files of 20 MB each (compressed). What I want is for it to create only 1, 2, or 10 files of 200 MB or 100 MB each.
>
>
> In other words, after the overwrite there should be fewer files than in the uncompressed source.
>
>
>
>
> --
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>
>