Hive, mail # user - single output file per partition?


Igor Tatarinov 2013-08-20, 21:29
Re: single output file per partition?
Stephen Sprague 2013-08-21, 15:51
Hi Igor,
Lots of ideas there! I can't speak to them all, but let me first confirm
that "cluster by X into 1 bucket" didn't work? I would have thought that
would have done it.
On Tue, Aug 20, 2013 at 2:29 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:

> What's the best way to enforce a single output file per partition?
>
> INSERT OVERWRITE TABLE <table>
> PARTITION (x,y,z)
> SELECT ...
> FROM ...
> WHERE ...
>
> I tried adding CLUSTER BY x,y,z at the end, thinking that sorting would
> force a single reducer per partition, but that didn't work. I still got
> multiple files per partition.
>
> Do I have to use a single reduce task? With a few TB of data that's
> probably not a good idea.
>
> My current idea is to create a temp table with the same partitioning
> structure, insert into that table first, and then select * from that table
> into the output table. With combineinputformat=true that should work, right?
>
> Or should I make Hive merge output files instead? (using hive.merge.mapfiles)
> Will that work with a partitioned table?
>
> Thanks!
> igor
>
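
For reference, here is a minimal HiveQL sketch of two of the options raised in the question: routing each partition's rows to a single reducer, and asking Hive to merge small output files. The table and column names (src, dst, a, b) are hypothetical and not from the thread. Whether this actually leaves exactly one file per partition depends on the Hive version and cluster settings; note that Igor reports CLUSTER BY x,y,z (which distributes and sorts on the same columns) still gave him multiple files, so treat this as something to test, not a guaranteed fix.

```sql
-- Hypothetical tables: src(a, b, x, y, z) and dst(a, b) partitioned by (x, y, z).

-- Allow all partition columns to be resolved dynamically from the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Ask Hive to merge small output files after map-only and map-reduce jobs.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

INSERT OVERWRITE TABLE dst
PARTITION (x, y, z)
SELECT a, b, x, y, z
FROM src
-- Rows with equal (x, y, z) are routed to the same reducer.
DISTRIBUTE BY x, y, z;
```

The merge step only kicks in when the average output file size is below hive.merge.smallfiles.avgsize, and merged files are capped by hive.merge.size.per.task, so very large partitions may still end up with more than one file.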
Igor Tatarinov 2013-08-21, 18:12
Stephen Sprague 2013-08-21, 19:07
Sanjay Subramanian 2013-08-21, 19:15
Igor Tatarinov 2013-08-21, 20:19