Hive user mailing list: Large Scale Table Reprocess


Re: Large Scale Table Reprocess
Can you give some examples of how to alter partitions for different input
types? I'd appreciate it :)
On Fri, Jul 26, 2013 at 3:29 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> A table can definitely have partitions with different input
> formats/serdes.  We test this all the time.
>
> Assuming your old data doesn't stay forever and most of your queries are
> on more recent data (which is usually the case), I'd advise you not to
> reprocess any data; just alter the table to store new partitions in ORC.
> Then, with time, you'll slowly transition the table to ORC. This avoids all
> the issues you noted. And since most queries probably only access recent
> data, you'll see speedups soon after the switch.
>
> Alan.
>
> On Jul 25, 2013, at 4:45 PM, John Omernik wrote:
>
> > Just finishing up testing with Hive 11 and ORC. Thank you to Owen and
> all those who have put hard work into this. ORC files alone, compared
> to RC files in Hive 9, 10, and 11, showed a huge increase in performance;
> it was amazing. That said, now we gotta reprocess.
> >
> >
> > We have a large table with lots of partitions. I'd love to be able to
> reprocess into a new table, like table_orc, and then at the end of it all,
> just drop the original table. That said, I can see it being hard to do from a
> space perspective, and I will have to do it a partition at a time. But then
> there are production issues: if I update a partition (INSERT OVERWRITE into
> the ORC table), then I have to delete the original, and production users will
> be missing data... decisions, decisions.
> >
> > So any ideas? Can a table have some partitions in one file type and
> other partitions in another? That sounds scary.  Anywho, a good problem to
> have... that performance will be worth it.
> >
> >
>
>
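The approach Alan describes (switch the table so new partitions are written as ORC, and leave older partitions in their original format) comes down to Hive's `ALTER TABLE ... [PARTITION (...)] SET FILEFORMAT ...` statement. What follows is a minimal sketch, assuming a hypothetical table `events` partitioned by a string column `dt`; the Python helper names are illustrative only and are not part of Hive. These statements change only metastore metadata and do not rewrite any data files, so a partition's recorded format must match the files that are actually in its directory.

```python
"""Sketch: emit HiveQL for moving a partitioned table to ORC gradually.

Hypothetical example names: table `events`, partition column `dt`.
The generated statements only change metastore metadata; no data is rewritten.
"""
from __future__ import annotations

# Format keywords this sketch allows (ORC was added in Hive 0.11).
KNOWN_FORMATS = {"TEXTFILE", "SEQUENCEFILE", "RCFILE", "ORC"}


def _check_format(fmt: str) -> str:
    """Normalize a format keyword and reject anything outside KNOWN_FORMATS."""
    fmt = fmt.upper()
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unsupported file format: {fmt}")
    return fmt


def partition_spec(spec: dict[str, str]) -> str:
    """Render {'dt': '2013-07-01'} as dt='2013-07-01' (insertion order kept)."""
    parts = []
    for col, val in spec.items():
        if "'" in val:
            # Keep the sketch simple: refuse values that would need escaping.
            raise ValueError(f"quote in partition value: {val!r}")
        parts.append(f"{col}='{val}'")
    return ", ".join(parts)


def set_table_format(table: str, fmt: str) -> str:
    """Change the table-level default format.

    Partitions created after this statement use the new format; existing
    partitions keep the format recorded in their own metadata.
    """
    return f"ALTER TABLE {table} SET FILEFORMAT {_check_format(fmt)};"


def set_partition_format(table: str, spec: dict[str, str], fmt: str) -> str:
    """Set the format recorded for a single partition.

    The format must match the files already in that partition's directory.
    """
    return (f"ALTER TABLE {table} PARTITION ({partition_spec(spec)}) "
            f"SET FILEFORMAT {_check_format(fmt)};")


if __name__ == "__main__":
    # New partitions are written as ORC from now on.
    print(set_table_format("events", "ORC"))
    # Optional: record an older partition's format explicitly.
    print(set_partition_format("events", {"dt": "2013-07-01"}, "RCFILE"))
```

Running the script prints two statements that could be pasted into the Hive CLI. Generating plain strings instead of executing them keeps the sketch free of any client library; in practice the first statement alone is the "alter the table to store new partitions in ORC" step, and the per-partition form is only needed when a partition's metadata has to state its format explicitly.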