John Omernik 2013-07-25, 23:45
Alan Gates 2013-07-26, 20:29
Re: Large Scale Table Reprocess
John Omernik 2013-07-26, 22:09
Can you give some examples of how to alter partitions for different input
types? I'd appreciate it :)
On Fri, Jul 26, 2013 at 3:29 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> A table can definitely have partitions with different input
> formats/serdes. We test this all the time.
> Assuming your old data doesn't stay forever and most of your queries are
> on more recent data (which is usually the case) I'd advise you to not
> reprocess any data, just alter the table to store new partitions in ORC.
> Then with time you'll slowly transition the table to ORC. This avoids all
> the issues you noted. And since most queries probably only access recent
> data you'll see speedups soon after the switch.
> On Jul 25, 2013, at 4:45 PM, John Omernik wrote:
> > Just finishing up testing with Hive 11 and ORC. Thank you to Owen and
> > all those who have put hard work into this. ORC files alone, when compared
> > to RC files in Hive 9, 10, and 11, showed a huge increase in performance; it
> > was amazing. That said, now we gotta reprocess.
> > We have a large table with lots of partitions. I'd love to be able to
> > reprocess into a new table, like table_orc, and then at the end of it all,
> > just drop the original table. That said, I see it being hard to do from a
> > space perspective, and I will have to do a partition at a time. But then
> > there are production issues: if I update a partition (insert overwrite into the
> > ORC table), then I have to delete the original, and production users will be
> > missing data.... decisions decisions.
> > So any ideas? Can a table have some partitions in one file type and
> > other partitions in another? That sounds scary. Anywho, a good problem to
> > have... that performance will be worth it.
John Omernik 2013-07-26, 22:17
Alan Gates 2013-07-27, 01:06