Hive >> mail # user >> Large Scale Table Reprocess


Large Scale Table Reprocess
Just finishing up testing with Hive 11 and ORC. Thank you to Owen and all
those who have put hard work into this. ORC files alone, when compared to RC
files in Hive 9, 10, and 11, gave a huge increase in performance; it was
amazing.  That said, now we have to reprocess.
We have a large table with lots of partitions. I'd love to be able to
reprocess into a new table, like table_orc, and then at the end of it all,
just drop the original table. That said, I see that being hard to do from a
space perspective, so I will have to do it a partition at a time.  But then
there are production issues: if I reprocess a partition with an insert
overwrite into the ORC table, then I have to delete the original, and
production users will be missing data... decisions, decisions.
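
To make the partition-at-a-time idea concrete, here is a rough sketch that only generates the HiveQL statements for each partition; it does not run them. The table names, the column list, and the partition column `dt` are all made-up placeholders, and the drop step is exactly the part that would leave production users without data in the original table, so treat it as an illustration of the trade-off rather than a recommended procedure.

```python
def reprocess_statements(src, dst, columns, part_col, part_values):
    """Yield HiveQL statements that move one partition at a time from src to dst.

    All names passed in are hypothetical; nothing here talks to Hive.
    """
    # Explicit column list, because the partition column is supplied
    # by the static PARTITION clause and must not be selected again.
    col_list = ", ".join(columns)
    for value in part_values:
        spec = f"{part_col}='{value}'"
        # Rewrite this partition's rows into the ORC-backed table.
        yield (
            f"INSERT OVERWRITE TABLE {dst} PARTITION ({spec}) "
            f"SELECT {col_list} FROM {src} WHERE {spec}"
        )
        # Reclaim space by dropping the old copy of the same partition.
        # This is the step after which readers of src no longer see the data.
        yield f"ALTER TABLE {src} DROP PARTITION ({spec})"
```

Calling `reprocess_statements("events", "events_orc", ["id", "payload"], "dt", ["2013-07-01"])` would yield one insert and one drop for that single partition, which could then be reviewed before being submitted to Hive.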

So any ideas? Can a table have some partitions in one file type and other
partitions in another? That sounds scary.  Anywho, a good problem to
have... that performance will be worth it.
Alan Gates 2013-07-26, 20:29
John Omernik 2013-07-26, 22:09
John Omernik 2013-07-26, 22:17
Alan Gates 2013-07-27, 01:06