

Avro and Hive
Hi all,

I'd seen past emails from Scott and Doug about using Avro as the data  
format for Hive.

This was back in April/May, and I'm wondering about the current state of
the world.

Specifically, what's the recommended approach (& known issues) with  
using Avro files with Hive?

E.g. Scott mentioned that "Avro files should be better performing and  
more compact than sequence files." Has that been proven out?

He also discussed a minor issue with maps - "Their maps however can  
have any intrinsic type as a key (int, long, string, float, double)."

And a more serious issue with unions, though this wouldn't directly  
impact us as we wouldn't be using that feature.

In our situation, we're trying to get the best of both worlds by  
leveraging Hive for analytics, and Cascading for workflow, so having  
one store in HDFS for both would be a significant win.

Thanks for any input!

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g