I'd seen past emails from Scott and Doug about using Avro as the data
format for Hive.
This was back in April/May, and I'm wondering about the current state of things.
Specifically, what's the recommended approach (and what are the known
issues) for using Avro files with Hive?
E.g. Scott mentioned that "Avro files should be better performing and
more compact than sequence files." Has that been proven out?
He also discussed a minor issue with maps - "Their maps however can
have any intrinsic type as a key (int, long, string, float, double)."
And a more serious issue with unions, though this wouldn't directly
impact us as we wouldn't be using that feature.
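To make the map point concrete, here is a minimal Python sketch of how I
understand the mismatch. It assumes that Avro map keys are always strings
(as the Avro spec states), while a Hive map can be keyed by any primitive
type, so non-string keys would need converting before being written to Avro.
The helper name to_avro_map and the sample data are hypothetical and not
part of Hive or Avro.

```python
def to_avro_map(hive_map):
    """Return a copy of hive_map whose keys are all strings.

    Assumption: Avro maps only allow string keys, so int/long/float/double
    keys from a Hive-style map are stringified here.
    """
    out = {}
    for key, value in hive_map.items():
        skey = str(key)
        # Two distinct keys (e.g. 1 and "1") could collapse to the same string.
        if skey in out:
            raise ValueError(f"key collision after stringifying: {skey!r}")
        out[skey] = value
    return out


# Hypothetical data: a map keyed by integer user IDs.
clicks_by_user_id = {101: 3, 202: 7}
print(to_avro_map(clicks_by_user_id))
```

The cost of this approach is that the original key type is lost, so anything
reading the data back would have to know to parse the keys again.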
In our situation, we're trying to get the best of both worlds by
leveraging Hive for analytics, and Cascading for workflow, so having
one store in HDFS for both would be a significant win.
Thanks for any input!
elastic web mining
Scott Carey 2010-11-01, 16:32
Ken Krugler 2010-11-02, 16:34