Question about documentation for Hive in general
Lars Francke 2012-06-12, 23:27
In the last couple of days and weeks I've been going through the Wiki
and trying to find things that were undocumented or outdated (and
This is a non-exhaustive list of things I found: Avro support,
TIMESTAMP, BINARY, union types, a lot of UDFs, indexes, HBase support,
table links, CLI options, ...
A lot of these are very nice features that could be very useful to
end users. I've tried my best to document what I understand myself,
but some of these things are too much for me to figure out. For some
features there are JIRAs or design documents available, but I've
found that the implementation often differs significantly from what
the design says, so I had to resort to reading patches, which are
hard to follow (at least for me).
Wouldn't it make sense to have a general policy that new and changed
features are only accepted if they are documented? How else are end
users supposed to find out about all these great things? How are you
bringing new users up to speed with Hive and all its features in your
companies?
In the meantime I'll continue to monitor commits and document what I
can, but I have some specific questions that maybe someone can help with:
* What is the status of indexes? What works, and when and how can
they be used? The design doc seems out of date, but I'm not sure.
* How do union types really work? The JIRA mentions tags that can be
named, but the tests in the patch don't seem to use them. Are they
optional, or not needed at all?
* Is the design document for BINARY types still accurate?
I'm sure more will pop up, and I appreciate any help. Also, I'm not a
native English speaker and no Hive expert, so please feel free to
correct anything I write in the Wiki.