Hive, mail # dev - Question about documentation for Hive in general


Re: Question about documentation for Hive in general
Edward Capriolo 2012-06-13, 02:10
Lars,
Great work. What is it with people named Lars that are into writing
documentation?

As you may have heard, some others (Dean, Jason) and I are working
on the Programming Hive book.

I recently stumbled onto the really amazing Thrift raw bytes SerDe. Who
knew that a Hive table can be defined with only the name of a Thrift
class and magically the column metadata just auto-populates?

UDFs for the most part carry their own documentation. I do not think
there is anything wrong with covering them in the wiki, but there is
a subtle issue with things going stale (although for the most part UDFs
are static). It is kind of a funny gag that we say "look at the .q
files", but actually the .q files are really great docs. They show
exactly how things work and they can't go stale.

We could do something crazy like allowing comments in .q files to be
auto-generated into docs or something.
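
To make that idea a bit more concrete, here is a rough sketch of what
such a generator could look like. Nothing like this exists in Hive as
far as this thread says: the function names (q_file_to_doc, render) and
the convention that "--" comment lines directly above a statement
describe it are assumptions made up for illustration only.

```python
def q_file_to_doc(text):
    """Pair each run of '--' comment lines with the statement that follows.

    Assumed convention (hypothetical, not an existing Hive feature):
    comment lines placed directly above a statement document it, and a
    statement ends at the first line that ends with ';'.
    """
    entries = []
    comments, statement = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("--") and not statement:
            # Comment before a statement: collect it as documentation.
            comments.append(line[2:].strip())
            continue
        statement.append(line)
        if line.endswith(";"):
            # Only documented statements make it into the output.
            if comments:
                entries.append((" ".join(comments), "\n".join(statement)))
            comments, statement = [], []
    return entries


def render(entries):
    """Render (description, query) pairs as plain text, query indented."""
    parts = []
    for description, query in entries:
        parts.append(description + "\n\n    " + query.replace("\n", "\n    "))
    return "\n\n".join(parts)
```

Feeding it the text of a .q file would give back one short doc entry
per commented query; uncommented queries are simply skipped.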

It definitely is "unique" that we (Hive as a project) spend a great
deal of time building things into Hive and significantly less time
documenting them.

1) Indexes work and can accelerate some types of queries, but there is
more work to be done.
2) I don't know.
3) No, the docs on binary are not fully accurate.

On Tue, Jun 12, 2012 at 7:27 PM, Lars Francke <[EMAIL PROTECTED]> wrote:
> Hi,
>
> in the last couple of days and weeks I've been going through the Wiki
> and tried to find things that were undocumented or outdated (and
> update them).
>
> This is a non-exhaustive list of things I found: Avro support,
> TIMESTAMP, BINARY, union types, a lot of UDFs, Indexes, HBase support,
> Table links, CLI options, ...
>
> A lot of these things are very nice features that could be very useful
> to end users. I've tried to do my best to document what I understand
> myself but for some of these things it's too much to understand. For
> some features there are either JIRAs or Design documents available but
> I've found that the implementation often differs significantly from
> what the design says so I had to resort to patches which are hard to
> read (at least for me).
>
> Wouldn't a general policy make sense that allows new and changed
> features only if they are documented? How else are end users supposed
> to find out about all these great things? How are you bringing new users
> up to speed with Hive and all its features in your companies?
>
> In the meantime I'll continue to monitor commits and document what I
> can but I have some specific questions that maybe someone can help
> with:
>
> * What is the status of indexes? What does work, when and how can they
> be used? The design doc[1] seems out of date but I'm not sure.
> * How do union types really work? The JIRA[2] mentions tags that can
> be named but the tests in the patch don't seem to use them. Are they
> optional or not needed at all?
> * Is the design document for BINARY[3] types still accurate?
>
> I'm sure more will pop up and I appreciate any help. Also I'm not a
> native English speaker and no Hive expert so please feel free to
> correct whatever I'm writing in the Wiki.
>
> Cheers,
> Lars
>
> [1] <https://cwiki.apache.org/confluence/display/Hive/IndexDev>
> [2] <https://issues.apache.org/jira/browse/HIVE-537>
> [3] <https://cwiki.apache.org/Hive/binary-datatype-proposal.html>