Hive, mail # user - Quering RDBMS table in a Hive query


Re: Quering RDBMS table in a Hive query
Jan Dolinár 2012-06-15, 12:35
On 6/15/12, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
> I didn't know InputFormat and LineReader could help, though I didn't
> look at them closely. I was thinking about implementing a
> Table-Generating Function (UDTF) if there is no an already implemented
> solution.

Both are possible: InputFormat and/or UD(T)F. It all depends on what
you need. I actually use both: in the InputFormat I load lists of
allowed values to check the data against, and in a UDF I query another
database for values that only some queries need. Generally, I'd use an
InputFormat when all jobs over a given table require the additional
data from the RDBMS. Conversely, when only a few jobs out of many
require the RDBMS connection, I would use a UDF.

I think the difference in performance between the two is rather
small, if any. Also, a UDF is easier to write, so it might be the "weapon
of choice", at least if you don't already use a custom InputFormat.
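
To give a rough idea of the UDF route, here is a minimal sketch, not my
actual code. It assumes the lookup is a simple key-to-value query. The
class name RdbmsLookupUdf and the fetchFromRdbms function are made up for
illustration; the function stands in for a JDBC query so that the example
compiles without Hive or a database driver. A real Hive UDF would extend
org.apache.hadoop.hive.ql.exec.UDF and open its connection lazily behind a
no-argument constructor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a Hive UDF that enriches rows with RDBMS values.
// Assumption: in Hive this class would extend org.apache.hadoop.hive.ql.exec.UDF;
// it is kept dependency-free here so that it compiles on its own.
class RdbmsLookupUdf {
    // Assumption: wraps something like "SELECT value FROM t WHERE key = ?" over JDBC.
    private final Function<String, String> fetchFromRdbms;
    // Per-instance cache, so repeated keys do not hit the database again.
    private final Map<String, String> cache = new HashMap<>();
    private int roundTrips = 0;

    RdbmsLookupUdf(Function<String, String> fetchFromRdbms) {
        this.fetchFromRdbms = fetchFromRdbms;
    }

    // Hive invokes evaluate() once per row; each distinct key is fetched only once.
    public String evaluate(String key) {
        if (key == null) {
            return null; // NULL in, NULL out, as is usual for Hive UDFs
        }
        if (!cache.containsKey(key)) {
            roundTrips++;
            cache.put(key, fetchFromRdbms.apply(key)); // misses are cached as null too
        }
        return cache.get(key);
    }

    // Number of simulated database calls made so far (for illustration only).
    int roundTrips() {
        return roundTrips;
    }
}
```

The cache is the important part: evaluate() runs for every row, so without
it each row would cost a round trip to the database.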

Jan