Hive, mail # user - Querying RDBMS table in a Hive query


Re: Querying RDBMS table in a Hive query
Ruslan Al-Fakikh 2012-06-15, 17:28
Thanks, Jan.

On Fri, Jun 15, 2012 at 4:35 PM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
> On 6/15/12, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>> I didn't know InputFormat and LineReader could help, though I didn't
>> look at them closely. I was thinking about implementing a
>> Table-Generating Function (UDTF) if there is no already implemented
>> solution.
>
> Both are possible: InputFormat and/or UD(T)F. It all depends on what
> you need. I actually use both: in the InputFormat I load lists of
> allowed values to check the data, and in a UDF I query some other
> database for values that are necessary only in some queries. Generally,
> I'd use an InputFormat in situations where all jobs over a given table
> require the additional data from the RDBMS. Conversely, in situations
> where only a few jobs out of many require the RDBMS connection, I would
> use a UDF.
>
> I think that the difference in performance between the two is rather
> small, if any. Also, a UDF is easier to write, so it might be the
> "weapon of choice", at least if you don't already use a custom
> InputFormat.
>
> Jan
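
For illustration, here is a minimal sketch of the lookup logic that a UDF of the kind Jan describes might delegate to. It is not code from this thread. The class name CachedRdbmsLookup, the loader function (a stand-in for a real JDBC query) and the per-instance cache are all assumptions made for the example. In an actual Hive UDF, the evaluate method would live in a class that extends Hive's UDF base class, and the loader would run a query against the external database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Hive: the lookup logic a UDF could wrap.
final class CachedRdbmsLookup {
    // Stand-in for a JDBC query; a real UDF would hit the database here.
    private final Function<String, String> loader;
    // Assumed optimization: remember results so each key is fetched once.
    private final Map<String, String> cache = new HashMap<>();

    CachedRdbmsLookup(Function<String, String> loader) {
        this.loader = loader;
    }

    // Mirrors the evaluate(...) method that a simple Hive UDF exposes.
    String evaluate(String key) {
        if (key == null) {
            return null; // SQL-style: NULL in, NULL out
        }
        // containsKey (not computeIfAbsent) so that misses are cached too.
        if (!cache.containsKey(key)) {
            cache.put(key, loader.apply(key));
        }
        return cache.get(key);
    }

    public static void main(String[] args) {
        // An in-memory map plays the role of the RDBMS table.
        Map<String, String> fakeTable = new HashMap<>();
        fakeTable.put("US", "United States");

        CachedRdbmsLookup lookup = new CachedRdbmsLookup(fakeTable::get);
        System.out.println(lookup.evaluate("US")); // known key
        System.out.println(lookup.evaluate("XX")); // unknown key yields null
    }
}
```

Caching inside the helper is a design choice for the sketch: a UDF is called once per row, so without it every row would trigger a round trip to the database. Whether that matters depends on how many distinct keys the query touches.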