Hive >> mail # user >> Quering RDBMS table in a Hive query


Re: Quering RDBMS table in a Hive query
Thanks, Jan.

On Fri, Jun 15, 2012 at 4:35 PM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
> On 6/15/12, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>> I didn't know InputFormat and LineReader could help, though I didn't
>> look at them closely. I was thinking about implementing a
>> Table-Generating Function (UDTF) if there isn't an already implemented
>> solution.
>
> Both are possible, InputFormat and/or UD(T)F. It all depends on what
> you need. I actually use both: in the InputFormat I load lists of
> allowed values to check the data, and in a UDF I query some other
> database for values needed only in some queries. Generally, I'd use
> an InputFormat in situations where all jobs over a given table would
> require the additional data from the RDBMS. Conversely, in situations
> where only a few jobs out of many require the RDBMS connection, I would
> use a UDF.
>
> I think that the difference in performance between the two is rather
> small, if any. Also, a UDF is easier to write, so it might be the "weapon
> of choice", at least if you don't already use a custom InputFormat.
>
> Jan
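
For anyone reading this thread later, here is a rough sketch of the UDF approach Jan describes, written so that it compiles without Hive on the classpath. Everything named here is an assumption for illustration: the class name RdbmsLookup, the injected lookup function, and the per-instance cache are not from Jan's setup. In a real deployment the class would presumably extend org.apache.hadoop.hive.ql.exec.UDF and the lookup would run a JDBC query against the RDBMS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a lookup UDF. In Hive this class would extend
// org.apache.hadoop.hive.ql.exec.UDF; that is left out here so the example
// has no dependencies beyond the JDK.
class RdbmsLookup {
    // Assumption: stands in for a JDBC query such as
    // "SELECT val FROM lookup WHERE id = ?". Returns null when no row matches.
    private final Function<String, String> rdbmsQuery;

    // Per-instance cache so each distinct key hits the database at most once
    // (keys with no match are not cached and will be queried again).
    private final Map<String, String> cache = new HashMap<>();

    RdbmsLookup(Function<String, String> rdbmsQuery) {
        this.rdbmsQuery = rdbmsQuery;
    }

    // Hive calls evaluate() once per row; NULL input yields NULL output.
    public String evaluate(String key) {
        if (key == null) {
            return null;
        }
        return cache.computeIfAbsent(key, rdbmsQuery);
    }
}
```

The cache is the main design point: evaluate() runs for every row, so without it a large table would mean one database round trip per row. Only the queries that actually call the function pay for the RDBMS connection, which matches the case Jan recommends a UDF for.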