Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Variable Substitution


Copy link to this message
-
Re: Variable Substitution
I'm fine with the variable placeholder not being removed in cases where the variable is not defined (until I change my mind). When I define var2 and var3, though, their placeholders aren't swapped for their values.

My reasoning for this was that I'm moving from one execution script that defines var1, to a newer script that defines var2 and var3.  The goal for this was that I could use the same hive script with either execution script until we've fully migrated.

I don't disagree with the placeholders remaining if they're not defined, I'm just unsure that having 1 variable reference that's not defined should prevent the others from working.
On Mar 6, 2013, at 4:41 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> It was done like this in hive because that is what hadoops variable substitution does, namely if it does not understand the variable it does not replace it.
>
> On Wed, Mar 6, 2013 at 4:30 PM, Dean Wampler <[EMAIL PROTECTED]> wrote:
>> Even newer versions of Hive do this. Any reason you don't want to provide a definition for all of them? You could argue that an undefined variable is a bug and leaving the literal text in place makes it easier to notice. Although, Unix shells would insert an empty string, so never mind ;)
>>
>> On Wed, Mar 6, 2013 at 3:13 PM, Matt Tucker <[EMAIL PROTECTED]> wrote:
>>> Using CDH3u3 (Hive 0.7.1), it appears that variable substitution becomes disabled when I use a variable in a query that hasn't been defined.
>>>
>>> For instance, using the following script:
>>>
>>> set var2=2013-02-01;
>>> set var3=2013-02-10;
>>>
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR (clndr_dt >= "${hiveconf:var2}" AND clndr_dt <= "${hiveconf:var3}") LIMIT 1;
>>>
>>> I would expect the results job configuration to list the hive.query.string as:
>>>
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR (clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10") LIMIT 1;
>>>
>>> (or with ${hiveconf:var1} removed, leaving an empty string).
>>>
>>> Instead, it prints the query as-is (with the variable placeholders).  If I set var1=2012-01-01, it properly substitutes the variables, but it only returns the '2012-01-01' record (not the 2013 records).
>>>
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "2012-01-01" OR ( clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1
>>>
>>> I was originally planning to use this for partition pruning, but it doesn't appear to be the cause as the calendar table is not partitioned.
>>>
>>> Is there something that I've overlooked?
>>>
>>> Thanks!
>>
>>
>>
>> --
>> Dean Wampler, Ph.D.
>> thinkbiganalytics.com
>> +1-312-339-1330
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB