Hive >> mail # user >> Hive Query having virtual column INPUT__FILE__NAME in where clause gives exception


Re: Hive Query having virtual column INPUT__FILE__NAME in where clause gives exception
Jitendra,
I am really not sure you can use virtual columns in a WHERE clause. (I have
never tried it, so I may be wrong as well.)

Can you try executing your query as below?

select count(*), filename from (select INPUT__FILE__NAME as filename from
netflow) tmp where filename='vzb.1351794600.0';

Please double-check the query syntax; I am only giving the idea and have not
verified the query.
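A slightly fuller sketch of the same idea, for reference. The table and column definitions here are placeholders (only the LOCATION clause pointing at the HDFS directory matters), and the LIKE match is an assumption: the select output later in this thread shows INPUT__FILE__NAME returning the full HDFS URI, not the bare file name, so strict equality against 'vzb.1351794600.0' may never match.

```sql
-- Hypothetical external table over an existing HDFS directory;
-- Hive will read every file found under LOCATION.
CREATE EXTERNAL TABLE netflow (
  first STRING,
  last  STRING
  -- ... remaining columns ...
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs://192.168.0.224:9000/data/jk/vzb/';

-- Filter on the virtual column via a subquery, so INPUT__FILE__NAME is
-- materialized as an ordinary column before the WHERE clause is evaluated.
SELECT count(*)
FROM (SELECT INPUT__FILE__NAME AS filename FROM netflow) tmp
WHERE tmp.filename LIKE '%vzb.1351794600.0';
```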
On Fri, Jun 14, 2013 at 4:57 PM, Jitendra Kumar Singh <
[EMAIL PROTECTED]> wrote:

> Hi Guys,
>
> Executing a Hive query with a filter on the virtual column INPUT__FILE__NAME
> results in the following exception.
>
> hive> select count(*) from netflow where INPUT__FILE__NAME='vzb.
> 1351794600.0';
>
> FAILED: SemanticException java.lang.RuntimeException: cannot find field
> input__file__name from
> [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@1d264bf5,
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@3d44d0c6
> ,
>
> .
>
> .
>
> .
>
>
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@7e6bc5aa
> ]
>
> This error is different from the one we get when the column name is wrong:
>
> hive> select count(*) from netflow where INPUT__FILE__NAM='vzb.
> 1351794600.0';
>
> FAILED: SemanticException [Error 10004]: Line 1:35 Invalid table alias or
> column reference 'INPUT__FILE__NAM': (possible column names are: first,
> last, ....)
>
> But using this virtual column in the select clause works fine.
>
> hive> select INPUT__FILE__NAME from netflow group by INPUT__FILE__NAME;
>
> Total MapReduce jobs = 1
>
> Launching Job 1 out of 1
>
> Number of reduce tasks not specified. Estimated from input data size: 4
>
> In order to change the average load for a reducer (in bytes):
>
>   set hive.exec.reducers.bytes.per.reducer=<number>
>
> In order to limit the maximum number of reducers:
>
>   set hive.exec.reducers.max=<number>
>
> In order to set a constant number of reducers:
>
>   set mapred.reduce.tasks=<number>
>
> Starting Job = job_201306041359_0006, Tracking URL =
> http://192.168.0.224:50030/jobdetails.jsp?jobid=job_201306041359_0006
>
> Kill Command = /opt/hadoop/bin/../bin/hadoop job  -kill
> job_201306041359_0006
>
> Hadoop job information for Stage-1: number of mappers: 12; number of
> reducers: 4
>
> 2013-06-14 18:20:10,265 Stage-1 map = 0%,  reduce = 0%
>
> 2013-06-14 18:20:33,363 Stage-1 map = 8%,  reduce = 0%
>
> .
>
> .
>
> .
>
> 2013-06-14 18:21:15,554 Stage-1 map = 100%,  reduce = 100%
>
> Ended Job = job_201306041359_0006
>
> MapReduce Jobs Launched:
>
> Job 0: Map: 12  Reduce: 4   HDFS Read: 3107826046 HDFS Write: 55 SUCCESS
>
> Total MapReduce CPU Time Spent: 0 msec
>
> OK
>
> hdfs://192.168.0.224:9000/data/jk/vzb/vzb.1351794600.0
>
> Time taken: 78.467 seconds
>
> I am trying to create an external Hive table on data already present in
> HDFS, and the folder contains extra files that I want to ignore. This is
> similar to what is asked and suggested in the following Stack Overflow
> questions: how to make hive take only specific files as input from hdfs
> folder<http://stackoverflow.com/questions/16844758/how-to-make-hive-take-only-specific-files-as-input-from-hdfs-folder> and when
> creating an external table in hive can I point the location to specific
> files in a directory?<http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil>
>
> Any help would be appreciated. The full stack trace I am getting is as
> follows:
>
> 2013-06-14 15:01:32,608 ERROR ql.Driver
> (SessionState.java:printError(401)) - FAILED: SemanticException
> java.lang.RuntimeException: cannot find field input__
>
> org.apache.hadoop.hive.ql.parse.SemanticException:
> java.lang.RuntimeException: cannot find field input__file__name from
> [org.apache.hadoop.hive.serde2.object
>
>         at
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:122)
>
>         at
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)

Nitin Pawar