Hive >> mail # user >> adding filenames as new columns via Hive


RE: adding filenames as new columns via Hive
You could also do this as a simple UDF instead of a virtual column. Virtual columns do get shown in the describe command, and I don't think it makes sense to show this one there. So instead of
Select FILENAME, xyz from T

We could just do

Select Filename(), xyz from T

Thoughts?

Ashish
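
For illustration only, here is how the grouping query quoted further down might read with the UDF approach. `Filename()` is the hypothetical function proposed above, not an existing Hive built-in; `FOO` and `x` are taken from the quoted message, and the `substr` arguments are unchanged from it.

```sql
-- Sketch only: Filename() is the hypothetical UDF proposed in this thread.
-- It is assumed to return the name of the file the current row was read from.
SELECT
  substr(Filename(), 4, 7)  AS class_A,
  substr(Filename(), 8, 10) AS class_B,
  count(x)                  AS cnt
FROM FOO
GROUP BY
  substr(Filename(), 4, 7),
  substr(Filename(), 8, 10);
```

Note that Hive's `substr(str, pos, len)` takes a 1-based start position and a length, so `substr(..., 4, 7)` yields seven characters starting at the fourth, not characters four through seven.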

-----Original Message-----
From: Edward Capriolo [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 16, 2009 12:05 PM
To: [EMAIL PROTECTED]
Subject: Re: adding filenames as new columns via Hive

I just put in a related thread about this. This would be really nice.
It is just a virtual column; we don't need it in the metadata if we also have a command like 'show files in partition' so we can inspect what is there as well.
On Wed, Sep 16, 2009 at 3:02 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
> I don't think it is a good idea to make it a part of table metadata in
> any way.
>
> What happens if the filename changes? It will be very difficult to
> maintain.
>
> But we can definitely add some virtual columns (FILENAME can be one
> of them to start with); they should not show up in describe, select *, etc.
>
> But the user can query based on them. This is mostly for advanced
> users and can be used for pruning etc. as well.
>
> I will open a new jira, and we can continue the discussion there.
>
> -namit
>
> From: Avram Aelony [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 16, 2009 11:39 AM
> To: [EMAIL PROTECTED]
> Subject: RE: adding filenames as new columns via Hive
>
> Very cool.  Looking forward to seeing this feature in action. :)
>
> Thanks,
> -A
>
> From: Prasad Chakka [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 16, 2009 11:33 AM
> To: [EMAIL PROTECTED]
> Subject: Re: adding filenames as new columns via Hive
>
> FYI, all partition columns can be used like regular columns in select
> queries. So it should be fine.
>
> ________________________________
>
> From: Avram Aelony <[EMAIL PROTECTED]>
> Reply-To: <[EMAIL PROTECTED]>
> Date: Wed, 16 Sep 2009 11:23:45 -0700
> To: <[EMAIL PROTECTED]>
> Subject: RE: adding filenames as new columns via Hive
>
> Sounds great, Prasad.
>
> As long as I can further parse the filename field to piece out (new)
> derived fields, I will be happy. :) For example, in a later query I'd
> like to be able to do something like:
>
> select
> substr(filename, 4, 7) as class_A,
> substr(filename, 8, 10) as class_B,
> count(x) as cnt
> from FOO
> group by
> substr(filename, 4, 7),
> substr(filename, 8, 10);
>
>
> thanks,
> -A
>
>
>
> From: Prasad Chakka [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 16, 2009 11:10 AM
> To: [EMAIL PROTECTED]
> Subject: Re: adding filenames as new columns via Hive
>
> I think this can be a good feature though I would like the filename to
> be a partition column (one of such) instead of a separate type of
> column. Would that work?
>
> create external table FOO (  <list of fields and types> ) row format
> delimited fields terminated by ','
> partitioned by (file_name FILENAME)
> stored as textfile location 's3:/somebucket/';
>
> Or table partitioned by datestamp and filename
>
> create external table FOO (  <list of fields and types> ) row format
> delimited fields terminated by ','
> Partitioned by (ds STRING, file_name FILENAME) stored as textfile
> location 's3:/somebucket/';
>
>
> So FILENAME becomes a new type. I like this because partition columns
> are virtual columns just like the filename column and do not exist
> along with data on the disk.
>
> Prasad
>
> ________________________________
>
> From: Avram Aelony <[EMAIL PROTECTED]>
> Reply-To: <[EMAIL PROTECTED]>
> Date: Wed, 16 Sep 2009 10:48:33 -0700
> To: <[EMAIL PROTECTED]>
> Subject: adding filenames as new columns via Hive
>
> Dear Hive list,
>
> I am processing a large volume of files (many files, roughly 500M