Re: Passing table properties to the InputFormat
Jakob Homan 2011-08-22, 16:30
This is essentially the hack that needs to be done, as Hive has no
InputFormat that passes the properties from the job (in contrast to
HiveOutputFormat, which does). I've been meaning to open a JIRA to make
this symmetrical and obviate the need for the hack. There are two
things to take into account, however. One is that in queries that go
over multiple tables, there may be multiple sets of properties hanging
about, so it is necessary to identify the correct properties for this
bit of work. The second is that in select * statements, no mapred job
is run, so this trick doesn't work (instead, the Configuration object
can be used, since it's local).
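As a rough illustration of the first point, here is a minimal sketch in plain Java with no Hive classes. Everything in it is hypothetical: TablePropertiesLookup, findForSplit and resolve are made-up names, the map stands in for the path-to-partition information held in the query plan, and the jobConf map stands in for the Configuration object. It only shows the idea of matching a split's file path against the registered table directories, then falling back to a job-level setting when nothing matches.

```java
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical helper, not part of Hive: picks the table properties that
 * belong to one input split when a query reads several tables.
 */
final class TablePropertiesLookup {

    private TablePropertiesLookup() {}

    /**
     * Walks from the split's file path up through its parent directories and
     * returns the properties registered for the first directory that matches.
     * Returns null when no registered directory contains the file.
     */
    static Properties findForSplit(Map<String, Properties> propsByDir, String splitPath) {
        String current = splitPath;
        while (current != null && !current.isEmpty()) {
            Properties props = propsByDir.get(current);
            if (props != null) {
                return props;
            }
            int slash = current.lastIndexOf('/');
            // Stop once there is no parent directory left to try.
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return null;
    }

    /**
     * Prefers the per-table value and falls back to a job-level setting
     * (assumption: a plain map stands in for the Configuration object)
     * when no table-level value is available.
     */
    static String resolve(Properties tableProps, Map<String, String> jobConf, String key) {
        if (tableProps != null && tableProps.getProperty(key) != null) {
            return tableProps.getProperty(key);
        }
        return jobConf.get(key);
    }
}
```

With two tables registered under different directories, a split from either directory resolves to that table's own properties, so 'total.fields.count' can differ per table without a session-wide set.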
Take a look at this code from Haivvreo (http://bit.ly/pIG3cB), which
has to pull out the reader schema from the properties file for the
correct partition. It's doing essentially the same task as what
you're trying to accomplish.
Hope this helps.
On Mon, Aug 22, 2011 at 9:08 AM, Shantian Purkad
<[EMAIL PROTECTED]> wrote:
> I have been able to get the table properties in InputFormat as below.
> However I am not sure if that is the correct way or if there is a better
> way.
> Properties tableProperties = Utilities.getMapRedWork(job).getPathToPartitionInfo().get(getInputPaths(job).toString()).getTableDesc().getProperties()
> From: Shantian Purkad <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Saturday, August 20, 2011 5:01 PM
> Subject: Passing table properties to the InputFormat
> I have a custom Input format that reads multiple lines as one row based on
> number of columns in a table.
> I want to dynamically pass the table properties (like the number of columns
> in the table, their data types etc., just like what you get in a SerDe). How
> can I do that?
> If that is not possible, and SerDe is an option, how can I use my custom
> record reader in SerDe?
> My table definition is
> create table delimited_data_serde (
> col1 int,
> col2 string,
> col3 int,
> col4 string,
> col5 string,
> col6 string
> )
> STORED AS INPUTFORMAT 'fwrk.hadoop.input.DelimitedInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> The input format needs the property 'total.fields.count'='6'.
> If I set this using set total.fields.count=6; it works. However, I will have
> to change this property for every table that uses the custom input format
> before I query that table.
> How can I automatically get a handle to the table properties in the input format?