Declaring schema for unknown number of columns
Chan, Tim 2013-01-07, 22:19
Is it possible to declare a schema when doing a LOAD for data in which you do not know the total number of columns?
For instance, I know the data contains 6 or more columns, and these columns are all of the same data type.
I basically want to join this data with another data set, but I was getting the following error:
ERROR 1109: Input (six_month_and_variable_month_sales) on which outer join is desired should have a valid schema
Re: Declaring schema for unknown number of columns
Jinyuan Zhou 2013-01-07, 22:27
If you can load the data but the join operation needs a complete schema, you can try a FOREACH ... GENERATE statement that projects your original relation into a new one for which you define a schema for all fields.
Re: Declaring schema for unknown number of columns
Chan, Tim 2013-01-08, 01:48
Hi Jinyuan,
Since I don't know how many columns I will have, I do something like this.
six_month_and_variable_month_sales_2 = FOREACH six_month_and_variable_month_sales
    GENERATE $0 AS ed_style_id,
             $1 AS sale_start_month,
             $2 AS sale_month_1,
             $3 AS sale_month_2,
             $4 AS sale_month_3,
             $5 AS sale_month_4,
             $6 AS sale_month_5,
             $7 AS sale_month_6,
             $8 ..;
I still get the same error when I try to join on this relation.
Re: Declaring schema for unknown number of columns
Jinyuan Zhou 2013-01-08, 02:48
Sorry, it looks like my suggestion won't help unless you are able to specify the schema in the original LOAD statement.

If the number of fields is ONLY available at runtime, but each row has the same number of fields and you know the position of the join key, then I have an ugly approach:

1. Sample the first line to get the number of fields.
2. Write a UDF that takes all fields of the data.
3. Pass the number to the UDF and override the method public Schema outputSchema(Schema input) to output a complete schema.
4. Have your exec method return a tuple of the same length as the input tuple, converting each item in the tuple to the data type you know.

The resulting relation should have a valid schema. But I don't know how to pass the number to the UDF efficiently. I hope someone has a better suggestion.

Thanks,
Jinyuan (Jack) Zhou
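The first step of the approach described above (sampling one line to learn the field count) can be sketched in plain Java, together with turning that count into a schema declaration. This is only an illustrative sketch, not something posted in the thread: the class name SchemaFromSample, the tab delimiter, the made-up sample line, and the chararray type are all assumptions, and the field names simply extend the ones used in the FOREACH statement earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: derives a Pig-style schema string from one sampled row.
public class SchemaFromSample {

    // Counts delimited fields in a line; the -1 limit keeps trailing empty fields.
    public static int countFields(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1).length;
    }

    // Builds "name:type, name:type, ..." for fieldCount columns.
    // Assumes the first two columns are the id and start month, and that
    // every remaining column is one more sale_month_N of the same type.
    public static String buildSchema(int fieldCount, String pigType) {
        if (fieldCount < 2) {
            throw new IllegalArgumentException("expected at least 2 fields");
        }
        List<String> fields = new ArrayList<>();
        fields.add("ed_style_id:" + pigType);
        fields.add("sale_start_month:" + pigType);
        for (int i = 1; i <= fieldCount - 2; i++) {
            fields.add("sale_month_" + i + ":" + pigType);
        }
        return String.join(", ", fields);
    }

    public static void main(String[] args) {
        // Made-up sample row; in practice this would be the first line of the input.
        String sample = "101\t201207\t5\t7\t9\t4\t6\t8";
        int n = countFields(sample, "\t");
        // chararray is an assumption; the thread only says all columns share one type.
        System.out.println(buildSchema(n, "chararray"));
    }
}
```

One possible use, assuming every row really has the same number of fields: hand the generated string to the script through Pig's parameter substitution (pig -param schema="..." with $schema inside the AS (...) clause of the LOAD statement). That would declare the full schema at load time rather than inside a UDF, which may sidestep the question of how to pass the field count to the UDF.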