|
|
-
Exploding Array of Struct in Hive
Raihan Jamal 2012-07-18, 21:24
CREATE TABLE IF NOT EXISTS TestingTable2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '-'
collection items terminated by ','
map keys terminated by ':'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/rjamal/output2';
Below is the data in TestingTable2
*1345653-110909316904:1341894546,221065796761:1341887508*
I can explode the above data by using this below query and it works fine for above data-
*SELECT * FROM (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps FROM testingtable2 LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts;*
And I will get output like this which is fine-
1345653 110909316904 1341894546
1345653 221065796761 1341887508
But in some cases I have data in the table like this, timestamp appended by pound sign for same product_id-
*1345653-110909316904:1341894546#1341885695,221065796761:1341887508 #1341885453*
And I need output like this for above data using the HiveQL query-
1345653 110909316904 1341894546
1345653 110909316904 1341885695
1345653 221065796761 1341887508
1345653 221065796761 1341885453
Is this possible to do this somehow?
Any suggestions will be appreciated.
*Raihan Jamal*
-
Re: Exploding Array of Struct in Hive
Vijay 2012-07-19, 00:37
You basically need to do a nested-explode here.
SELECT user_id, product_id, prod_and_ts_split FROM ( SELECT * FROM (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps FROM testingtable2 LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts ) tt2 LATERAL VIEW explode(split(timestamps, "#")) exploded_table2 as prod_and_ts_split;
Thanks, Vijay
On Wed, Jul 18, 2012 at 2:24 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote: > CREATE TABLE IF NOT EXISTS TestingTable2 > > ( > > USER_ID BIGINT, > > PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>> > > ) ROW FORMAT > > DELIMITED FIELDS TERMINATED BY '-' > > collection items terminated by ',' > > map keys terminated by ':' > > LINES TERMINATED BY '\n' > > STORED AS TEXTFILE > > LOCATION '/user/rjamal/output2'; > > > > Below is the data in TestingTable2 > > > > 1345653-110909316904:1341894546,221065796761:1341887508 > > > > I can explode the above data by using this below query and it works fine for > above data- > > > > SELECT * FROM (select user_id, prod_and_ts.product_id as product_id, > prod_and_ts.timestamps as timestamps FROM testingtable2 LATERAL VIEW > explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts; > > > > And I will get output like this which is fine- > > > > 1345653 110909316904 1341894546 > > 1345653 221065796761 1341887508 > > > > But in some cases I have data in the table like this, timestamp appended by > pound sign for same product_id- > > > > 1345653-110909316904:1341894546#1341885695,221065796761:1341887508#1341885453 > > > > And I need output like this for above data using the HiveQL query- > > > > 1345653 110909316904 1341894546 > > 1345653 110909316904 1341885695 > > 1345653 221065796761 1341887508 > > 1345653 221065796761 1341885453 > > > > Is this possible to do this somehow? > > > Any suggestions will be appreciated. > > > > Raihan Jamal >
|
|