Re: Defining collection items terminated by for a nested data type
Sadananda Hegde 2012-10-01, 19:26
I tested the nesting with the following DDL:

CREATE TABLE test_tbl (
  col1 STRING,
  col2 INT,
  col3 MAP<STRING, ARRAY<STRING>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

I was able to load and query test_tbl with the following data:

text1|1|key1:value1a^Dvalue1b^Dvalue1c, key2:value2a^Dvalue2b

That is:

'|' is the field delimiter
':' is the map key/value separator
',' is the item separator between key/value pairs (i.e. key/value pair 1 and key/value pair 2 are separated by ',')
'^D' is the item separator for the elements of the inner array

As you can see, all the defined delimiters work fine at level 1, but Hive falls back to its default separators (^D in this case) for levels 2, 3, etc. Obviously, control characters are hard to read and much more difficult to produce in the data file.

What is the best way to specify readable characters for all levels? Is it possible to handle this kind of nested structure using a SERDE row format? Can someone please help me with the actual CREATE TABLE syntax with ROW FORMAT SERDE for the example above?

Thanks in advance for your help.

Sadu

On Fri, Sep 28, 2012 at 9:27 AM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Thanks Manish.
>
> It's a good article, but it's still not clear to me how you define them when the column is of a nested type (like array of maps, map of arrays, etc.).
>
> Just a clarification on item 2 below.
>
> 2. What would be the separator for map elements?
>
> For Map element separator is “=”
>
> '=' is the MAP key separator. What I mean is the item separator when the map contains multiple key/value pairs, like
>
> (Key1=Value1; Key2=Value2; Key3=Value3....)
>
> Here '=' is the key separator and ';' is the item separator.
>
> I can handle the above example with COLLECTION ITEMS TERMINATED BY ';' and MAP KEYS TERMINATED BY '=' if the element is of type MAP.
> The COLLECTION ITEMS TERMINATED BY ',' works on all three data types (maps, arrays, structs) when they are by themselves. The problem is defining them for nested structures, because we need multiple separators: one separator for array items and a different separator for map items defined within that array, etc.
>
> The default Hive delimiters work just fine. The delimiters in that case will be: level 1 '^A', level 2 '^B', level 3 '^C', etc. What I am trying to do is to define them explicitly. The COLLECTION ITEMS TERMINATED BY ',' statement addresses the first level (^A), but I don't know how to define the separators for the other levels (to use instead of ^B, ^C, etc.).
>
> Thanks,
> Sadu
>
> On Fri, Sep 28, 2012 at 1:28 AM, Manish.Bhoge <[EMAIL PROTECTED]> wrote:
>
>> Hi Sadu,
>>
>> See my answer below.
>>
>> Also this will help you to understand in detail about collection, MAP and Array.
>>
>> http://datumengineering.wordpress.com/2012/09/27/agility-in-hive-map-array-score-for-hive/
>>
>> From: Sadananda Hegde [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, September 28, 2012 10:31 AM
>> To: [EMAIL PROTECTED]
>> Subject: Defining collection items terminated by for a nested data type
>>
>> How does "collection items terminated by" work on a nested structure? Say the table is created with the DDL:
>>
>> CREATE TABLE table_1(f1 int, f2 string, f3 array<struct<a:string, b:int, c:map<string, string>>>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '|'
>> COLLECTION ITEMS TERMINATED BY ','
>> MAP KEYS TERMINATED BY '='
>> LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE;
>>
>> I guess the comma separator will be used for the items in the outermost structure (i.e. array). Is that true?
>>
>> Yes. Right, comma is a separator for array.
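
To make the delimiter levels in the 2012-10-01 message at the top of this thread concrete, here is a minimal Python sketch (an illustration only, not Hive code and not Hive's actual parser). It splits the test_tbl sample row the way that message describes: '|' between fields, ',' between map entries, ':' between key and value, and ^D (byte 0x04) between the inner array elements. The function name parse_row and the constant names are hypothetical, and the sample row omits the space after the comma on the assumption that it is a line-wrap artifact of the email.

```python
# Delimiters as described in the post: three come from the DDL, one is a Hive default.
FIELD_SEP = "|"     # FIELDS TERMINATED BY '|'
ENTRY_SEP = ","     # COLLECTION ITEMS TERMINATED BY ',' (separates map entries)
KEY_SEP = ":"       # MAP KEYS TERMINATED BY ':'
ARRAY_SEP = "\x04"  # Ctrl-D (^D), the default the post observed for the inner array


def parse_row(line):
    """Split one text row into (col1, col2, col3) for the test_tbl layout."""
    col1, col2, raw_map = line.rstrip("\n").split(FIELD_SEP)
    col3 = {}
    for entry in raw_map.split(ENTRY_SEP):
        # Split on the first ':' only, so the array part stays intact.
        key, raw_values = entry.split(KEY_SEP, 1)
        col3[key] = raw_values.split(ARRAY_SEP)
    return col1, int(col2), col3


# Sample row from the post, with ^D written as the actual control byte.
sample = "text1|1|key1:value1a\x04value1b\x04value1c,key2:value2a\x04value2b"
print(parse_row(sample))
```

Writing the row with "\x04" in a script like this is one way to produce the control character that is awkward to type by hand in a data file.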