Re: Defining collection items terminated by for a nested data type
I tested the nesting with the following DDL.

CREATE TABLE test_tbl (col1 STRING, col2 INT, col3 MAP<STRING, ARRAY<STRING>>)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    COLLECTION ITEMS TERMINATED BY ','
    MAP KEYS TERMINATED BY ':'
    LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

I was able to load and query test_tbl with the following data:
text1|1|key1:value1a^Dvalue1b^Dvalue1c, key2:value2a^Dvalue2b

That is:
    '|'  as the field delimiter
    ':'  as the map key/value separator
    ','  as the item separator between key/value pairs (i.e. key/value
         pair 1 and key/value pair 2 are separated by ',')
    '^D' as the item separator for the elements of the inner array
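
To make the split order concrete, here is a rough Python sketch of how a row like the one above breaks apart level by level. It only imitates the behaviour I observed and is not Hive's actual parser; parse_row is a made-up helper, and the sample row is written with no space after the comma.

```python
# Sketch only: imitates how the sample row splits, level by level.
# parse_row is a hypothetical helper, not part of Hive.
FIELD_SEP = "|"      # FIELDS TERMINATED BY
ITEM_SEP = ","       # COLLECTION ITEMS TERMINATED BY (separates map entries here)
KEY_SEP = ":"        # MAP KEYS TERMINATED BY
NESTED_SEP = "\x04"  # ^D, the default that was applied to the inner array


def parse_row(line):
    """Split one text row into (col1, col2, col3) for the test_tbl layout."""
    col1, col2, col3_raw = line.rstrip("\n").split(FIELD_SEP)
    col3 = {}
    for entry in col3_raw.split(ITEM_SEP):
        # Each map entry is key:value, where the value is itself an array.
        key, _, value = entry.partition(KEY_SEP)
        col3[key] = value.split(NESTED_SEP)
    return col1, int(col2), col3


row = "text1|1|key1:value1a\x04value1b\x04value1c,key2:value2a\x04value2b"
print(parse_row(row))
```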

As you can see, all of the defined delimiters work fine at level 1, but
Hive falls back to its default separators (^D in this case) for levels
2, 3, etc. Obviously, control characters are hard to read and much more
difficult to produce in a data file. What is the best way to specify
readable characters for all levels?
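
As a stopgap, a small script can at least generate the control character so it does not have to be typed by hand. This is only a sketch for the test_tbl layout above; build_row is a made-up helper, and '\x04' is simply the Ctrl-D character that worked in my test.

```python
# Sketch only: build one test_tbl row with '\x04' (Ctrl-D) between the
# elements of each inner array. build_row is a hypothetical helper.
def build_row(col1, col2, col3):
    """Join the three columns using '|', ',', ':' and '\\x04'."""
    entries = [key + ":" + "\x04".join(values) for key, values in col3.items()]
    return "|".join([col1, str(col2), ",".join(entries)])


line = build_row(
    "text1",
    1,
    {"key1": ["value1a", "value1b", "value1c"], "key2": ["value2a", "value2b"]},
)
print(repr(line))  # repr() makes the otherwise invisible \x04 characters visible
```

Writing `line` plus a newline to a text file should give a file that can be loaded as-is.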

Is it possible to handle this kind of nested structure using a SERDE row
format? Can someone please help me with the actual CREATE TABLE syntax
with ROW FORMAT SERDE for the example above?

Thanks in advance for your help.

Sadu
On Fri, Sep 28, 2012 at 9:27 AM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Thanks Manish.
>
> It's a good article, but it's still not clear to me how you define the
> separators when the column is of a nested type (like array of maps, map
> of arrays, etc.).
>
> Just a clarification on item 2 below.
>
> 2.      What would be the separator for map elements?
>
> For a Map, the element separator is “=”
>
>
> '=' is the MAP key separator; what I mean is the item separator when the
> map contains multiple key/value pairs, like
>
>    (Key1=Value1; Key2=Value2; Key3=Value3....)
>
>
> Here '=' is the key separator and ';' is the item separator.
>
>
> I can handle the above example with COLLECTION ITEMS TERMINATED BY ';'
> and MAP KEYS TERMINATED BY '=' if the element is of type MAP. COLLECTION
> ITEMS TERMINATED BY ',' works on all three data types (maps, arrays,
> structs) when they are by themselves. The problem is defining the
> separators for nested structures, because we need multiple separators:
> one separator for array items and a different separator for map items
> defined within that array, etc.
>
>
> The default Hive delimiters work just fine. In that case level 1 will
> have '^A', level 2 '^B', level 3 '^C', etc. What I am trying to do is to
> define them explicitly. The COLLECTION ITEMS TERMINATED BY ',' statement
> addresses the first level (^A), but I don't know how to define the
> separators for the other levels (to use instead of ^B, ^C, etc.).
>
>
> Thanks,
>
> Sadu
>
>
>
> On Fri, Sep 28, 2012 at 1:28 AM, Manish.Bhoge <[EMAIL PROTECTED]> wrote:
>
>> Hi Sadu,
>>
>> See my answer below.
>>
>> Also this will help you to understand in detail about collection, MAP and
>> Array.
>>
>> http://datumengineering.wordpress.com/2012/09/27/agility-in-hive-map-array-score-for-hive/
>>
>> *From:* Sadananda Hegde [mailto:[EMAIL PROTECTED]]
>> *Sent:* Friday, September 28, 2012 10:31 AM
>> *To:* [EMAIL PROTECTED]
>> *Subject:* Defining collection items terminated by for a nested data type
>>
>> How does "collection items terminated by" work on a nested structure?
>> Say the table is created with the DDL:
>>
>> CREATE TABLE table_1(f1 int, f2 string, f3 array<struct<a string, b int, c map<string, string>>>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '|'
>> COLLECTION ITEMS TERMINATED BY ','
>> MAP KEYS TERMINATED BY '='
>> LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE;
>>
>> I guess the comma separator will be used for the items in the outermost
>> structure (i.e. array). Is that true?
>>
>> Yes. Right, comma is the separator for the array.