Hive, mail # user - Question regarding nested complex data type


Re: Question regarding nested complex data type
Stephen Sprague 2013-06-20, 14:56
You only get three: the field separator, the array element separator (aka
collection delimiter), and the map key/value separator (aka map key
delimiter).

When you nest deeper than that, you have to use the defaults '^D', '^E', etc.
for each level. At least that's been my experience, and it has worked for me.
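
To make the nesting levels concrete, here is a minimal Python sketch of how the delimiters seem to apply per level for a table shaped like the one in this thread. It is a toy model, not Hive's SerDe code: `separators` and `parse_row` are hypothetical helpers. It assumes that struct fields inside an array sit at the third level and so use the map key delimiter (default ^C), which matches the ^C between struct fields in the sample data. It also assumes that pieces beyond the declared struct fields are dropped, which is what the output quoted below suggests.

```python
# Toy model of delimiter-by-nesting-level parsing (an assumption, not Hive's code).
FIELD, COLLECTION, MAP_KEY = "\x01", "\x02", "\x03"  # Hive defaults ^A, ^B, ^C


def separators(field=FIELD, collection=COLLECTION, map_key=MAP_KEY):
    """Return one separator per nesting level (hypothetical helper).

    Levels 0-2 are the three that can be set in the DDL; deeper levels
    fall back to the defaults ^D, ^E, ... as described in the reply above.
    """
    return [field, collection, map_key] + [chr(c) for c in range(4, 9)]


def parse_row(line, seps):
    """Parse a row of (col1 int, col2 array<struct<st1:int,st2:string>>)."""
    col1, col2 = line.split(seps[0], 1)        # level 0: top-level fields
    structs = []
    for element in col2.split(seps[1]):        # level 1: array elements
        parts = element.split(seps[2])         # level 2: struct fields
        # Only the two declared struct fields are kept; extras are dropped.
        structs.append({"st1": int(parts[0]), "st2": parts[1]})
    return int(col1), structs


row = "1,1\x03string1\x022\x03string2"         # sample data from the thread
print(parse_row(row, separators(field=",")))
print(parse_row(row, separators(field=",", collection="|")))
```

With only the field separator overridden, the row parses into two structs. Once the collection delimiter is set to '|' but the data still uses ^B, the model returns a single struct whose st2 keeps the raw ^B byte, which mirrors the incorrect output quoted below.
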
On Thu, Jun 20, 2013 at 7:45 AM, neha <[EMAIL PROTECTED]> wrote:

> Thanks a lot for your reply, Stephen.
> To answer your question: I was not aware that we could use a delimiter
> (in my example, '|') for the first level of nesting. I tried it now and
> it worked fine.
>
> My next question: is there any way to provide a delimiter in the DDL for
> the second level of nesting?
> Thanks again!!
>
>
> On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:
>
>> It's all there in the documentation under "create table", and it seems you
>> got everything right too, except one little thing: in your second example,
>> under 'sample data loaded', change '^B' to '|' and you should be good.
>> That's the delimiter that separates your two array elements, i.e. the
>> collection items.
>>
>> I guess the real question for me is: when you say 'since there is no way
>> to use given delimiter "|"', what did you mean by that?
>>
>>
>>
>> On Thu, Jun 20, 2013 at 1:42 AM, neha <[EMAIL PROTECTED]> wrote:
>>
>>> Hi All,
>>>
>>> I have 2 questions about complex data types in nested composition.
>>>
>>> 1 >> I did not find a way to provide delimiter information in the DDL if
>>> one or more columns have a nested array/struct. In this case, the default
>>> delimiters have to be used for the complex type column.
>>> Please let me know if this is a limitation as of now or if I am missing
>>> something.
>>>
>>> e.g.:
>>> *DDL*:
>>> hive> create table example(col1 int, col2
>>> array<struct<st1:int,st2:string>>) row format delimited fields terminated
>>> by ',';
>>> OK
>>> Time taken: 0.226 seconds
>>>
>>> *Sample data loaded:*
>>> 1,1^Cstring1^B2^Cstring2
>>>
>>> *O/P:*
>>> hive> select * from example;
>>> OK
>>> 1    [{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
>>> Time taken: 0.288 seconds
>>>
>>> 2 >> For the same DDL given above, if we add the clause *collection
>>> items terminated by '|'* and still use the default delimiters in the data
>>> (since there is no way to use the given delimiter '|'), then the select
>>> query shows incorrect data.
>>> Please let me know if this is expected.
>>>
>>> e.g.
>>> *DDL*:
>>> hive> create table example(col1 int, col2
>>> array<struct<st1:int,st2:string>>) row format delimited fields terminated
>>> by ',' collection items terminated by '|';
>>> OK
>>> Time taken: 0.175 seconds
>>>
>>> *Sample data loaded:*
>>> 1,1^Cstring1^B2^Cstring2
>>>
>>> *O/P:*
>>> hive> select * from example;
>>> OK
>>> 1    [{"st1":1,"st2":"string1\u00022"}]
>>> Time taken: 0.141 seconds
>>>
>>> Thanks & Regards.
>>>
>>
>>
>