Hive, mail # user - Question regarding nested complex data type


Re: Question regarding nested complex data type
Stephen Sprague 2013-06-21, 02:34
look at it the other way around if you want. knowing that an array of
two-element structs is topologically the same as a map - they darn well
better be the same. :)
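
A minimal Python sketch of why the two layouts coincide, assuming Hive's default text delimiters (^A = \x01, ^B = \x02, ^C = \x03) and the sample row from Dean's message quoted below. The helper names are hypothetical, and this only illustrates the byte layout; it is not how Hive's SerDe is implemented.

```python
# Default delimiters as described in the thread (assumed byte values).
FIELD_SEP = "\x01"       # ^A: separates top-level columns
COLLECTION_SEP = "\x02"  # ^B: separates array elements or map entries
KEY_SEP = "\x03"         # ^C: separates struct fields, or a map key from its value

# Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3
row = "Dean Wampler\x01first\x031\x02second\x032\x02third\x033"


def parse_as_struct_array(line):
    """Read the second column as ARRAY<STRUCT<foo:STRING, bar:INT>>."""
    name, stuff = line.split(FIELD_SEP)
    structs = []
    for element in stuff.split(COLLECTION_SEP):
        foo, bar = element.split(KEY_SEP)
        structs.append({"foo": foo, "bar": int(bar)})
    return name, structs


def parse_as_map(line):
    """Read the second column as MAP<STRING, INT>."""
    name, stuff = line.split(FIELD_SEP)
    entries = {}
    for entry in stuff.split(COLLECTION_SEP):
        key, value = entry.split(KEY_SEP)
        entries[key] = int(value)
    return name, entries


name, structs = parse_as_struct_array(row)
_, mapping = parse_as_map(row)
print(structs)
print(mapping)
```

Both readers split on exactly the same bytes in the same order; only the interpretation of each ^C-separated pair differs (two struct fields versus a key and a value).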

On Thu, Jun 20, 2013 at 7:00 PM, Dean Wampler <[EMAIL PROTECTED]> wrote:

> It's not as "simple" as it seems, as I discovered yesterday, to my
> surprise. I created a table like this:
>
> CREATE TABLE t (
>   name STRING,
>   stuff   ARRAY<STRUCT<foo:String, bar:INT>>);
>
> I then used an insert statement to see how Hive would store the records,
> so I could populate the real table with another process. Hive used ^A for
> the field separator, ^B for the collection separator (in this case, to
> separate the structs in the array), and ^C to separate the elements in each
> struct. For example:
>
> Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3
>
> In other words, the structure you would expect for this table:
>
> CREATE TABLE t (
>   name STRING,
>   stuff   MAP<String, INT>);
>
> We should have covered the permutations of nested structures in our book,
> but we didn't. It would be great to document them, for realz, somewhere.
>
> dean
>
> On Thu, Jun 20, 2013 at 9:56 AM, Stephen Sprague <[EMAIL PROTECTED]> wrote:
>
>> you only get three: the field separator, the array element separator (aka
>> collection delimiter), and the map key/value separator (aka map key
>> delimiter).
>>
>> when you nest deeper than that, you gotta use the defaults '^D', '^E', etc.
>> for each level. At least that's been my experience, and it has worked
>> successfully.
>>
>>
>> On Thu, Jun 20, 2013 at 7:45 AM, neha <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks a lot for your reply, Stephen.
>>> To answer your question - I was not aware that we could use a
>>> delimiter (in my example, '|') for the first level of nesting. I tried it
>>> just now and it worked fine.
>>>
>>> My next question - Is there any way to provide a delimiter in the DDL for
>>> the second level of nesting?
>>> Thanks again!!
>>>
>>>
>>> On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:
>>>
>>>> it's all there in the documentation under "create table", and it seems
>>>> you got everything right too except one little thing - in your second
>>>> example there, under 'sample data loaded', change '^B' to '|' and you
>>>> should be good. That's the delimiter that separates your two array
>>>> elements - i.e., collection items.
>>>>
>>>> i guess the real question for me is: when you say 'since there is no way
>>>> to use given delimiter "|"', what did you mean by that?
>>>>
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 1:42 AM, neha <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have 2 questions about complex data types in nested composition.
>>>>>
>>>>> 1 >> I did not find a way to provide delimiter information in the DDL if
>>>>> one or more columns have a nested array/struct. In this case, the default
>>>>> delimiters have to be used for the complex type column.
>>>>> Please let me know if this is a limitation as of now or if I am missing
>>>>> something.
>>>>>
>>>>> e.g.:
>>>>> *DDL*:
>>>>> hive> create table example(col1 int, col2
>>>>> array<struct<st1:int,st2:string>>) row format delimited fields terminated
>>>>> by ',';
>>>>> OK
>>>>> Time taken: 0.226 seconds
>>>>>
>>>>> *Sample data loaded:*
>>>>> 1,1^Cstring1^B2^Cstring2
>>>>>
>>>>> *O/P:*
>>>>> hive> select * from example;
>>>>> OK
>>>>> 1    [{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
>>>>> Time taken: 0.288 seconds
>>>>>
>>>>> 2 >> For the same DDL given above, if we provide the clause *collection
>>>>> items terminated by '|'* and still use the default delimiters (since
>>>>> there is no way to use the given delimiter '|'), then the select query
>>>>> shows incorrect data.
>>>>> Please let me know if this is expected.
>>>>>
>>>>> e.g.
>>>>> *DDL*:
>>>>> hive> create table example(col1 int, col2
>>>>> array<struct<st1:int,st2:string>>) row format delimited fields terminated
>>>>> by ',' collection items terminated by '|';
>>>>> OK
>>>>> Time taken: 0.175 seconds
>>>>>
>>>
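
The second question in the original message comes down to which byte the reader splits array elements on. A rough Python sketch, assuming the table declares fields terminated by ',' and collection items terminated by '|', with ^C (\x03) still separating struct fields as in the sample data above. The `parse_row` helper is hypothetical and only mimics the splitting order; it is not Hive's code and does not reproduce Hive's exact output for malformed rows.

```python
def parse_row(line, field_sep=",", collection_sep="\x02", struct_sep="\x03"):
    """Split a text row into (col1, list of struct field tuples)."""
    col1, col2 = line.split(field_sep, 1)
    elements = [tuple(e.split(struct_sep)) for e in col2.split(collection_sep)]
    return int(col1), elements


# Data written with '|' between array elements, matching the DDL.
matching = "1,1\x03string1|2\x03string2"
# Data still written with the default ^B between array elements.
mismatched = "1,1\x03string1\x022\x03string2"

print(parse_row(matching, collection_sep="|"))
print(parse_row(mismatched, collection_sep="|"))
```

With the mismatched row, the reader never finds a '|', so the whole column is treated as a single array element and the ^B ends up embedded in a field value. That is consistent with the advice above to replace '^B' with '|' in the data file.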