Hive >> mail # user >> Question regarding nested complex data type


neha 2013-06-20, 08:42
Stephen Sprague 2013-06-20, 14:32
neha 2013-06-20, 14:45
Stephen Sprague 2013-06-20, 14:56
Dean Wampler 2013-06-21, 02:00
Stephen Sprague 2013-06-21, 02:34

Re: Question regarding nested complex data type
;) I actually thought it was a clever choice on Hive's part. There's no
real need for the 2nd tier separators, despite the nested collections!

However, it's still tricky to know what Hive expects when you're generating
table data with other apps.

dean
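
For anyone generating table data outside Hive, here is a minimal Python sketch of the layout described in the quoted messages below: one separator per nesting level, ^A for top-level fields, ^B and ^C for the next two levels, then ^D, ^E and so on. The names `encode`, `show`, and `DEFAULT_SEPARATORS` are made up for illustration and are not part of Hive. The sketch assumes arrays and structs each consume exactly one level, and it leaves out maps, NULLs, and escaping.

```python
# Default separators by nesting level, as described in this thread:
# ^A (fields), ^B (collection items), ^C (struct members), then ^D, ^E, ...
DEFAULT_SEPARATORS = [chr(code) for code in range(1, 9)]


def encode(value, separators=DEFAULT_SEPARATORS, level=0):
    """Join a nested list/tuple using one separator per nesting level."""
    if isinstance(value, (list, tuple)):
        parts = (encode(item, separators, level + 1) for item in value)
        return separators[level].join(parts)
    return str(value)


def show(line):
    """Render control characters in caret notation (chr(1) becomes ^A)."""
    return "".join("^" + chr(ord(c) + 64) if ord(c) < 32 else c for c in line)


# Row for: name STRING, stuff ARRAY<STRUCT<foo:STRING, bar:INT>>
row = ["Dean Wampler", [("first", 1), ("second", 2), ("third", 3)]]
print(show(encode(row)))

# Same idea with FIELDS TERMINATED BY ',' and COLLECTION ITEMS TERMINATED BY '|';
# the struct members keep the default ^C.
custom = [",", "|"] + DEFAULT_SEPARATORS[2:]
print(show(encode([1, [(1, "string1"), (2, "string2")]], custom)))
```

The first call reproduces the sample record from the quoted message; the second matches the '|' variant discussed further down the thread. Write the raw `encode(...)` result to the data file; `show` is only for reading the output on a terminal.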

On Thu, Jun 20, 2013 at 9:34 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:

> look at it the other way around if you want. knowing an array of a two
> element struct is topologically the same as a map - they darn well better
> be the same. :)
>
>
>
> On Thu, Jun 20, 2013 at 7:00 PM, Dean Wampler <[EMAIL PROTECTED]> wrote:
>
>> It's not as "simple" as it seems, as I discovered yesterday, to my
>> surprise. I created a table like this:
>>
>> CREATE TABLE t (
>>   name STRING,
>>   stuff   ARRAY<STRUCT<foo:String, bar:INT>>);
>>
>> I then used an insert statement to see how Hive would store the records,
>> so I could populate the real table with another process. Hive used ^A for
>> the field separator, ^B for the collection separator, in this case, to
>> separate structs in the array, and ^C to separate the elements in each
>> struct, e.g.:
>>
>> Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3
>>
>> In other words, the structure you would expect for this table:
>>
>> CREATE TABLE t (
>>   name STRING,
>>   stuff   MAP<String, INT>);
>>
>> We should have covered the permutations of nested structures in our book,
>> but we didn't. It would be great to document them, for realz, somewhere.
>>
>> dean
>>
>> On Thu, Jun 20, 2013 at 9:56 AM, Stephen Sprague <[EMAIL PROTECTED]> wrote:
>>
>>> you only get three.  field separator, array elements separator (aka
>>> collection delimiter), and map key/value separator (aka map key
>>> delimiter).
>>>
>>> when you nest deeper, you gotta use the defaults '^D', '^E', etc. for
>>> each level. At least that's been my experience, and it has worked
>>> successfully.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 7:45 AM, neha <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks a lot for your reply, Stephen.
>>>> To answer your question - I was not aware that we could use a
>>>> delimiter (in my example, '|') for the first level of nesting. I tried it
>>>> now and it worked fine.
>>>>
>>>> My next question - Is there any way to provide a delimiter in the DDL for
>>>> the second level of nesting?
>>>> Thanks again!!
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> it's all there in the documentation under "create table" and it seems
>>>>> you got everything right too except one little thing - in your second
>>>>> example there for 'sample data loaded' - instead of '^B' change that to
>>>>> '|' and you should be good. That's the delimiter that separates your two
>>>>> array elements - i.e. collections.
>>>>>
>>>>> i guess the real question for me is when you say 'since there is no
>>>>> way to use given delimiter "|" ' what did you mean by that?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 1:42 AM, neha <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have 2 questions about complex data types in nested composition.
>>>>>>
>>>>>> 1 >> I did not find a way to provide delimiter information in the DDL if
>>>>>> one or more columns have a nested array/struct. In this case, the default
>>>>>> delimiters have to be used for the complex type column.
>>>>>> Please let me know if this is a limitation as of now or I am missing
>>>>>> something.
>>>>>>
>>>>>> e.g.:
>>>>>> *DDL*:
>>>>>> hive> create table example(col1 int, col2
>>>>>> array<struct<st1:int,st2:string>>) row format delimited fields terminated
>>>>>> by ',';
>>>>>> OK
>>>>>> Time taken: 0.226 seconds
>>>>>>
>>>>>> *Sample data loaded:*
>>>>>> 1,1^Cstring1^B2^Cstring2
>>>>>>
>>>>>> *O/P:*
>>>>>> hive> select * from example;
>>>>>> OK
>>>>>> 1    [{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
>>>>>> Time taken: 0.288 seconds
>>>>>>
>>>>>> 2 >> For the same DDL given above, if we provide the clause *collection
>>>>>> items terminated by '|'* and still use default delimiters (since
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com