Hive >> mail # user >> Question regarding nested complex data type


Re: Question regarding nested complex data type
It's not as "simple" as it seems, as I discovered yesterday, to my
surprise. I created a table like this:

CREATE TABLE t (
  name STRING,
  stuff   ARRAY<STRUCT<foo:String, bar:INT>>);

I then used an insert statement to see how Hive would store the records, so
I could populate the real table with another process. Hive used ^A as the
field separator, ^B as the collection separator (in this case, to separate
the structs in the array), and ^C to separate the elements in each struct, e.g.:

Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3

In other words, the same layout you would expect for this table:

CREATE TABLE t (
  name STRING,
  stuff   MAP<String, INT>);
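
For anyone writing the data files from another process, here is a minimal
Python sketch of how a line in that layout breaks apart. It only illustrates
the delimiter nesting described above and is not Hive's actual SerDe logic;
the function names parse_row and parse_row_as_map are made up for this example.

```python
# Default Hive text delimiters as described above (Ctrl-A, Ctrl-B, Ctrl-C).
FIELD_SEP = "\x01"       # ^A: separates top-level columns
COLLECTION_SEP = "\x02"  # ^B: separates array elements (or map entries)
STRUCT_SEP = "\x03"      # ^C: separates struct fields (or a map key from its value)


def parse_row(line):
    """Read a row as: name STRING, stuff ARRAY<STRUCT<foo:STRING, bar:INT>>."""
    name, stuff_raw = line.split(FIELD_SEP, 1)
    stuff = []
    for element in stuff_raw.split(COLLECTION_SEP):
        foo, bar = element.split(STRUCT_SEP)
        stuff.append({"foo": foo, "bar": int(bar)})
    return name, stuff


def parse_row_as_map(line):
    """Read the same bytes as: name STRING, stuff MAP<STRING, INT>."""
    name, stuff_raw = line.split(FIELD_SEP, 1)
    entries = (entry.split(STRUCT_SEP, 1) for entry in stuff_raw.split(COLLECTION_SEP))
    return name, {key: int(value) for key, value in entries}


row = "Dean Wampler\x01first\x031\x02second\x032\x02third\x033"
print(parse_row(row))
print(parse_row_as_map(row))
```

Both functions consume the identical line, which is the point: at this depth
the array-of-structs layout and the map layout are indistinguishable on disk.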

We should have covered the permutations of nested structures in our book,
but we didn't. It would be great to document them, for real, somewhere.

dean

On Thu, Jun 20, 2013 at 9:56 AM, Stephen Sprague <[EMAIL PROTECTED]> wrote:

> you only get three.  field separator, array elements separator (aka
> collection delimiter), and map key/value separator (aka map key
> delimiter).
>
> when you nest deeper then you gotta use the default '^D', '^E' etc for
> each level.  At least that's been my experience which i've found has worked
> successfully.
>
>
> On Thu, Jun 20, 2013 at 7:45 AM, neha <[EMAIL PROTECTED]> wrote:
>
>> Thanks a lot for your reply, Stephen.
>> To answer your question - I was not aware of the fact that we could use
>> delimiter (in my example, '|') for first level of nesting. I tried now and
>> it worked fine.
>>
>> My next question - Is there any way to provide delimiter in DDL for
>> second level of nesting?
>> Thanks again!!
>>
>>
>> On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:
>>
>>> it's all there in the documentation under "create table" and it seems you
>>> got everything right too except one little thing - in your second example
>>> there for 'sample data loaded' - instead of '^B' change that to '|'  and
>>> you should be good. That's the delimiter that separates your two array
>>> elements - ie collections.
>>>
>>> i guess the real question for me is when you say 'since there is no way
>>> to use given delimiter "|" ' what did you mean by that?
>>>
>>>
>>>
>>> On Thu, Jun 20, 2013 at 1:42 AM, neha <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have 2 questions about complex data types in nested composition.
>>>>
>>>> 1 >> I did not find a way to provide delimiter information in DDL if
>>>> one or more columns have a nested array/struct. In this case, the default
>>>> delimiter has to be used for the complex type column.
>>>> Please let me know if this is a limitation as of now or I am missing
>>>> something.
>>>>
>>>> e.g.:
>>>> *DDL*:
>>>> hive> create table example(col1 int, col2
>>>> array<struct<st1:int,st2:string>>) row format delimited fields terminated
>>>> by ',';
>>>> OK
>>>> Time taken: 0.226 seconds
>>>>
>>>> *Sample data loaded:*
>>>> 1,1^Cstring1^B2^Cstring2
>>>>
>>>> *O/P:*
>>>> hive> select * from example;
>>>> OK
>>>> 1    [{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
>>>> Time taken: 0.288 seconds
>>>>
>>>> 2 >> For the same DDL given above, if we provide the clause *collection
>>>> items terminated by '|'* and still use default delimiters (since there
>>>> is no way to use given delimiter '|') then the select query shows incorrect
>>>> data.
>>>> Please let me know if this is something expected.
>>>>
>>>> e.g.
>>>> *DDL*:
>>>> hive> create table example(col1 int, col2
>>>> array<struct<st1:int,st2:string>>) row format delimited fields terminated
>>>> by ',' collection items terminated by '|';
>>>> OK
>>>> Time taken: 0.175 seconds
>>>>
>>>> *Sample data loaded:*
>>>> 1,1^Cstring1^B2^Cstring2
>>>>
>>>> *O/P:*
>>>> hive> select * from example;
>>>> OK
>>>> 1    [{"st1":1,"st2":"string1\u00022"}]
>>>> Time taken: 0.141 seconds
>>>> Thanks & Regards.
>>>>
>>>
>>>
>>
>
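
The incorrect output in the second question quoted above can be reproduced
with a small Python model of the splitting. This is a sketch under
assumptions: the helper parse_array_of_structs is hypothetical, it uses the
declared collection delimiter for array elements and ^C for struct fields,
and it assumes fields beyond the declared struct width are dropped. That
last assumption matches the output shown in the thread but is not taken
from Hive's source.

```python
def parse_array_of_structs(raw, collection_sep, struct_sep="\x03", n_fields=2):
    """Split raw into array elements, then each element into struct fields.

    Fields beyond n_fields are discarded (an assumption for this sketch).
    """
    return [element.split(struct_sep)[:n_fields]
            for element in raw.split(collection_sep)]


# Column value from the thread: 1^Cstring1^B2^Cstring2
data = "1\x03string1\x022\x03string2"

# Table declared without a collection delimiter: ^B separates the elements.
with_default = parse_array_of_structs(data, "\x02")

# Table declared with '|' but data still using ^B: the whole value is one
# element, and the ^B byte ends up inside the second struct field.
with_mismatch = parse_array_of_structs(data, "|")

# Same table, data rewritten to use '|' between array elements.
with_pipe = parse_array_of_structs("1\x03string1|2\x03string2", "|")

print(with_default)
print(with_mismatch)
print(with_pipe)
```

The mismatched case yields a single struct whose second field is
"string1\x022", which lines up with the "string1\u00022" in the quoted
select output.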
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com