Hive >> mail # user >> An array and a map in the same Hive table: Can Separator for Map KV pairs be different than Separator for Array elements?


Re: An array and a map in the same Hive table: Can Separator for Map KV pairs be different than Separator for Array elements?
Hi Mark,

'Collection items terminated by' applies to both maps and arrays. In your
case, you could use Hive's nested complex data structures (which would
introduce another separator level) to deserialize your data, but that would
require some experimentation and digging into the code, so it is non-trivial.

The simplest way would be to specify -
collection items terminated by '\&'
   map keys terminated by '=' ;
in table creation, declare the array field as a plain string, and parse it
with Hive's split UDF. (This would even work if the array field does not
have '&' in it). But all the users of this table need to know about this.

In other words,
Create table mark_test (
Row_num int,
Tags            string,   -- declared as string instead of array<string>
Keys            map<string, string>
)

select split(Tags, ',') from mark_test ...
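
For reference, the two fragments above can be combined into one statement. This is only a sketch pieced together from this thread, not something verified against a particular Hive version: the '\005' field delimiter comes from Mark's original DDL, the collection delimiter is written here as a plain '&' (the thread writes it as '\&'), and the tag_array and videoid aliases are made up for illustration.

```sql
-- Sketch: Tags is a plain string, so only Keys uses the collection delimiter.
CREATE TABLE mark_test (
  Row_num INT,
  Tags    STRING,                 -- comma-separated list, split at query time
  Keys    MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\005'           -- Control E, as in the original DDL
  COLLECTION ITEMS TERMINATED BY '&'    -- separates the map's key/value pairs
  MAP KEYS TERMINATED BY '=';           -- separates each key from its value

-- split() turns the string into an array<string> at read time;
-- Keys['videoid'] looks up one entry of the map.
SELECT Row_num,
       split(Tags, ',') AS tag_array,
       Keys['videoid']  AS videoid
FROM mark_test;
```

The cost of this layout is the one already mentioned: any query that needs the tags as an array has to call split itself.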

Hope it helps.

~Aniket
On Thu, Jun 14, 2012 at 12:26 PM, Sunderlin, Mark <
[EMAIL PROTECTED]> wrote:

> If my data has three columns and a typical row looks like:
>
> 5754^EContentQuality5,Knowledge,Knowledge/Nature,UnFlagged,EarthReport^EdisplayHeight=293&displayWidth=570&imid=09177970492035608320&sid=577&skey=63&videoid=506875580
>
> I have an integer, an array, and a map.
> Columns separator is a Control E (^E)
> Array elements are separated by a comma (,)
> Map key/value pairs are separated by an ampersand (&), and keys are
> separated from values by the equals sign (=)
>
> Pretty sure I want this create:
>
> Create table mark_test (
> Row_num int,
> Tags            array<string>,
> Keys            map<string, string>
> )
> row format delimited
>    fields terminated by '\005'  -- Control E
>    collection items terminated by '\&'
>    map keys terminated by '=' ;
>
> Question:
> Does the 'Collection items terminated by' apply to just the map, or does
> it also set the item terminator for my array?
> If no
>        Great! Life is good for me!
> If yes
>        Ugh.  Is there some way to have a separate item terminator for the
> array and the map, or do I need to manipulate the data before loading to get
> the map and array's item terminator to be the same?
>
>
> ---
> Mark E. Sunderlin
> Solutions Architect   |AOL Core Data Technologies
> P: 703-265-6935       |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way,  Dulles, VA  20166
>
>
>
--
"...:::Aniket:::... Quetzalco@tl"