Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> get_json_object cannot handle array


Copy link to this message
-
RE: get_json_object cannot handle array
That doesn't sound like it would work, because explode takes a Hive array, not a string that represents a JSON array. Corect me if I'm wrong.
From: Ning Zhang [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 18, 2010 5:41 PM
To: <[EMAIL PROTECTED]>
Subject: Re: get_json_object cannot handle array

Definitely. You can extend get_json_object() to support arbitrary JSON values.

BTW, for your particular example, if your JSOn value is always an array of JSON object, you can compile get_json_object with the UDTF explode:

select get_json_object(B.jobj, '$...)
from T lateral view explode(T.json_array) B as jobj
...
On Aug 18, 2010, at 5:30 PM, Steven Wong wrote:
Thanks for the explanation. Basically, get_json_object's json_txt argument must be a JSON object (which is "an unordered set of name/value pairs"), not just any JSON value (string, number, object, array, true, false, null).

I would think that it'll be more convenient to allow the json_txt argument to be any JSON value, as an enhancement. Yes? No?
From: Ning Zhang [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 18, 2010 5:02 PM
To: <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Subject: Re: get_json_object cannot handle array

The get_json_object supports a restricted version of JSON object and path expressions. It supports arrays at non-root level.  You can see the details by hive> describe function extended get_json_object;

On Aug 18, 2010, at 3:07 PM, Steven Wong wrote:

I have a string column that contains a JSON array on each row. Calling the function get_json_object on that column always returns NULL:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
[{"id":9,"ip":"173.223.232.50","rtt":71,"bw":41943}] NULL NULL NULL

If I replace the array with its 0th element, then get_json_object works:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
{"id":9,"ip":"173.223.232.32","rtt":71,"bw":4211}     {"id":9,"bw":4211,"rtt":71,"ip":"173.223.232.32"}    NULL 71

Is it a get_json_object bug in handling arrays, or am I doing something wrong? Hive version is 0.5.

Thanks.
Steven

NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB