There are lots of possibilities with Hive and JSON (and with HBase too).
Since it sounds like you are just starting out, my recommendation (and this
is only my opinion) is to store your JSON objects, however deeply nested,
one per line in a Hive table.
Then look at the json_tuple() and get_json_object() Hive UDFs to pull
values out and do what you want with them.
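A minimal sketch of that approach (the table and column names here are my own illustrations, not anything from your setup):

```sql
-- Stage each JSON object as a single string per row.
CREATE TABLE raw_events (json STRING);

-- Pull individual fields out with get_json_object(), which
-- accepts a JSON path expression:
SELECT get_json_object(json, '$.user.id') AS user_id,
       get_json_object(json, '$.event')   AS event
FROM raw_events;

-- Or extract several top-level keys in one pass with json_tuple():
SELECT t.user_id, t.event
FROM raw_events
LATERAL VIEW json_tuple(json, 'user_id', 'event') t AS user_id, event;
```

Note that json_tuple() only reaches top-level keys, while get_json_object() can drill into nested objects via its path argument.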
Once you've become comfortable with the Hive paradigm, you might want to
look at using a "JsonSerde" to map the JSON directly to columns in a
table, but like I said, try the approach described above first to
familiarize yourself with Hive.
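For when you get to that point, a SerDe-based table definition typically looks something like the sketch below. The SerDe class shown is the one shipped with Hive's HCatalog module; third-party JSON SerDes use different class names, and the columns are again just illustrative:

```sql
-- Map top-level JSON keys directly to typed columns via a SerDe.
-- Requires the hive-hcatalog-core jar on the Hive classpath.
CREATE TABLE events (
  user_id STRING,
  event   STRING,
  ts      BIGINT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
```

With this in place each file still needs one JSON object per line; the SerDe just saves you the per-query get_json_object() calls.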
Ya gotta crawl first before you run.
On Fri, Jun 7, 2013 at 12:24 AM, Michael Duergner | Pockets United GmbH <
[EMAIL PROTECTED]> wrote:
> Hi there,
> I'm looking into whether we can use Hive to run our usage analytics. Our
> system currently collects data from our clients in JSON format, which
> results in multiple files per client (every time analytics events are
> uploaded to the server, a new file is created). Each file contains one
> JSON array with multiple JSON objects representing the actual analytics
> events.
> From what I understood from the docs so far, Hive should be able to work
> with JSON data. The only difference in our data compared to the examples
> I've seen is that the actual entries are inside an array rather than
> being single lines in the file.
> Can I process them directly or do I need to write some custom code to
> transform the input data?
> *Michael Dürgner*
> Founder & CTO
> Pockets United GmbH
> email [EMAIL PROTECTED]
> phone +49 89 2155 6166-1
> mobile +49 151 42 31 46 40 (time: CET/UTC+1h)
> mail Dachauerstr. 241, 80637 Munich, Germany
> office Wayra Akademie, Kaufingerstr. 15, 80331 Munich, Germany
> *Split Costs, Share Fun!*
> Managing Directors: Michael Duergner, Matthias Schicker and Markus Stiefel
> Location and Municipal Court: Munich HRB 192066
> VAT: DE277893196