Sqoop, mail # dev - Exporting some fairly complex json to sql server


RE: Exporting some fairly complex json to sql server
Venkatesan Ranganathan 2013-01-24, 00:40
Hi Harish,
Maybe you can do it in 2 steps: an MR job to split your files into 2 separate files, and then load them with Sqoop? I don't think Sqoop supports this today. Also, you can only load one table at a time.
Thanks
Venkat
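
A minimal sketch of that split step, assuming the SequenceFiles are read with Hadoop Streaming (e.g. -inputformat org.apache.hadoop.mapred.SequenceFileAsTextInputFormat, so each mapper input line is "key<TAB>json"); the field names are taken from the sample JSON quoted below, and the "A"/"B" tags and comma-delimited layout are illustrative, not anything Sqoop prescribes:

#!/usr/bin/env python
# Hypothetical streaming mapper: flattens each JSON record into rows for
# Table A (Unique_Key, Timestamp) and Table B (Unique_Key, Name, Value).
import json
import sys

for line in sys.stdin:
    try:
        # SequenceFileAsTextInputFormat emits "key<TAB>value" lines.
        _, value = line.rstrip("\n").split("\t", 1)
        records = json.loads(value)
    except ValueError:
        continue  # skip malformed lines in this sketch
    for rec in records:
        key = rec["Unique_Key"]
        # Tag each output row with its target table; a follow-up step
        # (e.g. a reducer using MultipleOutputs, or two filter jobs)
        # writes the "A" and "B" rows into separate HDFS directories.
        print("A\t%s,%s" % (key, rec["Timestamp"]))
        for inner in rec["Inner_Array"]:
            print("B\t%s,%s,%s" % (key, inner["Name"], inner["Value"]))

Each of the two resulting directories could then be exported with its own sqoop export invocation (one --table / --export-dir pair per table), since Sqoop exports to a single table per run.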

> Date: Wed, 23 Jan 2013 18:21:24 -0500
> Subject: Exporting some fairly complex json to sql server
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> Hello,
>
> I've been playing with Sqoop, and it seems to fit my use case (to export
> some log data from HDFS to Microsoft SQL Server).
> A look at the documentation shows that Sqoop will export/import between
> tables of similar schema. However, my data export is more complicated.
> Allow me to describe it. I have JSON strings stored in Hadoop Sequence
> files, with each string indexed by timestamp. Each JSON string is similar
> to the following:
>
> [
>     {
>         "Unique_Key": "123",
>         "Timestamp": "8948024",
>         "Inner_Array": [
>             {
>                 "Name": "XYZ1",
>                 "Value": "abc1"
>             },
>             {
>                 "Name": "XYZ2",
>                 "Value": "abc2"
>             }
>         ]
>     },
>     {
>         "Unique_Key": "456",
>         "Timestamp": "89489802",
>         "Inner_Array": [
>             {
>                 "Name": "JDJ1",
>                 "Value": "sfj"
>             }
>         ]
>     }
> ]
>
> Each string represents an array of objects, with the "Unique_Key" and
> "Timestamp" of each of these objects corresponding to a row in one SQL
> table (let's call it Table A). Each object also contains an
> "Inner_Array"; each element of this Inner_Array needs to go into another
> SQL table (Table B) and will be associated with Table A using the
> Unique_Key as a foreign key.
>
> So, the schema of the two SQL tables will be:
>
> Table A:
> Unique_Key (Primary Key) | Timestamp
>
> Table B:
> Unique_Key (Foreign Key) | Name | Value
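
For the sample JSON quoted above, those two tables would end up with rows like:

Table A:
123 | 8948024
456 | 89489802

Table B:
123 | XYZ1 | abc1
123 | XYZ2 | abc2
456 | JDJ1 | sfj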
>
> If I wanted to implement this functionality in Sqoop (placing nested JSON
> in multiple tables), it seems I would need to first implement a "JSON
> parser" in lib and add schema mapping specifications to the configuration.
> We would also need to provide an option for parser selection. Is there
> anything I am missing? Any comments? Is this functionality already being
> implemented by someone?
>
> Thanks for your patient reading,
> Harish