Harish Mandala 2013-01-23, 23:21
RE: Exporting some fairly complex json to sql server
Hi Harish,
Maybe you can do it in 2 steps: an MR job to split your files into 2 separate files (rough sketch below), and then load them with Sqoop? I don't think Sqoop supports this today. Also, you can only load into one table at a time.
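A rough, untested sketch of the splitting step (assuming Jackson 2 for the JSON parsing and Text keys/values in the sequence file; adjust to your actual types):

    import java.io.IOException;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    // Map-only job: read (timestamp, json) pairs from the sequence file
    // and write two delimited outputs, one per target table.
    public class SplitJsonMapper extends Mapper<Text, Text, NullWritable, Text> {

        private MultipleOutputs<NullWritable, Text> mos;
        private final ObjectMapper om = new ObjectMapper();

        @Override
        protected void setup(Context ctx) {
            mos = new MultipleOutputs<NullWritable, Text>(ctx);
        }

        @Override
        protected void map(Text key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Each value is a JSON array of objects like the sample below.
            for (JsonNode rec : om.readTree(value.toString())) {
                String uk = rec.get("Unique_Key").asText();
                // Table A row: Unique_Key,Timestamp
                mos.write("tableA", NullWritable.get(),
                    new Text(uk + "," + rec.get("Timestamp").asText()),
                    "tableA/part");
                // Table B rows: Unique_Key,Name,Value
                for (JsonNode inner : rec.get("Inner_Array")) {
                    mos.write("tableB", NullWritable.get(),
                        new Text(uk + "," + inner.get("Name").asText()
                            + "," + inner.get("Value").asText()),
                        "tableB/part");
                }
            }
        }

        @Override
        protected void cleanup(Context ctx)
                throws IOException, InterruptedException {
            mos.close();
        }
    }

You would register the two named outputs in the driver with MultipleOutputs.addNamedOutput(job, "tableA", ...) (and likewise "tableB"), then run one sqoop export per output directory, e.g.:

    sqoop export --connect 'jdbc:sqlserver://<host>;database=<db>' \
        --table TableA --export-dir /out/tableA --input-fields-terminated-by ','
    sqoop export --connect 'jdbc:sqlserver://<host>;database=<db>' \
        --table TableB --export-dir /out/tableB --input-fields-terminated-by ','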
Thanks
Venkat

> Date: Wed, 23 Jan 2013 18:21:24 -0500
> Subject: Exporting some fairly complex json to sql server
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> Hello,
>
> I've been playing with Sqoop, and it seems to fit my use case (to export
> some log data from HDFS to Microsoft SQL Server).
> A look at the documentation shows that Sqoop will export/import between
> tables of similar schema. However, my data export is more complicated.
> Allow me to describe it. I have JSON strings stored in Hadoop Sequence
> files, with each string indexed by timestamp. Each JSON string is similar
> to the following:
>
> [
>     {
>         "Unique_Key": "123",
>         "Timestamp": "8948024",
>         "Inner_Array": [
>             {
>                 "Name": "XYZ1",
>                 "Value": "abc1"
>             },
>             {
>                 "Name": "XYZ2",
>                 "Value": "abc2"
>             }
>         ]
>     },
>     {
>         "Unique_Key": "456",
>         "Timestamp": "89489802",
>         "Inner_Array": [
>             {
>                 "Name": "JDJ1",
>                 "Value": "sfj"
>             }
>         ]
>     }
> ]
>
> Each string represents an array of objects, with the "Unique_Key" and
> "Timestamp" of each of these objects corresponding to a row in one SQL
> table (Let's call it Table A). Each object has inside it another
> "Inner_Array" - each element of this Inner_Array needs to go into another
> SQL table (Table B), and will be associated with the previous table using
> the Unique_Key as a foreign key.
>
> So, the schema of the two SQL tables will be:
>
> Table A:
> Unique_Key (Primary Key) | TimeStamp
>
> Table B:
> Unique_Key (Foreign Key) | Name | Value
>
> If I wanted to implement this functionality in Sqoop (placing nested JSON
> in multiple tables), it seems I would need to first implement a "JSON
> parser" in lib and add schema mapping specifications to the configuration.
> We would also need to provide an option for parser selection. Is there
> anything I am missing? Any comments? Is this functionality already being
> implemented by someone?
>
> Thanks for your patient reading,
> Harish
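
On Harish's proposal: the schema mapping specification could be as simple as a handful of configuration properties mapping JSON paths to target tables and columns. Purely hypothetical property names, only to illustrate the idea (nothing like this exists in Sqoop today):

    # a pluggable parser class, selected per job
    sqoop.export.record.parser=com.example.JsonRecordParser
    # one mapping per target table: JSON paths -> column order
    sqoop.export.json.mapping.TableA=Unique_Key,Timestamp
    sqoop.export.json.mapping.TableB=Unique_Key,Inner_Array.Name,Inner_Array.Value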
     