Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop, mail # dev - Exporting some fairly complex json to sql server


Copy link to this message
-
Exporting some fairly complex json to sql server
Harish Mandala 2013-01-23, 23:21
Hello,

I've been playing with Sqoop, and it seems to fit my use case (to export
some log data from HDFS to Microsoft SQL Server).
A look at the documentation shows that sqoop will export/import between
tables of similar schema. However, my data export is more complicated.
Allow me to describe it. I have JSON strings stored in Hadoop Sequence
files, with each string indexed by timestamp. Each JSON string is similar
to the following:

[
    {
        "Unique_Key": "123",
        "Timestamp": "8948024",
        "Inner_Array": [
            {
                "Name": "XYZ1",
                "Value": "abc1"
            },
            {
                "Name": "XYZ2",
                "Value": "abc2"
            }
        ]
    },
    {
        "Unique_Key": "456",
        "Timestamp": "89489802",
        "Inner_Array": [
            {
                "Name": "JDJ1",
                "Value": "sfj"
            }
        ]
    }
]

Each string represents an array of objects, with the "Unique_Key" and
"Timestamp" of each of these objects corresponding to a row in one SQL
table (Let's call it Table A). Each object has inside it another
"Inner_Array" - each element of this Inner_Array needs to go into another
SQL table (Table B), and will be associated with the previous table using
the Unique_Key as a foreign key.

So, the schema of the two SQL tables will be:

Table A:
Unique_Key (Primary Key) | TimeStamp

Table B:
Unique_Key (Foreign Key) | Name | Value

If I wanted to implement this functionality in Sqoop (placing nested JSON
in multiple tables), it seems I would need to firstly implement a "JSON
parser" in lib and add schema mapping specifications to the configuration.
We would also need to provide an option for parser selection. Is there
anything I am missing? Any comments? Is this functionality already being
implemented by someone?

Thanks for your patient reading,
Harish