Sqoop >> mail # dev >> Exporting some fairly complex json to sql server

Exporting some fairly complex json to sql server

I've been playing with Sqoop, and it seems to fit my use case: exporting
some log data from HDFS to Microsoft SQL Server.
The documentation shows that Sqoop exports and imports between tables of
similar schema. My export, however, is more complicated. Allow me to
describe it. I have JSON strings stored in Hadoop SequenceFiles, with each
string keyed by timestamp. Each JSON string is similar to the following:

[
    {
        "Unique_Key": "123",
        "Timestamp": "8948024",
        "Inner_Array": [
            { "Name": "XYZ1", "Value": "abc1" },
            { "Name": "XYZ2", "Value": "abc2" }
        ]
    },
    {
        "Unique_Key": "456",
        "Timestamp": "89489802",
        "Inner_Array": [
            { "Name": "JDJ1", "Value": "sfj" }
        ]
    }
]

Each string represents an array of objects. The "Unique_Key" and
"Timestamp" of each object correspond to one row in a SQL table (let's
call it Table A). Each object also contains an "Inner_Array"; each element
of that Inner_Array needs to go into a second SQL table (Table B), linked
back to Table A with the Unique_Key as a foreign key.

So, the schema of the two SQL tables will be:

Table A:
Unique_Key (Primary Key) | Timestamp

Table B:
Unique_Key (Foreign Key) | Name | Value
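
To make the mapping concrete, here is a minimal Python sketch of the
flattening step. It is not Sqoop code: the function name `flatten` and the
row layout are illustrative assumptions, and it assumes each JSON string is
a complete array of objects with the fields described above.

```python
import json


def flatten(json_string):
    """Split one JSON string into rows for Table A and Table B.

    Hypothetical helper for illustration; not part of Sqoop.
    """
    table_a_rows = []  # (Unique_Key, Timestamp)
    table_b_rows = []  # (Unique_Key, Name, Value)
    for obj in json.loads(json_string):
        key = obj["Unique_Key"]
        table_a_rows.append((key, obj["Timestamp"]))
        # Each inner element becomes one Table B row carrying the parent key.
        for item in obj.get("Inner_Array", []):
            table_b_rows.append((key, item["Name"], item["Value"]))
    return table_a_rows, table_b_rows


sample = """[
  {"Unique_Key": "123", "Timestamp": "8948024",
   "Inner_Array": [{"Name": "XYZ1", "Value": "abc1"},
                   {"Name": "XYZ2", "Value": "abc2"}]},
  {"Unique_Key": "456", "Timestamp": "89489802",
   "Inner_Array": [{"Name": "JDJ1", "Value": "sfj"}]}
]"""

rows_a, rows_b = flatten(sample)
```

For the sample above, this yields two rows for Table A and three rows for
Table B, each Table B row repeating the Unique_Key of its parent object.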

If I wanted to implement this functionality in Sqoop (placing nested JSON
in multiple tables), it seems I would first need to implement a "JSON
parser" in lib and add schema mapping specifications to the configuration.
We would also need to provide an option for parser selection. Is there
anything I am missing? Any comments? Is this functionality already being
implemented by someone?
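
As a rough idea of what I mean by a schema mapping specification, here is a
Python sketch of a parser driven by a mapping. The `MAPPING` format, the
table names "TableA" and "TableB", and both function names are invented for
illustration; none of this is an existing Sqoop option or API.

```python
import json

# Hypothetical mapping format, invented for illustration only.
MAPPING = {
    "parent": {"table": "TableA", "columns": ["Unique_Key", "Timestamp"]},
    "child": {
        "table": "TableB",
        "array_field": "Inner_Array",
        "foreign_key": "Unique_Key",
        "columns": ["Name", "Value"],
    },
}


def parse_with_mapping(json_string, mapping):
    """Return {table_name: [row, ...]} as described by the mapping."""
    parent, child = mapping["parent"], mapping["child"]
    rows = {parent["table"]: [], child["table"]: []}
    for obj in json.loads(json_string):
        rows[parent["table"]].append([obj[c] for c in parent["columns"]])
        fk = obj[child["foreign_key"]]
        # Prefix every child row with the parent's key value.
        for item in obj.get(child["array_field"], []):
            rows[child["table"]].append([fk] + [item[c] for c in child["columns"]])
    return rows


def to_delimited(rows, sep=","):
    """Render rows as delimited text, one record per line.

    Values are assumed to be strings that do not contain the separator;
    no escaping or quoting is done here.
    """
    return "\n".join(sep.join(row) for row in rows)
```

The point of keeping the mapping separate from the parsing loop is that a
different nesting could be handled by changing the configuration alone, and
each table's rows could then be written out as its own delimited file.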

Thanks for your patient reading,
Venkatesan Ranganathan 2013-01-24, 00:40