I am new to this list. I have been trying to solve this problem for the last 48 hours, but I am stuck. I hope someone here can point me in the right direction.

I have problems using the Pig JsonLoader, and I am wondering whether I am doing something wrong or running into a different problem.

The first half of this post shows that I know at least something about what I am talking about and that I did my homework. During my research I found a lot about elephant-bird, but there seems to be a conflict with Cloudera, so I am stuck there as well. If you trust me already, you can jump directly to the second half of my post ;-).

The desired solution should work both on Cloudera and on Amazon EMR.

Proof that something works

I have this data file:

```
$ cat a.json
{"DataASet":{"A1":1,"A2":4,"DataBSets":[{"B1":"1","B2":"1"},{"B1":"2","B2":"2"}]}}
$ ./jq '.' a.json
{
  "DataASet": {
    "A1": 1,
    "A2": 4,
    "DataBSets": [
      {
        "B1": "1",
        "B2": "1"
      },
      {
        "B1": "2",
        "B2": "2"
      }
    ]
  }
}
$
```

I am using this Pig script to load it:

``` Pig
a = load 'a.json' using JsonLoader('
  DataASet: (
    A1:int,
    A2:int,
    DataBSets: {
      (
        B1:chararray,
        B2:chararray
      )
    }
  )
');
```

In grunt everything seems OK.

```
grunt> describe a;
a: {DataASet: (A1: int,A2: int,DataBSets: {(B1: chararray,B2: chararray)})}
grunt> dump a;
((1,4,{(1,1),(2,2)}))
grunt>
```

So far so good.

Real Problem

In fact, my real data (gigabytes) looks a little different. Each element of the array is an object that wraps another object named DataBSet.

```
$ ./jq '.' b.json
{
  "DataASet": {
    "A1": 1,
    "A2": 4,
    "DataBSets": [
      {
        "DataBSet": {
          "B1": "1",
          "B2": "1"
        }
      },
      {
        "DataBSet": {
          "B1": "2",
          "B2": "2"
        }
      }
    ]
  }
}
$ cat b.json
{"DataASet":{"A1":1,"A2":4,"DataBSets":[{"DataBSet":{"B1":"1","B2":"1"}},{"DataBSet":{"B1":"2","B2":"2"}}]}}
$
```
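To make the difference between the two files concrete outside of Pig, here is a minimal Python sketch. It is only an illustration: it assumes plain Python 3 with the standard `json` module, and `element_keys` is a hypothetical helper name, not part of my Pig setup. It lists the keys of each element in the `DataBSets` array for both input lines.

```python
import json

# The two input lines, copied verbatim from a.json and b.json.
a_line = '{"DataASet":{"A1":1,"A2":4,"DataBSets":[{"B1":"1","B2":"1"},{"B1":"2","B2":"2"}]}}'
b_line = '{"DataASet":{"A1":1,"A2":4,"DataBSets":[{"DataBSet":{"B1":"1","B2":"1"}},{"DataBSet":{"B1":"2","B2":"2"}}]}}'


def element_keys(line):
    """Return the sorted keys of each object inside the DataBSets array.

    Hypothetical helper, used only to inspect the nesting.
    """
    record = json.loads(line)
    return [sorted(item.keys()) for item in record["DataASet"]["DataBSets"]]


print(element_keys(a_line))  # [['B1', 'B2'], ['B1', 'B2']]
print(element_keys(b_line))  # [['DataBSet'], ['DataBSet']]
```

In a.json the array elements carry B1 and B2 directly, while in b.json each element has a single DataBSet key with B1 and B2 one level further down.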

I am trying to load this JSON with the following schema:

``` Pig
b = load 'b.json' using JsonLoader('
  DataASet: (
    A1:int,
    A2:int,
    DataBSets: {
      DataBSet: (
        B1:chararray,
        B2:chararray
      )
    }
  )
');
```

Again, it looks good so far in grunt.

```
grunt> describe b;
b: {DataASet: (A1: int,A2: int,DataBSets: {DataBSet: (B1: chararray,B2: chararray)})}
```

I expect something like this when dumping b:

```
((1,4,{((1,1)),((2,2))}))
```
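My expectation comes from mapping every JSON object to a Pig tuple and every JSON array to a bag. A minimal Python sketch of that mapping follows. It is an illustration under my own assumptions, not anything Pig provides: `to_pig` is a hypothetical helper, and it only mimics the notation that `dump` prints.

```python
import json


def to_pig(value):
    """Render parsed JSON in the notation used by Pig's dump (hypothetical helper).

    Objects become tuples "(...)", arrays become bags "{...}",
    and scalars are printed as-is.
    """
    if isinstance(value, dict):
        return "(" + ",".join(to_pig(v) for v in value.values()) + ")"
    if isinstance(value, list):
        return "{" + ",".join(to_pig(v) for v in value) + "}"
    return str(value)


a_line = '{"DataASet":{"A1":1,"A2":4,"DataBSets":[{"B1":"1","B2":"1"},{"B1":"2","B2":"2"}]}}'
b_line = '{"DataASet":{"A1":1,"A2":4,"DataBSets":[{"DataBSet":{"B1":"1","B2":"1"}},{"DataBSet":{"B1":"2","B2":"2"}}]}}'

print(to_pig(json.loads(a_line)))  # ((1,4,{(1,1),(2,2)}))
print(to_pig(json.loads(b_line)))  # ((1,4,{((1,1)),((2,2))}))
```

For a.json this reproduces what grunt actually prints, which is why I expect the doubly wrapped tuples for b.json.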

But I get this:

```
grunt> dump b;
()
grunt>
```

Obviously I am doing something wrong. The empty tuple suggests that the schema does not match the input line.

Any hints? Thanks in advance.

Kind regards.

Ralf