Avro, mail # user - Using Arrays in Apache Avro


Re: Using Arrays in Apache Avro
Mika Ristimaki 2013-09-24, 19:20

On Sep 24, 2013, at 9:46 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:

> Thanks a lot, Mika. Yeah, it works now, but my second question is: does the Avro schema I made look good compared to the JSON value we were using previously?
> I thought we could use an array for that, so I designed it that way in Apache Avro.
>

This is an application design question, not really related to Avro. If you have a list of prices, an array is a good place to store them.

> And also, why does an Avro array use the java.util.List datatype? Just curious about that as well.

Someone who actually designed Avro can answer this better, but I assume List was chosen because it is much more convenient to use than Java arrays: you don't need to know the size beforehand, and so on.

-Mika
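
Since Avro maps array fields to java.util.List, a primitive double[] has to be boxed into a List<Double> before it is put into a record. A minimal, Avro-free sketch of that conversion (the class name BoxPrices is just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class BoxPrices {
    // Avro's GenericData expects a java.util.Collection for array fields,
    // so a primitive double[] must be boxed into a List<Double> first.
    static List<Double> box(double[] nums) {
        List<Double> boxed = new ArrayList<>(nums.length);
        for (double d : nums) {
            boxed.add(d); // autoboxing: double -> Double
        }
        return boxed;
    }

    public static void main(String[] args) {
        double[] nums = { 9.97, 5.56, 21.48 };
        List<Double> prices = box(nums);
        System.out.println(prices); // prints [9.97, 5.56, 21.48]
    }
}
```

Passing the resulting List<Double> to record.put("prc", ...) should avoid the ClassCastException, since a List is a Collection while a double[] is not.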

>
> Thanks for the help.
>
> Raihan Jamal
>
>
> On Tue, Sep 24, 2013 at 11:40 AM, Mika Ristimaki <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Avro array uses java.util.List datatype. So you must do something like
>
> List<Double> nums = new ArrayList<Double>();
> nums.add(new Double(9.97));
> ...
>
> On Sep 24, 2013, at 9:02 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
>
>> Earlier, we were using JSON in our project, so our attribute data looked like the example below. This is the data for attribute `e3` in JSON format.
>>
>> {"lv":[{"v":{"prc":9.97}},{"v":{"prc":5.56}},{"v":{"prc":21.48}}]}
>>
>> Now, I am planning to use Apache Avro as our data serialization format, so I decided to design an Avro schema for the above attribute data. I came up with the design below.
>>  
>>   {
>>      "namespace": "com.avro.test.AvroExperiment",
>>      "type": "record",
>>      "name": "AVG_PRICE",
>>      "doc": "AVG_PRICE data",
>>      "fields": [
>>          {"name": "prc", "type": {"type": "array", "items": "double"}}
>>      ]
>>     }
>>
>> Now, I am not sure whether the above schema is correct for the values I have in the JSON. Can anyone help me with that? Assuming the schema is correct, when I try to serialize the data using it, I always get the error below:
>>  
>> double[] nums = new double[] { 9.97, 5.56, 21.48 };
>>
>> Schema schema = new Parser().parse((AvroExperiment.class.getResourceAsStream("/aspmc.avsc")));
>> GenericRecord record = new GenericData.Record(schema);
>> record.put("prc", nums);
>>
>> GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
>> ByteArrayOutputStream os = new ByteArrayOutputStream();
>>
>> Encoder e = EncoderFactory.get().binaryEncoder(os, null);
>>
>> // this line gives me exception..
>> writer.write(record, e);
>>
>> Below is the exception I always get:
>>
>>     Exception in thread "main" java.lang.ClassCastException: [D incompatible with java.util.Collection
>>
>> Any idea what I am doing wrong here?