

Anna Lahoud 2013-08-16, 19:23
Eric Wasserman 2013-08-16, 22:47
Re: Is there a way to conditionally read Avro data?
What Eric suggests (reader schemas) would work, but it may cost you a
second read of the same record whenever the condition is met and you
then want the full record.

If the field you test on sits early in the record's field order, then
a custom DatumReader implementation (one that does the low-level
deserialization itself) may work more effectively. You can pass a
DatumReader when constructing a DataFileReader, but it's quite a long
route to go, IMO.
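
To make the low-level idea concrete, here is a toy sketch in plain
Python (standard library only, not the Avro API). It assumes a
hypothetical record layout of a string 'type' followed by a bytes
'payload', encoded the way the Avro binary encoding lays out those
primitives (zigzag varint length, then the raw bytes). All function
names are made up for illustration. The reader decodes 'type' first and
steps over the payload of unwanted records without reading it.

```python
import io

def write_long(out, n):
    """Avro-style long: zigzag encode, then base-128 varint, low bits first."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))

def read_long(buf):
    """Decode one zigzag varint from the current position of buf."""
    shift = result = 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)

def write_bytes(out, data):
    """Length-prefixed bytes; strings use the same layout with UTF-8 data."""
    write_long(out, len(data))
    out.write(data)

def encode_record(type_name, payload):
    """Hypothetical record: string 'type', then bytes 'payload', in that order."""
    out = io.BytesIO()
    write_bytes(out, type_name.encode("utf-8"))
    write_bytes(out, payload)
    return out.getvalue()

def read_conditionally(buf, count, unwanted):
    """Yield (type, payload) for wanted records; skip the payload of the rest."""
    for _ in range(count):
        type_name = buf.read(read_long(buf)).decode("utf-8")
        size = read_long(buf)
        if type_name in unwanted:
            buf.seek(size, io.SEEK_CUR)  # step over the payload without reading it
            continue
        yield type_name, buf.read(size)

stream = io.BytesIO(
    encode_record("Document", b"hello")
    + encode_record("ComputerVirus", b"\x00scary\x00")
    + encode_record("Document", b"world")
)
records = list(read_conditionally(stream, 3, {"ComputerVirus"}))
```

This only works because 'type' comes before the payload in field order,
which is the point about the data being early in the record. In the
Java library, a custom DatumReader would do the equivalent through the
Decoder it is handed (for example readString followed by skipBytes),
rather than through hand-written byte handling as above.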

On Sat, Aug 17, 2013 at 4:17 AM, Eric Wasserman <[EMAIL PROTECTED]> wrote:
> If you write your records with a schema like this (this is in the Avro IDL
> language, for brevity):
>
>
>   record R {
>     Header header;
>     Body body;
>   }
>
>
>
> Then you can read with a schema like this:
>
>
>   record RSansBody {
>     Header header;
>   }
>
>
> And the Avro libraries will read the header part (in which your "type" would
> reside) and effectively skip the body part.
>
> ________________________________
> From: Anna Lahoud <[EMAIL PROTECTED]>
> Sent: Friday, August 16, 2013 12:23 PM
> To: [EMAIL PROTECTED]
> Subject: Is there a way to conditionally read Avro data?
>
> I am wondering if there is a way that I can avoid reading all of an item in
> an Avro file, based on some of the data that I have already read. For
> instance, say I have a datum where I know that if its 'type' value is
> 'ComputerVirus', I do not want to touch the remaining fields. Is
> there a way to 'move on' and get the next datum, without touching the
> remainder of the scary datum? I would call it a 'conditional read' in that I
> only want to fully read the datum if the datum meets some criteria.
>
> Anna
>

--
Harsh J