Avro >> mail # user >> Is there a way to conditionally read Avro data?


Anna Lahoud 2013-08-16, 19:23
Eric Wasserman 2013-08-16, 22:47
Re: Is there a way to conditionally read Avro data?
What Eric suggests (reading with a reader schema that omits the body)
would work, but it may cost you a second read: once a record passes
your check, you have to read it again with the full schema to get the
body.

If the field you test on sits early in the record, then a custom
DatumReader implementation (one that does the low-level
deserialization itself) may work more effectively. You can pass a
DatumReader when constructing a DataFileReader, but it's quite a long
route to go IMO.
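
To make the low-level idea concrete, here is a minimal sketch in plain
Java. It does not use the Avro library at all: it hand-rolls the two
binary-encoding rules that matter here (a long is a zig-zag varint; a
string or bytes value is a long length followed by that many bytes).
The record layout (a string 'type' followed by a bytes 'body'), the
class name and all method names are assumptions made up for this
illustration. A real custom DatumReader would do the same thing through
Avro's Decoder, which has skip methods such as skipBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustration only: hypothetical record layout {string type; bytes body}. */
class ConditionalReadSketch {

    // Avro writes a long as a zig-zag varint, 7 bits per byte, low bits first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(DataInputStream in) throws IOException {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Strings and bytes share one wire shape: length, then the payload.
    static void writeBytes(ByteArrayOutputStream out, byte[] data) {
        writeLong(out, data.length);
        out.write(data, 0, data.length);
    }

    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        in.readFully(buf);
        return buf;
    }

    // Step over a length-prefixed value without copying its payload.
    static void skipBytes(DataInputStream in) throws IOException {
        long remaining = readLong(in);
        while (remaining > 0) {
            int skipped = in.skipBytes((int) Math.min(remaining, Integer.MAX_VALUE));
            if (skipped <= 0) {
                throw new EOFException("truncated value");
            }
            remaining -= skipped;
        }
    }

    // A record is just its fields back to back: type first, then body.
    static void writeRecord(ByteArrayOutputStream out, String type, String body) {
        writeBytes(out, type.getBytes(StandardCharsets.UTF_8));
        writeBytes(out, body.getBytes(StandardCharsets.UTF_8));
    }

    // Read 'type' first; only materialize 'body' when the type is acceptable.
    static List<String> readWanted(byte[] encoded, int recordCount, String unwantedType)
            throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        List<String> bodies = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
            String type = new String(readBytes(in), StandardCharsets.UTF_8);
            if (type.equals(unwantedType)) {
                skipBytes(in); // the body is never copied into memory
            } else {
                bodies.add(new String(readBytes(in), StandardCharsets.UTF_8));
            }
        }
        return bodies;
    }
}
```

Each record is visited once here, which is the point of going this way
rather than reading twice. Note that the skip is only this cheap
because the assumed body is a single length-prefixed value; a nested
record body carries no overall length, so skipping it means walking its
fields one by one according to the writer schema.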

On Sat, Aug 17, 2013 at 4:17 AM, Eric Wasserman <[EMAIL PROTECTED]> wrote:
> If you write your records with a schema like this (shown in Avro IDL
> for brevity):
>
>   record R {
>     Header header;
>     Body body;
>   }
>
> Then you can read with a schema like this:
>
>   record RSansBody {
>     Header header;
>   }
>
> And the Avro libraries will read the header part (in which your "type" would
> reside) and effectively skip the body part.
>
> ________________________________
> From: Anna Lahoud <[EMAIL PROTECTED]>
> Sent: Friday, August 16, 2013 12:23 PM
> To: [EMAIL PROTECTED]
> Subject: Is there a way to conditionally read Avro data?
>
> I am wondering if there is a way that I can avoid reading all of an item in
> an Avro file, based on some of the data that I have already read. For
> instance, say I have a datum where I know that if its 'type' value is
> 'ComputerVirus', I do not want to touch the remaining fields. Is
> there a way to 'move on' and get the next datum, without touching the
> remainder of the scary datum? I would call it a 'conditional read' in that I
> only want to fully read the datum if the datum meets some criteria.
>
> Anna
>

--
Harsh J