Re: AVRO -- hibernate integration?


On 12/22/11 5:26 PM, "Yang" <[EMAIL PROTECTED]> wrote:

>I'm faced with this project:
>
>we have a legacy project that uses Hibernate, and it's rather slow.
>
>so we want to load the data from memcached instead, but since the
>objects are fairly complex, we need to serialize them before inserting
>them into memcached.
>
>the problem is that we need to make sure the Avro schema files we use
>strictly mirror the domain object class definitions, so we need to
>generate the Avro schemas from the existing Hibernate mapping files
>(*.hbm.xml) or from the domain class java files.
>is there such a converter? or are there other existing integration
>approaches between Hibernate and Avro?

Option 1: Try the Reflect API. If your objects are simple enough, this
may be all you need. However, it may not be configurable enough,
depending on your use cases. If it is close, open JIRA tickets for
improvements to Avro; adding the features you need there may be the
easiest way forward.

Option 2: Use the Specific API to hold your data, and make your
Hibernate bean getters/setters delegate to this hidden inner object.
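
For a rough idea of Option 1, here is a minimal (untested) sketch using
the Reflect API; "User" is a stand-in for one of your Hibernate-mapped
domain classes:

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

public class ReflectExample {
  public static byte[] serialize(User user) throws Exception {
    // Derive the Avro schema directly from the Java class definition,
    // so the schema always mirrors the domain class.
    Schema schema = ReflectData.get().getSchema(User.class);
    ReflectDatumWriter<User> writer = new ReflectDatumWriter<User>(schema);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(user, encoder);
    encoder.flush();
    return out.toByteArray();  // bytes ready to put into memcached
  }
}

Since ReflectData derives the schema from the class itself, this also
sidesteps the need for an hbm.xml-to-schema converter, as long as the
Reflect API's type mapping covers your classes.
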
>
>(kind of off-topic below)
>another way of integrating memcached and hibernate is to use a lib
>someone developed to use memcached as a cache provider,
>not sure it's mature enough; ehcache also has a "distributed" solution
>similar to memcached, but requires enterprise license.
>

In the current Avro Java implementation, there are three APIs:
* Generic, which can create Java objects from any Avro binary given the
corresponding schema, and can serialize objects that implement the
org.apache.avro.generic.IndexedRecord interface.
* Specific, which generates Java source code from an Avro schema, and
is often used behind an encapsulating wrapper, which could well be a
Hibernate bean.
* Reflect, which can serialize arbitrary Java objects but has a couple
of limitations on how fields and object types map to Avro schemas.
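
To make Option 2 above concrete, a rough (untested) sketch; "UserRecord"
is a hypothetical class generated by the Avro compiler from a .avsc
schema, and the exact accessors on generated classes depend on your
Avro version:

// The Hibernate bean keeps its persistent state inside a hidden
// Specific record and just delegates to it.
public class User {
  // Hypothetical Avro-generated class, not part of this thread.
  private UserRecord record = new UserRecord();

  public String getName() {
    // Avro may represent strings as CharSequence/Utf8, so convert.
    return record.getName() == null ? null : record.getName().toString();
  }

  public void setName(String name) {
    record.setName(name);
  }

  // Expose the inner record for serialization with the Specific API.
  public UserRecord toAvro() {
    return record;
  }
}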

In all cases you will need to consider how you want to deal with schema
migration. Will it be a requirement to keep two instances of an object
type in memcached that were written with different versions of the
schema (e.g. adding or removing a field while a 'hot' memcached holds
on to old versions)? If so, you will need a schema meta-store of some
sort, along with a lookup mechanism, so that when you deserialize you
know which version of the schema was used when the binary was
serialized. Avro requires the schema used to serialize the data in
order to read it, and it can transform the data to appear as if it
conforms to a newer schema, within the guidelines of the Avro spec:
http://avro.apache.org/docs/current/spec.html#Schema+Resolution
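
As an (untested) sketch of that resolution step: writerSchema would
come from your schema meta-store, looked up by a version id stored
alongside the value in memcached, while readerSchema is whatever your
code currently expects:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class ResolvingReadExample {
  public static GenericRecord deserialize(byte[] bytes,
                                          Schema writerSchema,
                                          Schema readerSchema) throws Exception {
    // Passing both schemas lets Avro resolve differences (added or
    // removed fields) per the spec's schema resolution rules.
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<GenericRecord>(writerSchema, readerSchema);
    Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    return reader.read(null, decoder);
  }
}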

-Scott

>
>Thanks
>Yang