Re: Hadoop Serialization: Avro

Depending on the response you get here, you might also post the
question separately on avro-user.

On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbina <[EMAIL PROTECTED]> wrote:
> Hey everyone,
> First time posting to the list. I'm currently writing a Hadoop job that
> will run daily and whose output will be part of the next day's
> input. Also, the output will potentially be read by other programs for
> later analysis.
> Since my program's output is used as part of the next day's input, it would
> be nice if it was stored in some binary format that is easy to read the
> next time around. But this format also needs to be readable by other
> outside programs, not necessarily written in Java. After searching for a
> while it seems that Avro is what I want to be using. In any case, I have
> been looking around for a while and I can't seem to find a single example
> of how to use Avro within a Hadoop job.
> It seems that in order to use Avro I need to change the io.serializations
> value, but I don't know which value should be specified. Furthermore, I
> found that there are classes Avro{Input,Output}Format, but these, as far
> as I understand, seem to need other Avro classes such as AvroWrapper,
> AvroKey, AvroValue, and, as far as I can tell, Avro* (with * replaced by
> pretty much any Hadoop class name). It seems, however, that these exist so
> that the Avro format is used throughout the Hadoop process to pass objects
> around.
> I just want to use Avro to save my output and read it again as input next
> time around. So far I have been using SequenceFile{Input,Output}Format and
> have implemented the Writable interface in the relevant classes; however,
> this is not portable to other languages. Is there a way to use Avro without
> a substantial rewrite (using Avro* classes) of my Hadoop job? Thanks in
> advance,
> Best,
> -Leo
> --
> Leo Urbina
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Department of Mathematics