-
Convergence on File Format?
Michal Klos 2012-03-08, 23:07
Hi,
It seems that Avro is poised to become "the" file format. Is that still the case?
We've looked at Text, RCFile and Avro. Text is nice, but we'd really need to extend it. RCFile is great for Hive, but it has been a challenge using it outside of Hive. Avro has a great feature set, but in our testing it was significantly slower and larger on disk than RCFile. Still, if it has the highest rate of development, it may be the right choice.
If you were choosing a file format today to build a general-purpose cluster (general-purpose in the sense of using all the Hadoop tools, not just Hive), what would you choose? Developing a custom format is one of the options.
Thanks,
Mike
-
Re: Convergence on File Format?
Serge Blazhievsky 2012-03-08, 23:10
We started using Avro a few months ago and the results are great!
Easy to use, reliable, feature-rich, and well integrated with MapReduce.
-
Re: Convergence on File Format?
Russell Jurney 2012-03-09, 00:01
Avro support in Pig will be fairly mature in 0.10.
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com