-
Avro file size is too big
Ruslan Al-Fakikh 2012-07-04, 13:32
Hello,
In my organization we are currently evaluating Avro as a format, and our concern is file size. I've done some comparisons on a piece of our data.
Say we have compressed sequence files whose payload (values) is just lines of text. As far as I know, we use line numbers as keys and the default codec for compression inside the sequence files. The size is 1.6G; when I convert it to Avro with the deflate codec at deflate level 9, it becomes 2.2G. This is interesting, because the values in the sequence files are just strings, while Avro has a proper schema with primitive types, and those are stored in binary. Shouldn't Avro be smaller?
I also took another dataset of 28G (gzip files of plain tab-delimited text; I don't know the deflate level), converted it to Avro, and it became 38G.
Why is Avro so big? Am I missing some size optimization?
Thanks in advance!
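A rough way to see why binary encoding alone need not make string-heavy data smaller: per the Avro specification, a `string` (or `bytes`) value is written as a zig-zag varint length followed by the raw UTF-8 bytes, so a record made mostly of strings takes about as many bytes as the original text line before any codec runs. The sketch below hand-encodes one value to show this. The class name `AvroStringEncoding` and the sample line are hypothetical, and real code would use Avro's own encoder rather than this reimplementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hand-rolled illustration of the Avro binary encoding of a string value:
 * a zig-zag varint length followed by the UTF-8 bytes (per the Avro spec).
 * Not Avro's BinaryEncoder; for explanation only.
 */
final class AvroStringEncoding {

    /** Zig-zag + varint encoding of an Avro long. */
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag: small |n| -> small unsigned value
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {               // more than 7 bits remain
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                       // last byte, continuation bit clear
        return out.toByteArray();
    }

    /** Avro string: length prefix, then the UTF-8 bytes unchanged. */
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] len = encodeLong(utf8.length);
        byte[] out = new byte[len.length + utf8.length];
        System.arraycopy(len, 0, out, 0, len.length);
        System.arraycopy(utf8, 0, out, len.length, utf8.length);
        return out;
    }

    public static void main(String[] args) {
        String line = "2012-07-04\t13:32\tsome\ttab-delimited\tfields"; // hypothetical record
        int textBytes = line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        System.out.println("as a text line: " + textBytes + " bytes");
        System.out.println("as an Avro string: " + encodeString(line).length + " bytes");
    }
}
```

For the 42-character sample line both figures come out at 43 bytes. The binary encoding saves space mainly for numeric fields: a six-digit `int` such as 123456 takes three bytes instead of six characters.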
-
Re: Avro file size is too big
Russell Jurney 2012-07-04, 21:58
This thread looks useful. Are you flushing too often?
http://apache-avro.679487.n3.nabble.com/avro-compression-using-snappy-and-deflate-td3870167.html
Russell Jurney http://datasyndrome.com
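Russell's question about flushing matters because an Avro data file is written as a sequence of blocks, and the codec compresses each block on its own. As I understand `DataFileWriter`, calling `flush()` ends the current block early, so frequent flushes mean small blocks and little shared context for the compressor. The sketch below models only that effect: it uses `java.util.zip.Deflater` in raw mode as a stand-in codec and hypothetical log-like records, and it is not Avro's writer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

/**
 * Models how flush frequency affects compressed size when every block is
 * compressed independently. Deflater stands in for the codec; this is not
 * Avro's DataFileWriter.
 */
final class FlushFrequencyDemo {

    /** Raw-deflate size of one block compressed with no shared history. */
    static int deflatedSize(byte[] block, int level) {
        Deflater d = new Deflater(level, true); // fresh compressor per block, like a per-block codec
        d.setInput(block);
        d.finish();
        byte[] buf = new byte[8192];
        int total = 0;
        while (!d.finished()) {
            total += d.deflate(buf);
        }
        d.end();
        return total;
    }

    /** Total compressed size when a "flush" happens after every recordsPerFlush records. */
    static long totalSize(List<String> records, int recordsPerFlush, int level) {
        long total = 0;
        StringBuilder block = new StringBuilder();
        int inBlock = 0;
        for (String r : records) {
            block.append(r).append('\n');
            if (++inBlock == recordsPerFlush) {   // flush: close and compress this block
                total += deflatedSize(block.toString().getBytes(StandardCharsets.UTF_8), level);
                block.setLength(0);
                inBlock = 0;
            }
        }
        if (inBlock > 0) {                        // final partial block on close
            total += deflatedSize(block.toString().getBytes(StandardCharsets.UTF_8), level);
        }
        return total;
    }

    /** Hypothetical log-like records used only for this illustration. */
    static List<String> sampleRecords(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add("{\"ts\":" + (1341400000L + i) + ",\"user\":\"u" + (i % 97)
                + "\",\"event\":\"page_view\",\"path\":\"/items/" + (i % 500) + "\"}");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = sampleRecords(20_000);
        for (int perFlush : new int[] {1, 10, 100, 10_000}) {
            System.out.println("flush every " + perFlush + " records: "
                + totalSize(records, perFlush, 9) + " bytes");
        }
    }
}
```

Running `main` should show the total shrinking as `perFlush` grows; one record per block is the extreme case of compressing a file line by line.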
-
Re: Avro file size is too big
Ruslan Al-Fakikh 2012-07-05, 14:53
Hi Russell,
I don't know what flushing is. I am creating Avro files from Pig and from Hive (and getting basically the same results). I had already seen that post, but my question is different. That guy had 40G of raw text, and after he RESOLVED his problem he got 4.5G of Avro with the deflate codec, so he now has 8.8X compression. My results are better from the start: I have 27G of raw text and 2.2G of Avro with the deflate codec (deflate level 9), so my compression is 12X.
But my question is: why are the gzip files and the sequence files (with the default codec) only about 0.72 of the size of the Avro files with deflate level 9?
Thanks
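For reference, the ratios quoted in this exchange work out as follows, using the sizes as reported (G in the same unit throughout):

```latex
\frac{1.6\,\mathrm{G}}{2.2\,\mathrm{G}} \approx 0.73, \qquad
\frac{28\,\mathrm{G}}{38\,\mathrm{G}} \approx 0.74, \qquad
\frac{27\,\mathrm{G}}{2.2\,\mathrm{G}} \approx 12.3\times, \qquad
\frac{40\,\mathrm{G}}{4.5\,\mathrm{G}} \approx 8.9\times
```

Put the other way round, the Avro files are about 1.38 times (SequenceFile case) and 1.36 times (gzip case) the size of their sources.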
-
Re: Avro file size is too big
Doug Cutting 2012-07-05, 17:24
Ruslan,
This is unexpected. Perhaps we can understand it if we have more information.
What Writable class are you using for keys and values in the SequenceFile?
What schema are you using in the Avro data file?
Can you provide small sample files of each and/or code that will reproduce this?
Thanks,
Doug
-
Re: Avro file size is too big
Ruslan Al-Fakikh 2012-07-05, 22:11
Hey Doug,
Here is a little more explanation:
http://mail-archives.apache.org/mod_mbox/avro-user/201207.mbox/%3CCACBYqwQWPaj8NaGVTOir4dO%2BOqri-UM-8RQ-5Uu2r2bLCyuBTA%40mail.gmail.com%3E
I'll answer your questions later, after some investigation.
Thank you!
-
Re: Avro file size is too big
Doug Cutting 2012-07-05, 22:19
You can use the Avro command-line tool to dump the metadata, which will show the schema and codec:
java -jar avro-tools.jar getmeta <file>
Doug
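What `getmeta` prints comes from the container file header. Per the Avro specification, a data file starts with the magic bytes `Obj` followed by byte 1, then a metadata map (string keys, bytes values) holding entries such as `avro.schema` and `avro.codec` (the codec entry may be absent when the `null` codec is used), then a 16-byte sync marker. The sketch below reads that map with plain JDK I/O, for cases where avro-tools is not at hand. The class `AvroHeaderMeta` is hypothetical, and it assumes the metadata values are UTF-8 text, which holds for the schema and codec entries.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Reads the metadata map from an Avro data file header, following the
 * container-file layout in the Avro specification. Illustration only;
 * avro-tools getmeta or Avro's DataFileReader are the supported routes.
 */
final class AvroHeaderMeta {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Decodes one zig-zag varint Avro long. */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new IOException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);         // undo zig-zag encoding
    }

    /** Reads an Avro bytes/string value: length, then that many bytes. */
    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        in.readFully(buf);
        return buf;
    }

    /** Returns header metadata such as avro.schema and avro.codec. */
    static Map<String, String> readMetadata(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro data file");
        Map<String, String> meta = new LinkedHashMap<>();
        // The map is a series of blocks, each starting with an entry count; 0 ends it.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {                     // negative count: block byte size follows
                count = -count;
                readLong(in);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                String value = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, value);
            }
        }
        return meta;                             // the 16-byte sync marker follows
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AvroHeaderMeta <file.avro>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            readMetadata(in).forEach((k, v) -> System.out.println(k + "\t" + v));
        }
    }
}
```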
-
Re: Avro file size is too big
Ey-Chih chow 2012-07-18, 23:59
We are converting our compression scheme from gzip to Snappy for our JSON logs. In one case, a gzip file is 715MB and the corresponding Snappy file is 1.885GB. The schema of the Snappy file is "bytes"; in other words, we compress our JSON logs line by line, and each line is a JSON string. Is there any way we can optimize our compression with Snappy?
Ey-Chih Chow
-
Re: Avro file size is too big
Harsh J 2012-07-20, 02:07
Snappy is known to achieve lower compression ratios than gzip, but perhaps you can try larger blocks in the Avro data files, as indicated in the thread, via a higher sync interval? [1] What Snappy is really good at is fast decompression, though, so perhaps your reads will be comparable to gzipped plain text?
P.S. What do you get if you use deflate compression on the data files, with the maximal compression level (9)? [2]
[1] - http://avro.apache.org/docs/1.7.1/api/java/org/apache/avro/mapred/AvroOutputFormat.html#setSyncInterval(org.apache.hadoop.mapred.JobConf,%20int)
or http://avro.apache.org/docs/1.7.1/api/java/index.html?org/apache/avro/mapred/AvroOutputFormat.html
[2] - http://avro.apache.org/docs/1.7.1/api/java/org/apache/avro/mapred/AvroOutputFormat.html#setDeflateLevel(org.apache.hadoop.mapred.JobConf,%20int)
or via http://avro.apache.org/docs/1.7.1/api/java/org/apache/avro/file/CodecFactory.html#deflateCodec(int)
coupled with http://avro.apache.org/docs/1.7.1/api/java/org/apache/avro/file/DataFileWriter.html#setCodec(org.apache.avro.file.CodecFactory)
--
Harsh J
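Harsh's P.S. asks what the deflate level buys. One way to get a feel for it is to compress the same sample at several levels. The sketch below does that with the JDK's raw `Deflater`, which produces the same RFC 1951 deflate format that the Avro specification names for the `deflate` codec. The sample JSON lines are synthetic and hypothetical, and Snappy is not in the JDK, so it is not measured here. Note that the level is a deflate setting: in the Java API it goes through `CodecFactory.deflateCodec(int)`, while `CodecFactory.snappyCodec()` takes no level.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/**
 * Compares raw-deflate output sizes at several compression levels.
 * Illustration only: JDK Deflater, synthetic data, no Avro framing.
 */
final class DeflateLevelDemo {

    /** Size of data compressed as a single raw deflate stream at the given level. */
    static int rawDeflateSize(byte[] data, int level) {
        Deflater d = new Deflater(level, true); // true = raw RFC 1951 output, no zlib wrapper
        d.setInput(data);
        d.finish();
        byte[] buf = new byte[8192];
        int total = 0;
        while (!d.finished()) {
            total += d.deflate(buf);
        }
        d.end();
        return total;
    }

    /** Hypothetical JSON log lines with some repetition, as real logs tend to have. */
    static byte[] sampleJsonLog(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("{\"ts\":").append(1342600000L + i)
              .append(",\"host\":\"web").append(i % 12)
              .append("\",\"status\":").append(i % 7 == 0 ? 404 : 200)
              .append(",\"bytes\":").append((i * 7919) % 65536)
              .append("}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleJsonLog(50_000);
        System.out.println("uncompressed: " + data.length + " bytes");
        for (int level : new int[] {1, 6, 9}) {
            System.out.println("deflate level " + level + ": " + rawDeflateSize(data, level) + " bytes");
        }
    }
}
```

On repetitive text like this, the step from level 6 to level 9 is often small compared with the gap between codecs, so comparing codecs and block sizes may be more informative than tuning the level alone.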
-
Re: Avro file size is too big
Ey-Chih chow 2012-07-20, 17:02
I switched to the maximal compression level, i.e. 9, but the size is the same.
Ey-Chih Chow
-
Re: Avro file size is too big
Ey-Chih chow 2012-07-20, 17:12
We use the Avro tool fromtext to compress a JSON log file. I didn't find an option to set the sync interval.
Ey-Chih Chow
-
Re: Avro file size is too big
Doug Cutting 2012-07-20, 20:00
On Fri, Jul 20, 2012 at 10:12 AM, Ey-Chih chow <[EMAIL PROTECTED]> wrote: > We use the avro tool, fromtext, to compress a json log file. I didn't find an option that can set the sync interval.
The fromtext tool has a --level option, e.g.:
java -jar avro-tools.jar fromtext --level 9 foo.txt foo.avro
Doug
-
Re: Avro file size is too big
Ey-Chih chow 2012-07-20, 20:32
Thanks Doug.
As I mentioned before, we did try the level option and set it to 9, but setting or not setting it makes no difference to the size of the compressed file.
Ey-Chih Chow