Re: hdfs.fileType = CompressedStream
Jeff Lord 2014-01-30, 21:59
You are using gzip, so the files won't be splittable.
You may be better off using snappy and sequence files.
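
For what it's worth, a rough sketch of that setup might look like the fragment below. The agent and sink names (agent1, hdfsSink) are placeholders I made up, and the roll settings assume you want to roll purely on a 10-minute timer, so adjust them to your own config.

```properties
# Hypothetical agent/sink names; substitute your own.
agent1.sinks.hdfsSink.type = hdfs

# Sequence files stay splittable when compressed, unlike a raw gzip stream.
agent1.sinks.hdfsSink.hdfs.fileType = SequenceFile
agent1.sinks.hdfsSink.hdfs.codeC = snappy
agent1.sinks.hdfsSink.hdfs.writeFormat = Text
agent1.sinks.hdfsSink.hdfs.batchSize = 100

# Assumption: roll every 10 minutes only (rollInterval is in seconds);
# setting rollSize and rollCount to 0 disables size- and count-based rolls.
agent1.sinks.hdfsSink.hdfs.rollInterval = 600
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```

Snappy needs the native Hadoop libraries to be available on the Flume host, so check that before switching codecs.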
On Thu, Jan 30, 2014 at 10:51 AM, Jimmy <[EMAIL PROTECTED]> wrote:
> I am running a few tests and would like to confirm whether this is true...
> hdfs.codeC = gzip
> hdfs.fileType = CompressedStream
> hdfs.writeFormat = Text
> hdfs.batchSize = 100
> Now let's assume I have a large number of transactions and I roll the file
> every 10 minutes.
> It seems the tmp file stays at 0 bytes and is flushed all at once after 10
> minutes, whereas if I don't use compression, the file grows as data is
> written to HDFS.
> Is this correct?
> Do you see any drawback in using CompressedStream with very large
> files? In my case a 120MB compressed file (block size) is 10x uncompressed