
Flume >> mail # user >> hdfs.fileType = CompressedStream


Re: hdfs.fileType = CompressedStream
You are using gzip, so the files won't be splittable.
You may be better off using snappy and sequence files.
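As a rough sketch, an HDFS sink configured that way might look like the following (the agent, sink, channel, and path names here are placeholders, not taken from your setup; the hdfs.* property names are standard Flume HDFS sink properties):

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memChannel
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.fileType = SequenceFile
agent.sinks.hdfsSink.hdfs.codeC = snappy
agent.sinks.hdfsSink.hdfs.writeFormat = Text
agent.sinks.hdfsSink.hdfs.batchSize = 100
agent.sinks.hdfsSink.hdfs.rollInterval = 600
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0

With CompressedStream + gzip each rolled file is a single gzip stream, so one mapper has to read the whole file; a snappy-compressed SequenceFile stays splittable because compression is applied inside the container format.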
On Thu, Jan 30, 2014 at 10:51 AM, Jimmy <[EMAIL PROTECTED]> wrote:

> I am running few tests and would like to confirm whether this is true...
>
> hdfs.codeC = gzip
> hdfs.fileType = CompressedStream
> hdfs.writeFormat = Text
> hdfs.batchSize = 100
>
>
> now let's assume I have a large number of transactions and I roll the file
> every 10 minutes
>
> it seems the tmp file stays at 0 bytes and flushes all at once after 10
> minutes, whereas if I don't use compression, the file grows as data are
> written to HDFS
>
> is this correct?
>
> Do you see any drawback in using CompressedStream with very large
> files? In my case a 120 MB compressed file (the block size) is 10x smaller
> than the uncompressed data
>
>