Jagadish Bihani 2012-10-26, 11:00
Jagadish Bihani 2012-10-26, 13:02
Brock Noland 2012-10-30, 15:45
Re: Flume bz2 issue while processing by a map reduce job
Jagadish Bihani 2012-10-30, 17:01

Text.

A few updates on that:
-- It looks like some header issue.
-- When I copyToLocal the file and then copy it back to HDFS, the map reduce job processes the file correctly.

Is it something related to https://issues.apache.org/jira/browse/HADOOP-6852?

Regards,
Jagadish

On 10/30/2012 09:15 PM, Brock Noland wrote:
> What kind of files is your sink writing out? Text, Sequence, etc?
>
> On Fri, Oct 26, 2012 at 8:02 AM, Jagadish Bihani
> <[EMAIL PROTECTED]> wrote:
>> Same thing happens even for gzip.
>>
>> Regards,
>> Jagadish
>>
>>
>> On 10/26/2012 04:30 PM, Jagadish Bihani wrote:
>>> Hi
>>>
>>> I have a very peculiar scenario.
>>>
>>> 1. My HDFS sink creates a bz2 file. The file is perfectly fine: I can
>>> decompress it and read it. It has 0.2 million records.
>>> 2. Now I give that file to a map-reduce job (hadoop 1.0.3) and
>>> surprisingly it only reads the first 100 records.
>>> 3. I then decompress the same file on the local file system, use the
>>> bzip2 command of linux to compress it again, and copy it to HDFS.
>>> 4. Now I run the map-reduce job and this time it correctly processes
>>> all the records.
>>>
>>> I think the flume agent writes compressed data to the HDFS file in
>>> batches, and somehow the bzip2 codec used by hadoop uses only the
>>> first part of it.
>>>
>>> This way bz2 files generated by Flume, if used directly, can't be
>>> processed by a map-reduce job.
>>> Is there any solution to it?
>>>
>>> Any inputs about other compression formats?
>>>
>>> P.S.
>>> Versions:
>>>
>>> Flume 1.2.0 (Raw version; downloaded from
>>> http://www.apache.org/dyn/closer.cgi/flume/1.2.0/apache-flume-1.2.0-bin.tar.gz)
>>> Hadoop 1.0.3
>>>
>>> Regards,
>>> Jagadish
>>
>
>
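
The guess in the original report (data compressed in batches, reader stopping after the first part) can be reproduced outside Hadoop. What follows is only a sketch in Python, not Flume or Hadoop code: it assumes each batch ends up as its own bzip2 stream and that the streams are simply concatenated into one file, which is not confirmed anywhere in this thread. The helper names and the 300-record sample are made up for illustration.

```python
import bz2


def make_batched_bz2(batches):
    """Compress each batch as a separate bzip2 stream and concatenate them.

    Assumption: this mimics a writer that flushes compressed data in batches.
    """
    return b"".join(
        bz2.compress("".join(batch).encode("utf-8")) for batch in batches
    )


def read_first_stream_only(data):
    """Decode with a single decompressor, which stops at the first end-of-stream."""
    decompressor = bz2.BZ2Decompressor()
    decoded = decompressor.decompress(data)
    # Bytes after the first stream are not decoded; they end up in unused_data.
    return decoded, decompressor.unused_data


def read_all_streams(data):
    """Decode every concatenated stream by restarting on the leftover bytes."""
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        data = decompressor.unused_data
    return b"".join(chunks)


# Hypothetical sample: 300 records written in batches of 100.
records = [f"record-{i}\n" for i in range(300)]
batches = [records[i:i + 100] for i in range(0, len(records), 100)]
blob = make_batched_bz2(batches)

first, leftover = read_first_stream_only(blob)
everything = read_all_streams(blob)

print(len(first.splitlines()), "records from the first stream only")
print(len(everything.splitlines()), "records when all streams are read")
```

A reader that handles only the first stream sees just the first batch, while the command-line bzip2 tool decodes all concatenated streams. That would match the observation that decompressing and recompressing locally (which produces a single stream) makes the whole file readable, but whether this is what actually happens with Flume 1.2.0 and Hadoop 1.0.3 still needs to be checked.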
Jagadish Bihani 2012-11-02, 07:50
Mike Percy 2012-11-02, 09:46
Jagadish Bihani 2012-11-03, 11:32