Jagadish Bihani 2012-10-26, 11:00
Jagadish Bihani 2012-10-26, 13:02
Brock Noland 2012-10-30, 15:45
Jagadish Bihani 2012-10-30, 17:01
Re: Flume bz2 issue while processing by a map reduce job
Jagadish Bihani 2012-11-02, 07:50
Hi

Any inputs on this? It looks like a basic thing which, I guess, must have been handled in Flume.

On 10/30/2012 10:31 PM, Jagadish Bihani wrote:
> Text.
>
> Few updates on that:
> -- It looks like some header issue.
> -- When I copyToLocal the file and then again copy it back to HDFS,
> map reduce job processes the file correctly then.
> Is it something related to
> https://issues.apache.org/jira/browse/HADOOP-6852?
>
> Regards,
> Jagadish
>
>
> On 10/30/2012 09:15 PM, Brock Noland wrote:
>> What kind of files is your sink writing out? Text, Sequence, etc?
>>
>> On Fri, Oct 26, 2012 at 8:02 AM, Jagadish Bihani
>> <[EMAIL PROTECTED]> wrote:
>>> Same thing happens even for gzip.
>>>
>>> Regards,
>>> Jagadish
>>>
>>>
>>> On 10/26/2012 04:30 PM, Jagadish Bihani wrote:
>>>> Hi
>>>>
>>>> I have a very peculiar scenario.
>>>>
>>>> 1. My HDFS sink creates a bz2 file. File is perfectly fine I can
>>>> decompress it and
>>>> read it. It has 0.2 million records.
>>>> 2. Now I give that file to map-reduce job (hadoop 1.0.3) and surprisingly
>>>> it only
>>>> reads first 100 records.
>>>> 3. I then decompress the same file on local file system and use bzip2
>>>> command of
>>>> linux to again compress it and copy to HDFS.
>>>> 4. Now I run the map-reduce job and this time it correctly processes all
>>>> the records.
>>>>
>>>> I think flume agent writes compressed data to HDFS file in batches. And
>>>> somehow
>>>> bzip2 codec used by hadoop uses only first part of it.
>>>>
>>>> This way bz2 files generated by Flume, if used directly, can't be
>>>> processed by Map reduce job.
>>>> Is there any solution to it?
>>>>
>>>> Any inputs about other compression formats?
>>>>
>>>> P.S.
>>>> Versions:
>>>>
>>>> Flume 1.2.0 (Raw version; downloaded from
>>>> http://www.apache.org/dyn/closer.cgi/flume/1.2.0/apache-flume-1.2.0-bin.tar.gz)
>>>> Hadoop 1.0.3
>>>>
>>>> Regards,
>>>> Jagadish
>>
>
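To make the batching theory from the original post concrete, here is a minimal Python sketch of what a file made of several independently compressed bz2 streams looks like to a reader that stops after the first stream. It is only an illustration of the suspected mechanism, not Flume's or Hadoop's actual code: the record contents, the two-batch split, and the helper names read_first_stream_only and read_all_streams are all assumptions made up for the example.

```python
import bz2

# Hypothetical stand-in for two batches a sink might flush; not real Flume output.
batch1 = b"".join(b"record %d\n" % i for i in range(100))
batch2 = b"".join(b"record %d\n" % i for i in range(100, 200))

# Each batch is compressed on its own, then the results are concatenated,
# giving one file that contains two back-to-back bz2 streams.
multi_stream = bz2.compress(batch1) + bz2.compress(batch2)


def read_first_stream_only(data: bytes) -> bytes:
    """Decode just the first bz2 stream, as a reader unaware of concatenation would."""
    decomp = bz2.BZ2Decompressor()
    # Once the first stream ends, the remaining bytes are left in decomp.unused_data.
    return decomp.decompress(data)


def read_all_streams(data: bytes) -> bytes:
    """Decode every concatenated bz2 stream by restarting on the leftover bytes."""
    parts = []
    while data:
        decomp = bz2.BZ2Decompressor()
        parts.append(decomp.decompress(data))
        data = decomp.unused_data
    return b"".join(parts)


if __name__ == "__main__":
    print(len(read_first_stream_only(multi_stream).splitlines()), "records from first stream")
    print(len(read_all_streams(multi_stream).splitlines()), "records from all streams")
```

If the file on HDFS really is laid out this way, it would also fit the observation that decompressing and recompressing with the bzip2 command helps, since that would produce a single stream. Whether this is what happens with Flume 1.2.0 and Hadoop 1.0.3 is still unconfirmed at this point in the thread.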
Mike Percy 2012-11-02, 09:46
Jagadish Bihani 2012-11-03, 11:32