Flume >> mail # user >> Flume bz2 issue while processing by a map reduce job


Jagadish Bihani 2012-10-26, 11:00
Jagadish Bihani 2012-10-26, 13:02
Brock Noland 2012-10-30, 15:45
Jagadish Bihani 2012-10-30, 17:01
Re: Flume bz2 issue while processing by a map reduce job
Hi,

Any inputs on this?
It looks like a basic thing which, I guess, must already have been handled in Flume.
On 10/30/2012 10:31 PM, Jagadish Bihani wrote:
> Text.
>
> A few updates on that:
> -- It looks like a header issue.
> -- When I copyToLocal the file and then copy it back to HDFS,
> the map reduce job processes the file correctly.
> Is it something related to
> https://issues.apache.org/jira/browse/HADOOP-6852?
>
> Regards,
> Jagadish
>
>
> On 10/30/2012 09:15 PM, Brock Noland wrote:
>> What kind of files is your sink writing out? Text, Sequence, etc?
>>
>> On Fri, Oct 26, 2012 at 8:02 AM, Jagadish Bihani
>> <[EMAIL PROTECTED]>  wrote:
>>> Same thing happens even for gzip.
>>>
>>> Regards,
>>> Jagadish
>>>
>>>
>>> On 10/26/2012 04:30 PM, Jagadish Bihani wrote:
>>>> Hi
>>>>
>>>> I have a very peculiar scenario.
>>>>
>>>> 1. My HDFS sink creates a bz2 file. The file is perfectly fine: I can
>>>> decompress it and read it. It has 0.2 million records.
>>>> 2. Now I give that file to a map-reduce job (Hadoop 1.0.3) and, surprisingly,
>>>> it reads only the first 100 records.
>>>> 3. I then decompress the same file on the local file system, compress it
>>>> again with the Linux bzip2 command, and copy it to HDFS.
>>>> 4. Now I run the map-reduce job and this time it correctly processes all
>>>> the records.
>>>>
>>>> I think the Flume agent writes compressed data to the HDFS file in batches, and
>>>> somehow the bzip2 codec used by Hadoop reads only the first part of it.
>>>>
>>>> This means bz2 files generated by Flume, if used directly, can't be
>>>> processed by a map-reduce job.
>>>> Is there any solution to it?
>>>>
>>>> Any inputs about other compression formats?
>>>>
>>>> P.S.
>>>> Versions:
>>>>
>>>> Flume 1.2.0 (Raw version; downloaded from
>>>> http://www.apache.org/dyn/closer.cgi/flume/1.2.0/apache-flume-1.2.0-bin.tar.gz)
>>>> Hadoop 1.0.3
>>>>
>>>> Regards,
>>>> Jagadish
>>
>
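
The batching guess quoted above can be tried outside Hadoop. This is only a minimal sketch: it assumes each Flume batch ends up as its own complete bzip2 stream appended to the same file, which nobody in the thread has confirmed. It uses Python's standard bz2 module rather than Hadoop's codec, and all function names are made up for illustration. A decoder that stops at the first end-of-stream marker returns only the first batch, while recompressing everything as one stream (as in steps 3 and 4) makes the same decoder return every record.

```python
import bz2


def make_batched_bz2(batches):
    """Compress each batch on its own and concatenate the resulting streams.

    Assumption: this mimics a writer that flushes one bzip2 stream per batch.
    """
    return b"".join(bz2.compress(batch) for batch in batches)


def read_first_stream_only(data):
    """Decode like a single-stream reader: stop at the first end-of-stream.

    Returns the decoded bytes and whatever input was left unread.
    """
    decoder = bz2.BZ2Decompressor()
    decoded = decoder.decompress(data)
    return decoded, decoder.unused_data


def read_all_streams(data):
    """Decode every concatenated stream by restarting on the leftover bytes."""
    chunks = []
    while data:
        decoder = bz2.BZ2Decompressor()
        chunks.append(decoder.decompress(data))
        data = decoder.unused_data
    return b"".join(chunks)


def recompress_as_single_stream(data):
    """Equivalent of decompressing locally and running bzip2 again."""
    return bz2.compress(read_all_streams(data))


if __name__ == "__main__":
    # Ten hypothetical batches of 100 records each.
    batches = [
        b"".join(b"record %d\n" % i for i in range(start, start + 100))
        for start in range(0, 1000, 100)
    ]
    blob = make_batched_bz2(batches)

    first, leftover = read_first_stream_only(blob)
    print(len(first.splitlines()), "records from the first stream only")
    print(len(leftover), "compressed bytes left unread")
    print(len(read_all_streams(blob).splitlines()), "records from all streams")

    fixed = recompress_as_single_stream(blob)
    first_fixed, _ = read_first_stream_only(fixed)
    print(len(first_fixed.splitlines()), "records after recompressing")
```

If the file written by the sink really is a series of concatenated streams, this would also explain why the plain bzip2 command reads it fine while a reader that handles only one stream stops early.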

Mike Percy 2012-11-02, 09:46
Jagadish Bihani 2012-11-03, 11:32