Hadoop, mail # user - Input split for a streaming job!


Re: Input split for a streaming job!
bejoy.hadoop@... 2011-11-11, 18:44
Hi Raj
       AFAIK 0.21 is an unstable release, and I doubt anyone would recommend it for production. You can experiment with it, but a better approach would be to patch your CDH3u1 with the patches required for splittable BZip2. Just make sure the new patch doesn't break anything else.
 
Regards
Bejoy K S
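
One way to sanity-check a patched build is to ask the JVM whether the BZip2 codec class on the classpath implements the splittable interface named in the diff quoted further down. The following is a minimal sketch, not part of the original thread: the class `SplittableCheck` and its methods are hypothetical helpers, and the interface's package is assumed to be `org.apache.hadoop.io.compress`. It uses only reflection, so it compiles without Hadoop and reports "class not found" when the Hadoop jars are absent.

```java
final class SplittableCheck {
    // Assumed fully qualified name of the interface seen in the trunk side of the diff.
    static final String SPLITTABLE =
            "org.apache.hadoop.io.compress.SplittableCompressionCodec";

    /** Returns true if cls, any superclass, or any inherited interface has the given name. */
    static boolean implementsInterface(Class<?> cls, String interfaceName) {
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            for (Class<?> i : c.getInterfaces()) {
                // Match the interface itself, or recurse into its super-interfaces.
                if (i.getName().equals(interfaceName)
                        || implementsInterface(i, interfaceName)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Reports whether the named codec class is present and implements SPLITTABLE. */
    static String describe(String codecClassName) {
        try {
            // Load without running static initializers; only the type is inspected.
            Class<?> codec = Class.forName(
                    codecClassName, false, SplittableCheck.class.getClassLoader());
            return implementsInterface(codec, SPLITTABLE)
                    ? "splittable" : "not splittable";
        } catch (ClassNotFoundException e) {
            return "class not found";
        }
    }

    public static void main(String[] args) {
        // Run with the cluster's Hadoop jars on the classpath to inspect that build.
        System.out.println(describe("org.apache.hadoop.io.compress.BZip2Codec"));
    }
}
```

Running this against the Hadoop jars of the patched distribution should print "splittable" if the patch took effect; it says nothing about whether the rest of the build still behaves correctly.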

-----Original Message-----
From: Raj V <[EMAIL PROTECTED]>
Date: Fri, 11 Nov 2011 10:34:18
To: Tim Broberg<[EMAIL PROTECTED]>; [EMAIL PROTECTED]<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: Input split for a streaming job!

Tim

I am using CDH3 U1 (0.20.2+923), which does not have the patch.

I will try 0.21.

Raj
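
For context on why the missing patch shows up in the mapper count: when a compressed file cannot be split, the whole file goes to a single map task no matter what HDFS block size it was written with. The sketch below illustrates that arithmetic under simplifying assumptions. `SplitEstimate` is a hypothetical helper, the file sizes are invented, and the calculation ignores details of Hadoop's real split logic such as minimum split size settings.

```java
final class SplitEstimate {
    /** Estimated number of input splits (and therefore map tasks) for one file. */
    static long splitsForFile(long fileBytes, long splitBytes, boolean splittable) {
        if (fileBytes <= 0 || splitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (!splittable) {
            return 1; // a non-splittable file is read end to end by one mapper
        }
        return (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
    }

    /** Sums the per-file estimates over all input files. */
    static long totalSplits(long[] fileSizes, long splitBytes, boolean splittable) {
        long total = 0;
        for (long size : fileSizes) {
            total += splitsForFile(size, splitBytes, splittable);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = {100 * mb, 64 * mb, 10 * mb}; // hypothetical input sizes
        System.out.println(totalSplits(files, 32 * mb, false)); // prints 3
        System.out.println(totalSplits(files, 32 * mb, true));  // prints 7
    }
}
```

Under this model, a mapper count that equals the number of input files, and does not change when the block size is reduced, is what one would expect from a codec the framework treats as non-splittable.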

>________________________________
>From: Tim Broberg <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Raj V <[EMAIL PROTECTED]>; Joey Echeverria <[EMAIL PROTECTED]>
>Sent: Friday, November 11, 2011 10:25 AM
>Subject: RE: Input split for a streaming job!
>
>
>
>What version of hadoop are you using?

>We just stumbled on the Jira item for BZIP2 splitting, and it appears to have been added in 0.21.

>When I diff 0.20.205 vs trunk, I see
>< public class BZip2Codec implements
><     org.apache.hadoop.io.compress.CompressionCodec {
>---
>> @InterfaceAudience.Public
>> @InterfaceStability.Evolving
>> public class BZip2Codec implements SplittableCompressionCodec {
>So, it appears you need at least 0.21 to play with splittability in BZIP2.

>     - Tim.
>
>________________________________________
>From: Raj V [[EMAIL PROTECTED]]
>Sent: Friday, November 11, 2011 9:18 AM
>To: Joey Echeverria
>Cc: [EMAIL PROTECTED]
>Subject: Re: Input split for a streaming job!
>
>Joey,Anirudh, Bejoy
>
>I am using TextInputFormat Class. (org.apache.hadoop.mapred.TextInputFormat).
>
>And the input files were created using 32MB block size and the files are bzip2.
>
>So all things point to my input files being splittable.
>
>I will continue poking around.
>
>- best regards
>
>Raj
>
>
>
>>________________________________
>>From: Joey Echeverria <[EMAIL PROTECTED]>
>>To: Raj V <[EMAIL PROTECTED]>
>>Sent: Friday, November 11, 2011 2:56 AM
>>Subject: Re: Input split for a streaming job!
>>
>>U1 should be able to split the bzip2 files. What input format are you using?
>>
>>-Joey
>>
>>On Thu, Nov 10, 2011 at 9:06 PM, Raj V <[EMAIL PROTECTED]> wrote:
>>> Sorry to bother you offline.
>>> From the release notes for CDH3U1
>>> ( http://archive.cloudera.com/cdh/3/hadoop-0.20.2+923.97.releasenotes.html)
>>> I understand that split of the bzip files was available.
>>> But returning to my old problem I still see 73 mappers. Did I misunderstand
>>> something?
>>> If necessary, I can re-post the mail to the group.
>>>
>>> ________________________________
>>> From: Joey Echeverria <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]
>>> Sent: Thursday, November 10, 2011 3:11 PM
>>> Subject: Re: Input split for a streaming job!
>>>
>>> No problem. Out of curiosity, why are you still using B3?
>>>
>>> -Joey
>>>
>>> On Thu, Nov 10, 2011 at 6:07 PM, Raj V <[EMAIL PROTECTED]> wrote:
>>>> Joey
>>>> I think I know the answer. I am using CDH3B3 (0.20.2+737) and this does not
>>>> seem to support bzip splitting. I should have looked before shooting off the
>>>> email :-(
>>>> To answer your second question, I created a completely new set of input
>>>> files with dfs.block.size=32MB and used this as the input data
>>>> Raj
>>>>
>>>>
>>>> ________________________________
>>>> From: Joey Echeverria <[EMAIL PROTECTED]>
>>>> To: [EMAIL PROTECTED]
>>>> Sent: Thursday, November 10, 2011 3:02 PM
>>>> Subject: Re: Input split for a streaming job!
>>>>
>>>> It depends on the version of hadoop that you're using. Also, when you
>>>> changed the block size, did you do it on the actual files, or just the
>>>> default for new files?
>>>>
>>>> -Joey
>>>>
>>>> On Thu, Nov 10, 2011 at 5:52 PM, Raj V <[EMAIL PROTECTED]> wrote:
>>>>> Hi Joey,