Hadoop, mail # user - Question on Hadoop Streaming


Re: Question on Hadoop Streaming
Brock Noland 2011-12-06, 09:49
Does your job end with an error?

I am guessing what you want is:

-mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'

The first option tells streaming to run your script as the mapper; the
second ships the script to the task nodes as part of the job.
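
Combined with the rest of the command from the original message, the invocation might look like the sketch below. The jar path, input name, and -jobconf setting are copied from that message; the variable names and the streaming_cmd helper are hypothetical, introduced only for illustration. The helper prints the command instead of submitting it, so it can be inspected first.

```shell
# Values copied from the original message; adjust them for your cluster.
STREAMING_JAR=hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar
INPUT=SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
OUTPUT="$INPUT.aligned"

# Hypothetical helper: prints the streaming command on one line.
# -mapper names the script as it will appear in the task's working directory;
# -file gives the local path of the script to ship with the job.
streaming_cmd() {
  echo hadoop jar "$STREAMING_JAR" \
    -input "$INPUT" \
    -output "$OUTPUT" \
    -mapper bowtiestreaming.sh \
    -file /root/bowtiestreaming.sh \
    -jobconf mapred.reduce.tasks=0
}

streaming_cmd
```

Once the printed command looks right, it can be pasted into the shell to submit the job.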

Brock

On Tue, Dec 6, 2011 at 4:59 PM, Romeo Kienzler <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've got the following setup for NGS read alignment:
>
>
> A script accepting data from stdin/out:
> ------------------------------------------------------------
> cat /root/bowtiestreaming.sh
> cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
> /home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 -
> 2> /root/bowtie.log
>
>
>
> A file copied to HDFS:
> ------------------------------------------------------------
> hadoop fs -put
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
>
> A streaming job invoked with only the mapper:
> ------------------------------------------------------------
> hadoop jar
> hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> -output
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
> -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0
>
> The file cannot be found even though it is displayed:
> ------------------------------------------------------------
> hadoop fs -cat
> /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
> 11/12/06 09:07:47 INFO security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=300000
> 11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated.
> Instead, use mapreduce.task.attempt.id
> cat: File does not exist:
> /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
>
>
> The file looks like this (tab separated):
> head
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> @SRR014475.1 :1:1:108:111 length=36     GAGTTTTACGTCGTCCTAAAACAGTACATAAAAATA
>    I3IIIII+I(%BH43%III7I(5IIIIIII*<&II+
> @SRR014475.2 :1:1:112:26 length=36      GNNNNNNTTCCCTTTTCAACTTCCAAATCACCTAAC
>    I!!!!!!II=I<IIII@II5II)/$;%+*/&%%#&#
> @SRR014475.3 :1:1:101:937 length=36     GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA
>    IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G
> @SRR014475.4 :1:1:124:64 length=36      GAACACATAGAACAACAGGATTCGCCAGAACACCTG
>    IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;
> @SRR014475.5 :1:1:108:897 length=36     GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT
>    I0I:I'IIII+IG3II46II0>C@=III()+:+2&$
> @SRR014475.6 :1:1:106:14 length=36      GNNNNNNNNNNNNNNNTNTAGCATTAAGTAATTGGT
>    I!!!!!!!!!!!!!!!I!I6I*+III:%IB0+I.%?
> @SRR014475.7 :1:1:118:934 length=36     GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT
>    III0%%)&%I.I&I;III.(I@E&2>*'+1;;#;&'
> @SRR014475.8 :1:1:123:8 length=36       GNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNTNN
>    I!!!!!!!!!!!!!!!!!!!!!!!!!!!$!!!!(!!
> @SRR014475.9 :1:1:118:88 length=36      GGAAACAAAATGGCGCGCTACCAGGTAACGCGCCAC
>    IIIIIIIIIIIIIIIGIAA4;1+16*;*+)'$%#$%
> @SRR014475.10 :1:1:92:122 length=36     ATTTGCTGCCAATGGCGAGATTAAAAACGAATAATA
>    IIIIIIIIIIIIIICII;CGIDI?%$I:%6)C*;#;
>
>
> and the result like this:
>
> cat
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> |./bowtiestreaming.sh |head
> @SRR014475.3 :1:1:101:937 length=36     +
> gi|110640213|ref|NC_008253.1|   3393863 GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA
>    IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G  0       7:T>C,27:G>T
> @SRR014475.4 :1:1:124:64 length=36      +
> gi|110640213|ref|NC_008253.1|   2288633 GAACACATAGAACAACAGGATTCGCCAGAACACCTG
>    IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;  0       30:T>C
> @SRR014475.5 :1:1:108:897 length=36     +
> gi|110640213|ref|NC_008253.1|   4389356 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT