-
Question on Hadoop Streaming
Romeo Kienzler 2011-12-06, 08:59
Hi,
I've got the following setup for NGS read alignment:

A script accepting data from stdin/out:
------------------------------------------------------------
cat /root/bowtiestreaming.sh
cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
/home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 - 2> /root/bowtie.log

A file copied to HDFS:
------------------------------------------------------------
hadoop fs -put SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000

A streaming job invoked with only the mapper:
------------------------------------------------------------
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0

The file cannot be found even though it is displayed:
------------------------------------------------------------
hadoop fs -cat /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
11/12/06 09:07:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
cat: File does not exist: /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned

The file looks like this (tab separated):
head SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
@SRR014475.1 :1:1:108:111 length=36 GAGTTTTACGTCGTCCTAAAACAGTACATAAAAATA I3IIIII+I(%BH43%III7I(5IIIIIII*<&II+
@SRR014475.2 :1:1:112:26 length=36 GNNNNNNTTCCCTTTTCAACTTCCAAATCACCTAAC I!!!!!!II=I<IIII@II5II)/$;%+*/&%%#
@SRR014475.3 :1:1:101:937 length=36 GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G
@SRR014475.4 :1:1:124:64 length=36 GAACACATAGAACAACAGGATTCGCCAGAACACCTG IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;
@SRR014475.5 :1:1:108:897 length=36 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT I0I:I'IIII+IG3II46II0>C@=III()+:+2&$
@SRR014475.6 :1:1:106:14 length=36 GNNNNNNNNNNNNNNNTNTAGCATTAAGTAATTGGT I!!!!!!!!!!!!!!!I!I6I*+III:%IB0+I.%?
@SRR014475.7 :1:1:118:934 length=36 GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT III0%%)&%I.I&I;III.(I@E&2>*'+1;;#;&'
@SRR014475.8 :1:1:123:8 length=36 GNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNTNN I!!!!!!!!!!!!!!!!!!!!!!!!!!!$!!!!(!!
@SRR014475.9 :1:1:118:88 length=36 GGAAACAAAATGGCGCGCTACCAGGTAACGCGCCAC IIIIIIIIIIIIIIIGIAA4;1+16*;*+)'$%#$%
@SRR014475.10 :1:1:92:122 length=36 ATTTGCTGCCAATGGCGAGATTAAAAACGAATAATA IIIIIIIIIIIIIICII;CGIDI?%$I:%6)C*;#;

and the result like this:
cat SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 |./bowtiestreaming.sh |head
@SRR014475.3 :1:1:101:937 length=36 + gi|110640213|ref|NC_008253.1| 3393863 GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G 0 7:T>C,27:G>T
@SRR014475.4 :1:1:124:64 length=36 + gi|110640213|ref|NC_008253.1| 2288633 GAACACATAGAACAACAGGATTCGCCAGAACACCTG IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+; 0 30:T>C
@SRR014475.5 :1:1:108:897 length=36 + gi|110640213|ref|NC_008253.1| 4389356 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT I0I:I'IIII+IG3II46II0>C@=III()+:+2&$ 0 5:C>A,28:G>T,29:C>G,30:A>T,34:C>T
@SRR014475.9 :1:1:118:88 length=36 - gi|110640213|ref|NC_008253.1| 3598410 GTGGCGCGTTACCTGGTAGCGCGCCATTTTGTTTCC %$#%$')+*;*61+1;4AAIGIIIIIIIIIIIIIII 0
@SRR014475.15 :1:1:87:967 length=36 + gi|110640213|ref|NC_008253.1| 4474247 GACTACACGATCGCCTGCCTTAATATTCTTTACACC IIIIIIIIIIIIA27II7CIII*I5I+FIIII?II' 0 6:G>A,26:G>T
@SRR014475.20 :1:1:108:121 length=36 - gi|110640213|ref|NC_008253.1| 37761 AAAAAATGCATATTGTTTTAGAGTGTGATTATTAGC I<D4II'2I<IIC/;B?FIIIIIIIIIIIIIIIIII 0 12:C>T
@SRR014475.23 :1:1:75:54 length=36 + gi|110640213|ref|NC_008253.1| 2465453 GGTTTCTTTCTGCGCAGATGCCAGACGGTCTTTATA IIIIIIIIIIIICII<III;';29=9I.4%EE2)*' 0
@SRR014475.24 :1:1:89:904 length=36 - gi|110640213|ref|NC_008253.1| 3216193 ATTAGTGTTAAGATTTCTATATTGTTGTTTTAGGCC #%);%;$EI-;$%8%&I%I/+IIIIIIIIIIIIIII 0 18:C>T,21:G>T,30:C>T,31:T>G,34:A>T
@SRR014475.27 :1:1:74:887 length=36 - gi|110640213|ref|NC_008253.1| 540567 AAACGTGGCGTTTCAGGGATCGTTTGCCTGCATTAC *&(%9%0F3.@4;&?4I3I6%:9AI0HIIIIIIIII 0 34:C>A,35:C>A
@SRR014475.30 :1:1:123:73 length=36 + gi|110640213|ref|NC_008253.1| 3391697 AAAAGATTGCGACTGACGGCGCAAATGCCCTCCGTT IIIIIIIIICI:II3*<4.*'+%'&)&$;+;%;%;; 0 30:C>T,34:G>T

Any ideas?
best Regards,
Romeo

Romeo Kienzler
r o m e o @ o r m i u m . d e
-
Re: Question on Hadoop Streaming
Brock Noland 2011-12-06, 09:49
Does your job end with an error?
I am guessing what you want is:
-mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'
The first option says to use your script as the mapper; the second says to ship your script as part of the job.
Brock
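
As a rough sketch of the suggestion above, the complete invocation might look like the following. The jar path, input name and `-jobconf` setting are taken from the original post. The `run` and `submit_job` helpers and the `DRY_RUN` switch are assumptions introduced here for illustration only; with `DRY_RUN=1` (the default in this sketch) the command is printed but not executed.

```shell
#!/bin/sh
# Sketch: ship the mapper script with the job (-file) and refer to it by its
# base name (-mapper), so each task finds it in its working directory.
STREAMING_JAR=hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar
IN=SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
OUT="$IN.aligned"
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: always print the command; execute it only if DRY_RUN=0.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper around the streaming invocation from the thread.
submit_job() {
    run hadoop jar "$STREAMING_JAR" \
        -input "$IN" \
        -output "$OUT" \
        -mapper bowtiestreaming.sh \
        -file /root/bowtiestreaming.sh \
        -jobconf mapred.reduce.tasks=0
}

submit_job
```

Running it with `DRY_RUN=0` would submit the job for real. Note that the script itself still uses absolute paths for bowtie and its log file, so those paths would have to exist on every node that runs a map task.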
-
Re: Question on Hadoop Streaming
Romeo Kienzler 2011-12-06, 10:26
Hi Brock,
I'm not getting any errors.
I'm issuing the following command now:
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0 -file bowtiestreaming.sh
The only error I get using "cat hadoop-0.21.0/logs/* |grep Exception" is:
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/hadoop-0.21.0/logs/history/job_201112060917_0002_root at 2620416
2011-12-06 11:14:34,515 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13816: No such process
2011-12-06 11:14:43,039 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13862: No such process
2011-12-06 11:14:46,282 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13891: No such process
2011-12-06 11:14:49,841 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13978: No such process

best Regards,
Romeo
-
Re: Question on Hadoop Streaming
Romeo Kienzler 2011-12-07, 06:50
Hi,
the following command works:
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input input -output output2 -mapper /root/bowtiestreaming.sh -reducer NONE
Best Regards,
Romeo
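
The thread does not show how the results in output2 were read back. Assuming the job wrote them in the usual MapReduce layout, as part-* files inside the output directory rather than as a single file, they could be inspected roughly as follows. The `run` and `show_output` helpers and the `DRY_RUN` switch are hypothetical names used only in this sketch; by default the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: look inside the job's output directory instead of cat-ing the
# directory path itself. "output2" is the -output value used above.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: always print the command; execute it only if DRY_RUN=0.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical helper: list the output directory, then show the first lines
# of the part files. The glob is quoted so that hadoop, not the local shell,
# expands it against HDFS.
show_output() {
    out_dir=$1
    run hadoop fs -ls "$out_dir"
    run hadoop fs -cat "$out_dir/part-*" | head
}

show_output output2
```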