Hadoop >> mail # user >> Question on Hadoop Streaming


Romeo Kienzler 2011-12-06, 08:59
Re: Question on Hadoop Streaming
Does your job end with an error?

I am guessing what you want is:

-mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'

The first option says to use your script as the mapper, and the second
says to ship your script to the task nodes as part of the job.
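
Put together with the rest of the invocation quoted below, the full command would look roughly like the following. This is only a sketch: the jar path, input name, and script path are copied from the quoted message, while the variable names and the streaming_cmd helper are made up here for illustration. The helper prints the command instead of submitting it, so it can be checked before running.

```shell
#!/bin/sh
# Values copied from the quoted message; adjust them for your cluster.
STREAMING_JAR=hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar
INPUT=SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
OUTPUT="$INPUT.aligned"
SCRIPT=/root/bowtiestreaming.sh

# Hypothetical helper: prints the streaming command as a dry run.
# Remove the leading "echo" to actually submit the job.
streaming_cmd() {
  echo hadoop jar "$STREAMING_JAR" \
    -input "$INPUT" \
    -output "$OUTPUT" \
    -mapper "$(basename "$SCRIPT")" \
    -file "$SCRIPT" \
    -jobconf mapred.reduce.tasks=0
}

streaming_cmd
```

Note that -file ships only the script itself. The bowtie binary and index that the script refers to under /home/streamsadmin/ would presumably still have to exist at that path on every node that runs a map task.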

Brock

On Tue, Dec 6, 2011 at 4:59 PM, Romeo Kienzler <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've got the following setup for NGS read alignment:
>
>
> A script accepting data from stdin/out:
> ------------------------------------------------------------
> cat /root/bowtiestreaming.sh
> cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
> /home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 -
> 2> /root/bowtie.log
>
>
>
> A file copied to HDFS:
> ------------------------------------------------------------
> hadoop fs -put
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
>
> A streaming job invoked with only the mapper:
> ------------------------------------------------------------
> hadoop jar
> hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> -output
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
> -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0
>
> The file cannot be found even though it is displayed:
> ------------------------------------------------------------
> hadoop fs -cat
> /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
> 11/12/06 09:07:47 INFO security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=300000
> 11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated.
> Instead, use mapreduce.task.attempt.id
> cat: File does not exist:
> /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
>
>
> The file looks like this (tab separated):
> head
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> @SRR014475.1 :1:1:108:111 length=36     GAGTTTTACGTCGTCCTAAAACAGTACATAAAAATA
>    I3IIIII+I(%BH43%III7I(5IIIIIII*<&II+
> @SRR014475.2 :1:1:112:26 length=36      GNNNNNNTTCCCTTTTCAACTTCCAAATCACCTAAC
>    I!!!!!!II=I<IIII@II5II)/$;%+*/&%%#&#
> @SRR014475.3 :1:1:101:937 length=36     GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA
>    IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G
> @SRR014475.4 :1:1:124:64 length=36      GAACACATAGAACAACAGGATTCGCCAGAACACCTG
>    IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;
> @SRR014475.5 :1:1:108:897 length=36     GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT
>    I0I:I'IIII+IG3II46II0>C@=III()+:+2&$
> @SRR014475.6 :1:1:106:14 length=36      GNNNNNNNNNNNNNNNTNTAGCATTAAGTAATTGGT
>    I!!!!!!!!!!!!!!!I!I6I*+III:%IB0+I.%?
> @SRR014475.7 :1:1:118:934 length=36     GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT
>    III0%%)&%I.I&I;III.(I@E&2>*'+1;;#;&'
> @SRR014475.8 :1:1:123:8 length=36       GNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNTNN
>    I!!!!!!!!!!!!!!!!!!!!!!!!!!!$!!!!(!!
> @SRR014475.9 :1:1:118:88 length=36      GGAAACAAAATGGCGCGCTACCAGGTAACGCGCCAC
>    IIIIIIIIIIIIIIIGIAA4;1+16*;*+)'$%#$%
> @SRR014475.10 :1:1:92:122 length=36     ATTTGCTGCCAATGGCGAGATTAAAAACGAATAATA
>    IIIIIIIIIIIIIICII;CGIDI?%$I:%6)C*;#;
>
>
> and the result like this:
>
> cat
> SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
> |./bowtiestreaming.sh |head
> @SRR014475.3 :1:1:101:937 length=36     +
> gi|110640213|ref|NC_008253.1|   3393863 GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA
>    IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G  0       7:T>C,27:G>T
> @SRR014475.4 :1:1:124:64 length=36      +
> gi|110640213|ref|NC_008253.1|   2288633 GAACACATAGAACAACAGGATTCGCCAGAACACCTG
>    IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;  0       30:T>C
> @SRR014475.5 :1:1:108:897 length=36     +
> gi|110640213|ref|NC_008253.1|   4389356 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT