-
Hadoop & Python s d 2009-05-19, 15:36
Hi,
How robust is using Hadoop with Python over the streaming protocol? Are there any disadvantages (performance? flexibility?)? It just strikes me that Python is so much more convenient when it comes to deploying and crunching text files.
Thanks,
-
Re: Hadoop & Python Alex Loddengaard 2009-05-19, 16:48
Streaming is slightly slower than native Java jobs. Otherwise Python works great in streaming.
Alex
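For readers new to streaming: a streaming job is just a program that reads records and writes tab-separated key/value lines. Below is a minimal word-count sketch of that idea. The function names and the sample input are illustrative assumptions, not anything from the thread, and the local sorted() call only stands in for Hadoop's shuffle/sort phase.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per key. Input must be grouped by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> sort -> reduce on in-memory sample data.
sample = ["to be or not to be"]
result = list(reduce_lines(sorted(map_lines(sample))))
print(result)
```

In a real job the mapper and reducer would live in separate scripts, each reading sys.stdin and printing its records, with Hadoop doing the sort in between.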
-
Re: Hadoop & Python s d 2009-05-19, 17:35
Thanks.
So in the overall scheme of things, what is the general feeling about using Python for this? I like the ease of deploying and reading Python compared with Java, but I want to make sure that using Python over Hadoop is scalable and is standard practice, not something done only for prototyping and small-scale tests.
-
Re: Hadoop & Python Billy Pearson 2009-05-19, 19:53
I have used streaming with PHP before to process a data set of about 1 TB without any problems at all.
Billy
-
Re: Hadoop & Python Amr Awadallah 2009-05-19, 20:30
S d,
It is totally fine to use Python streaming if it does the job you are after. There will be a slight performance hit, but that is noise assuming your cluster is a small one. If you are operating a large cluster continuously, then once your logic has stabilized in Python it might make sense to convert some jobs to Java (or C++ via Hadoop Pipes) to improve performance, either to finish quicker or to reduce the number of servers needed.
You should also take a look at Pig and Hive; both are higher-level languages and very easy to learn:
http://www.cloudera.com/hadoop-training-pig-introduction
http://www.cloudera.com/hadoop-training-hive-introduction
-- amr
-
Re: Hadoop & Python Peter Skomoroch 2009-05-19, 20:59
One area I'm curious about is the requirement that any combiners in Streaming jobs be Java classes. Are there any plans to change this in the future? Prototyping streaming jobs in Python is great, and the ability to use a Python combiner would help performance a lot without needing to move to Java.
--
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch
-
Re: Hadoop & Python Peter Skomoroch 2009-05-19, 21:04
Whoops, should have googled it first. Looks like this is now fixed in trunk: HADOOP-4842. For people stuck on 0.18.3, a workaround appears to be adding something like "| sort | sh combiner.sh" to the call of the mapper script (via Klaas Bosteels).
Would be great to get this patched into distributions like EMR and Cloudera.
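To make the point of a combiner concrete, here is a rough sketch of what one does for a counting job: it pre-sums each mapper's sorted output so fewer records travel to the reducer, without changing the final totals. The helper name sum_by_key and the two sample mapper outputs are hypothetical, chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sum_by_key(sorted_pairs):
    """Sum the values of adjacent (key, value) pairs that share a key."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Hypothetical output of two mapper instances.
mapper_a = [("b", 1), ("a", 1), ("b", 1)]
mapper_b = [("a", 1), ("a", 1), ("c", 1)]

# Without a combiner, all six records reach the reducer.
raw = sorted(mapper_a + mapper_b)

# With a combiner, each mapper's output is sorted and pre-summed first.
combined = sorted(
    list(sum_by_key(sorted(mapper_a))) + list(sum_by_key(sorted(mapper_b)))
)

final_without = list(sum_by_key(raw))
final_with = list(sum_by_key(combined))
print(len(raw), len(combined), final_with)
```

This only works because addition is associative and commutative; a combiner for a different reduction would need the same property.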
-
Re: Hadoop & Python Peter Skomoroch 2009-05-19, 21:04
Direct link to HADOOP-4842:
https://issues.apache.org/jira/browse/HADOOP-4842
-
Re: Hadoop & Python Alex Loddengaard 2009-05-20, 00:17
You might also check out Dumbo, which is a Hadoop Python module.
<http://www.audioscrobbler.net/development/dumbo/>
Alex
-
Re: Hadoop & Python Zak Stone 2009-05-20, 00:31
Dumbo certainly makes Python Streaming much nicer; there's more info here:
http://wiki.github.com/klbostee/dumbo
http://dumbotics.com/
For example, Dumbo makes it easy to implement combiners in Python.
Zak
-
Re: Hadoop & Python s d 2009-05-20, 15:12
Thanks. Up to what number of servers and what file sizes would the performance hit stay minor? I am concerned about implementing it all only to have to rewrite it later to scale economically.
Thanks for all the information.
-
Re: Hadoop & Python Dan Milstein 2009-05-21, 12:19
One thing about the "| sort | sh combiner.sh" approach: you do have to be careful about memory if you're doing that. If a mapper instance sees a large number of rows, you'll be asking sort to sort *all* of those before passing them to the combiner. Hadoop itself only hands off some bounded number of output keys at a time to the combiner, which is much safer for large data sets.
In Dumbo itself, Klaas added "combine a chunk at a time" to address this problem.
(And, yes, overall, getting combiners fully supported in streaming is awesome.)
-D
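The "combine a chunk at a time" idea can be sketched roughly as follows. This is not Dumbo's actual implementation, only an assumed illustration of bounded-memory combining: partial sums are held for at most max_keys distinct keys, and the buffer is flushed whenever a new key would exceed that limit. The function name and the max_keys parameter are hypothetical.

```python
def combine_in_chunks(records, max_keys=2):
    """Pre-sum (key, value) records while holding at most max_keys keys."""
    buffer = {}
    for key, value in records:
        # Flush before admitting a key that would exceed the memory bound.
        if key not in buffer and len(buffer) >= max_keys:
            yield from sorted(buffer.items())
            buffer.clear()
        buffer[key] = buffer.get(key, 0) + value
    # Emit whatever is left once the input is exhausted.
    if buffer:
        yield from sorted(buffer.items())


records = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]
chunks = list(combine_in_chunks(records, max_keys=2))
print(chunks)
```

A key can appear in more than one flushed chunk, so the output is only partially combined. That is fine, because the reducer still sums everything per key.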
-
Re: Hadoop & Python Todd Lipcon 2009-05-21, 17:22
On Thu, May 21, 2009 at 5:19 AM, Dan Milstein <[EMAIL PROTECTED]> wrote:
> One thing about the | sort | sh combiner.sh approach: you do have to be
> careful about memory if you're doing that -- if a mapper instance sees a
> large number of rows, you'll be asking sort to sort *all* of those before
> passing them to the combiner. Hadoop itself only hands off some bounded
> number of output keys at a time to the combiner, which is much safer for
> large data sets.
The Unix "sort" utility is already somewhat smart here. It has a configurable memory buffer that it uses for sorting, and it spills to /tmp by default. The man page doesn't say what algorithm it actually uses, but I presume it's a merge sort. I think the default memory usage is fairly small, so you may get better performance using "sort -S 512M" or so.
-Todd
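As a rough illustration of the spill-and-merge behaviour described above, here is a small external merge sort sketch. It is not how the sort utility is actually implemented. It only shows the general technique of sorting bounded chunks in memory, spilling each sorted run to a temporary file, and merging the runs. The function name and the max_lines_in_memory parameter are hypothetical, and every input line is assumed to end with a newline.

```python
import heapq
import tempfile


def external_sort(lines, max_lines_in_memory=2):
    """Sort newline-terminated lines using bounded memory and temp files."""
    runs = []
    chunk = []

    def spill():
        # Write the current chunk, sorted, to its own temporary file.
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(sorted(chunk))
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= max_lines_in_memory:
            spill()
    if chunk:
        spill()

    try:
        # heapq.merge lazily merges the already-sorted runs.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()


data = ["d\n", "a\n", "c\n", "b\n", "e\n"]
sorted_lines = list(external_sort(data, max_lines_in_memory=2))
print(sorted_lines)
```

A larger in-memory buffer means fewer runs to merge, which is the intuition behind trying a bigger "-S" value.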