Hadoop >> mail # user >> Hadoop & Python


s d 2009-05-19, 15:36
Alex Loddengaard 2009-05-19, 16:48
s d 2009-05-19, 17:35
Amr Awadallah 2009-05-19, 20:30
Peter Skomoroch 2009-05-19, 20:59
Peter Skomoroch 2009-05-19, 21:04
Peter Skomoroch 2009-05-19, 21:04
Dan Milstein 2009-05-21, 12:19
Todd Lipcon 2009-05-21, 17:22
Re: Hadoop & Python
Thanks. Up to what number of servers and what file sizes would the
performance hit stay minor? I am concerned about implementing it all in
Python only to rewrite it later in order to scale economically.
Thanks for all the information.

On Tue, May 19, 2009 at 1:30 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote:

> S d,
>
>  It is totally fine to use Python streaming if it does the job you are
> after. There will be a slight performance hit, but that is noise assuming
> your cluster is a small one. If you are operating a large cluster
> continuously, then once your logic has stabilized in Python it might make
> sense to convert/operationalize some jobs to Java (or C++ Pipes) to improve
> performance, either to finish quicker or to reduce the number of servers
> needed.
>
>  You should also take a look at Pig and Hive; both are higher-level
> languages and very easy to learn:
>
> http://www.cloudera.com/hadoop-training-pig-introduction
>
> http://www.cloudera.com/hadoop-training-hive-introduction
>
> -- amr
>
>
> s d wrote:
>
>> Thanks.
>> So in the overall scheme of things, what is the general feeling about using
>> python for this? I like the ease of deploying and reading python compared
>> with Java but want to make sure using python over hadoop is scalable & is
>> standard practice and not something done only for prototyping and small
>> scale tests.
>>
>>
>> On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard <[EMAIL PROTECTED]> wrote:
>>
>>> Streaming is slightly slower than native Java jobs.  Otherwise Python works
>>> great in streaming.
>>>
>>> Alex
>>>
>>> On Tue, May 19, 2009 at 8:36 AM, s d <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>> How robust is using hadoop with python over the streaming protocol? Any
>>>> disadvantages (performance? flexibility?)? It just strikes me that python
>>>> is so much more convenient when it comes to deploying and crunching text
>>>> files.
>>>> Thanks,
>>>
>>
>>
>
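
For readers who have not used streaming: the approach discussed above runs any executable as the mapper or reducer, feeding it input lines on stdin and reading tab-separated key/value lines from its stdout. A minimal word-count sketch of that contract might look like the following. The file name wordcount.py, the function names, and the map/reduce command-line argument are all hypothetical choices for illustration, not anything from the thread.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word. Input must already be sorted by key,
    which is what the framework's shuffle/sort step provides."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: run as `wordcount.py map` or `wordcount.py reduce`.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Because both stages only touch stdin and stdout, the same script can be checked locally before going near a cluster, for example with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for the framework's sort step. The per-record pipe and text parsing shown here is also where the streaming overhead mentioned in the thread comes from.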
Billy Pearson 2009-05-19, 19:53
Alex Loddengaard 2009-05-20, 00:17
Zak Stone 2009-05-20, 00:31