Hadoop >> mail # user >> Hadoop & Python


Re: Hadoop & Python
Thanks. For what number of servers and what file sizes would the
performance hit stay minor? I am concerned about implementing it all in
Python only to rewrite it later in order to scale economically.
Thanks for all the information.

On Tue, May 19, 2009 at 1:30 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote:

> S d,
>
>  It is totally fine to use Python streaming if it does the job you are
> after. There will be a slight performance hit, but that is noise, assuming
> your cluster is a small one. If you are operating a large cluster
> continuously, then once your logic has stabilized in Python it might make
> sense to convert/operationalize some jobs to Java (or C++ Pipes) to improve
> performance, whether to finish sooner or to reduce the number of servers
> needed.
>
>  You should also take a look at Pig and Hive; both are higher-level
> languages and very easy to learn:
>
> http://www.cloudera.com/hadoop-training-pig-introduction
>
> http://www.cloudera.com/hadoop-training-hive-introduction
>
> -- amr
>
>
> s d wrote:
>
>> Thanks.
>> So in the overall scheme of things, what is the general feeling about
>> using Python for this? I like the ease of deploying and reading Python
>> compared with Java, but want to make sure that using Python over Hadoop
>> is scalable and standard practice, and not something done only for
>> prototyping and small-scale tests.
>>
>>
>> On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard <[EMAIL PROTECTED]>
>> wrote:
>>
>>
>>
>>> Streaming is slightly slower than native Java jobs.  Otherwise Python
>>> works great in streaming.
>>>
>>> Alex
>>>
>>> On Tue, May 19, 2009 at 8:36 AM, s d <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>
>>>> Hi,
>>>> How robust is using Hadoop with Python over the streaming protocol? Any
>>>> disadvantages (performance? flexibility?)? It just strikes me that
>>>> Python is so much more convenient when it comes to deploying and
>>>> crunching text files.
>>>> Thanks,
>>>>
>>>>
>>>>
>>>
>>
>>
>
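
For readers who have not seen what a Python streaming job looks like, here
is a minimal word-count sketch. It is not taken from the thread. It assumes
only the basic streaming contract: the mapper and reducer read lines on
stdin, write tab-separated key/value lines on stdout, and the reducer sees
its input sorted by key. The script name (wordcount.py) and the function
names are hypothetical.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Map step: yield (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def parse_pairs(lines):
    """Parse 'key<TAB>value' lines, the format written by the map step."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key:
            yield key, int(value)


def reduce_pairs(pairs):
    """Reduce step: sum counts per key. Input must be sorted by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def main(mode):
    """Run one step over stdin and write tab-separated results to stdout."""
    if mode == "map":
        results = map_lines(sys.stdin)
    else:
        results = reduce_pairs(parse_pairs(sys.stdin))
    for key, value in results:
        sys.stdout.write("%s\t%d\n" % (key, value))


# Only run when invoked as: python wordcount.py map|reduce
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    main(sys.argv[1])
```

The same script can be tried locally without a cluster, with a shell
pipeline standing in for the shuffle and sort phase:
cat input.txt | python wordcount.py map | sort | python wordcount.py reduce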