Pig >> mail # dev >> [How to optimize MapReduce performance using Pig]


Re: [How to optimize MapReduce performance using Pig]
Hi Florencia,

Welcome to Pig!

Unfortunately, without seeing the actual script that you're trying to
execute, we won't be able to help you with specific optimizations. There are
some very general guidelines for optimizing Pig scripts, though.

Take a look at http://chimera.labs.oreilly.com/books/1234000001811/ch08.html.
There are some general guidelines on writing Pig scripts as well as some
Hadoop settings that can be tweaked for better performance.

If you're able to post your script(s) to the mailing list, we can certainly
take a look and help you optimize them.

Hope this helps!
Pradeep

P.S. This question should really be posted to the user mailing list, not
the dev list. :)

On Fri, Sep 6, 2013 at 12:25 AM, Florencia Satwika <[EMAIL PROTECTED]> wrote:

> Thank you for taking the time to read my mail.
>
> I've been doing some simple research on Big Data for the past three months.
> I'm now using Pig to run the map and reduce computation over my data.
> I have about 50 million records in HBase, and I want to process all of them
> with Pig.
> Unfortunately, Pig takes a very long time to process that much data:
> almost two and a half hours for just a simple query
> (data retrieval).
> Compared with an RDBMS, the difference is significant.
> I have to finish my research in no more than a week (about 5 days
> from now).
>
> The problem is that I want to show that Hadoop and its friends (HBase
> and Pig) are a good solution for processing huge amounts of structured data
> (since the data that I want to process is structured).
> How can I tune the overall performance?
>
> Thank you very much for your attention and for any feedback.
> :)
>
> Regards,
>
> Florencia.
> (Student in High School of Statistics, Department of Computational
> Statistic).
>