Pig >> mail # dev >> [How to optimize MapReduce performance using Pig]
[How to optimize MapReduce performance using Pig]
Thank you for taking the time to read my mail.

For the past three months I have been working on a simple research project about Big Data.
I'm currently using Pig to run the MapReduce computations over my data.
I have about 50 million records in HBase, and I want to process all of them
with Pig.
Unfortunately, Pig takes a very long time to process that much data: almost
two and a half hours just for a simple query (data retrieval).
Compared with an RDBMS, the difference is very significant.
I have to finish my research in no more than a week (about 5 more days).

The problem is that I want to show that Hadoop and its companions (HBase
and Pig) are a good solution for processing huge amounts of structured data,
since the data I want to process is structured.
How, then, can I tune the overall performance?

Thank you very much for your attention; any kind of feedback is welcome.
:)

Regards,

Florencia.
(Student in High School of Statistics, Department of Computational
Statistics).