Pig, mail # user - best practice for Pig + MySql for meta data lookups

Re: best practice for Pig + MySql for meta data lookups
Bill Graham 2012-09-11, 15:33
That approach makes sense. We have similar situations where we pull
relational data into HDFS and then join/agg with it via MR. In other cases
we'll export aggregated HDFS data into a relational DB and then do
additional aggs using SQL. That option of course only works if your data
sizes are within reason.
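
To make the first pattern concrete, here is a minimal in-memory sketch of the lookup-then-aggregate step, assuming the metadata has already been exported from the relational DB as delimited text. The column names (user_id, country, bytes) and the sample rows are made up for illustration and are not from the setup described in this thread. The small table is loaded into a dict and the large data set is streamed past it, which is roughly the idea behind Pig's replicated join (JOIN ... USING 'replicated') when the lookup table fits in memory.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata rows, standing in for a small table exported
# from the relational DB into HDFS as delimited text.
METADATA_CSV = """user_id,country
1,US
2,DE
3,US
"""

# Hypothetical event rows, standing in for the large data set.
EVENTS_CSV = """user_id,bytes
1,100
2,250
1,50
3,25
4,999
"""


def load_lookup(text):
    """Load the small metadata table into an in-memory dict keyed by user_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["user_id"]: row["country"] for row in reader}


def aggregate_by_country(events_text, lookup):
    """Stream the events, look up each key, and sum bytes per country."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(events_text)):
        country = lookup.get(row["user_id"])
        if country is None:
            continue  # no metadata match: dropped, as in an inner join
        totals[country] += int(row["bytes"])
    return dict(totals)


lookup = load_lookup(METADATA_CSV)
totals = aggregate_by_country(EVENTS_CSV, lookup)
print(totals)
```

If the lookup table is too big to hold in memory, a regular (reduce-side) join is the fallback, at the cost of shuffling both inputs.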
On Tue, Sep 11, 2012 at 8:17 AM, William Oberman

> Hello,
> My setup is Pig + Hadoop + Cassandra for my "big data" and MySql for my
> "relational/meta data".  Up until now that has been fine, but now I need to
> start creating metrics that "cross the lines".  In particular, I need to
> create aggregations of Cassandra data based on lookups from MySql.
> After doing some research, it seems like my best option is using something
> like Sqoop to map the meta/relational data I need from MySql -> HDFS, and
> then use HDFS inside of Pig for the actual lookups.  I'd like to confirm
> that general strategy is correct (or any other tips).
> Thanks!
> will

*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*