William Oberman 2012-09-11, 15:17
That approach makes sense. We have similar situations where we pull
relational data into HDFS and then join/agg with it via MR. In other cases
we'll export aggregated HDFS data into a relational DB and then do
additional aggs using SQL. That option, of course, only works if your data
sizes are within reason.
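
As a rough illustration of the join/agg step, here is a sketch in plain
Python (not Pig or MR). The table contents, column names, and function
names are all made up for the example. It assumes the relational rows have
already been exported to a CSV file sitting next to the big data, and that
the lookup table is small enough to hold in memory:

```python
import csv
import io
from collections import defaultdict

# Hypothetical lookup rows (account_id, plan), standing in for a MySql
# table that was exported to HDFS as CSV.
LOOKUP_CSV = "1,free\n2,paid\n3,paid\n"

# Hypothetical event rows (account_id, bytes), standing in for the
# Cassandra-side data.
EVENTS = [(1, 100), (2, 250), (3, 50), (2, 25), (9, 10)]


def load_lookup(text):
    """Parse 'account_id,plan' lines into a dict for in-memory lookups."""
    return {int(row[0]): row[1] for row in csv.reader(io.StringIO(text)) if row}


def aggregate_by_plan(events, lookup):
    """Join each event to its plan via the lookup, then sum bytes per plan."""
    totals = defaultdict(int)
    for account_id, nbytes in events:
        plan = lookup.get(account_id)
        if plan is None:
            continue  # inner-join semantics: skip events with no lookup row
        totals[plan] += nbytes
    return dict(totals)


if __name__ == "__main__":
    print(aggregate_by_plan(EVENTS, load_lookup(LOOKUP_CSV)))
```

In Pig, the closest equivalent for a small lookup table is probably a
replicated join (JOIN ... USING 'replicated') followed by GROUP and SUM,
but whether that fits depends on how large the exported table is.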
On Tue, Sep 11, 2012 at 8:17 AM, William Oberman wrote:
> My setup is Pig + Hadoop + Cassandra for my "big data" and MySql for my
> "relational/meta data". Up until now that has been fine, but now I need to
> start creating metrics that "cross the lines". In particular, I need to
> create aggregations of Cassandra data based on lookups from MySql.
> After doing some research, it seems like my best option is using something
> like Sqoop to map the meta/relational data I need from MySql -> HDFS, and
> then use HDFS inside of Pig for the actual lookups. I'd like to confirm
> that general strategy is correct (or any other tips).
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*
William Oberman 2012-09-11, 15:54
Bill Graham 2012-09-11, 16:58
William Oberman 2012-09-11, 18:09
William Oberman 2012-09-12, 14:41