William Oberman 2012-09-11, 15:17
Bill Graham 2012-09-11, 15:33
William Oberman 2012-09-11, 15:54
Bill Graham 2012-09-11, 16:58
Re: best practice for Pig + MySql for meta data lookups
William Oberman 2012-09-11, 18:09
Thanks (again)!
I'm already using CassandraStorage to load the JSON strings. I used Maps because I liked being able to name the fields, but I could easily change my UDF (and my Pig script) to use tuples instead. Maybe this is because I found Pig (and Hadoop) coming from the world of Cassandra rather than vice versa.

I'll look into Join and Cogroup more, and I'll see if I can puzzle through how to load Sqoop persisted data into Pig.

will

On Tue, Sep 11, 2012 at 12:58 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> Instead of UDFs and Maps, try to work with LoadFuncs and Tuples if you can.
> For example, you could read from Cassandra using CassandraStorage[1] and
> produce a Tuple of objects. If your data is JSON in Cassandra you could
> use a UDF to convert that to Tuples. Then you can join or cogroup those
> tuples with others that you've imported from the DB.
>
> 1 - I've never used this:
> http://svn.apache.org/repos/asf/cassandra/trunk/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
>
> On Tue, Sep 11, 2012 at 8:54 AM, William Oberman <[EMAIL PROTECTED]> wrote:
>
> > Great news (for me)! :-) My relational data is small (both relative to
> > the big data, but also absolutely).
> >
> > I'm reading about Sqoop now, and it seems relatively straightforward.
> >
> > My current problem is not having done this kind of combining of data
> > before in MR (which for me means Pig). Right now I have to pipe my
> > Cassandra data through a UDF, as the data itself is JSON (and I map it
> > to a Map of well defined fields). I was originally thinking I could just
> > add a new field to my Map in the UDF, but I don't know how to read from
> > HDFS in a UDF (and even if I knew how to read HDFS, I don't know how to
> > read data produced by Sqoop stored in HDFS).
> >
> > Now I'm wondering if this is the wrong mental model entirely. I haven't
> > figured out the details (obviously!), but it seems possible that using
> > Pig itself (without resorting to UDFs) I could
> > -load my Cassandra data
> > -load my HDFS data
> > -combine them
> > But I'm puzzling over the how for the 2nd and 3rd items.
> >
> > It's hard to get specific without getting *really* specific, but all of
> > the new problems I have seem to boil down to something like:
> > 1.) Inside Pig I have a Map that contains a field with value X
> > 2.) I have meta data in MySql that maps that X to a more general
> > grouping Y
> > 3.) I want to create reporting data based on both X and Y
> > The goal being to see how Y is doing overall, and how each X_i of Y is
> > doing relative to the others....
> >
> > will
> >
> > On Tue, Sep 11, 2012 at 11:33 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
> >
> > > That approach makes sense. We have similar situations where we pull
> > > relational data into HDFS and then join/agg with it via MR. In other
> > > cases we'll export aggregated HDFS data into a relational DB and then
> > > do additional aggs using SQL. That option of course only works if your
> > > data sizes are within reason.
> > >
> > > On Tue, Sep 11, 2012 at 8:17 AM, William Oberman <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hello,
> > > >
> > > > My setup is Pig + Hadoop + Cassandra for my "big data" and MySql for
> > > > my "relational/meta data". Up until now that has been fine, but now
> > > > I need to start creating metrics that "cross the lines". In
> > > > particular, I need to create aggregations of Cassandra data based on
> > > > lookups from MySql.
> > > >
> > > > After doing some research, it seems like my best option is using
> > > > something like Sqoop to map the meta/relational data I need from
> > > > MySql -> HDFS, and then use HDFS inside of Pig for the actual
> > > > lookups. I'd like to confirm that general strategy is correct (or
> > > > any other tips).
> > > >
> > > > Thanks!
> > > >
> > > > will
> > >
> > > --
> > > *Note that I'm no longer using my Yahoo! email address. Please email me
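
For readers following the thread: the X -> Y problem described above amounts to a join against a small lookup table followed by a grouped aggregation. The sketch below shows that data flow in plain Python rather than Pig. The row values, field names, and the `join_and_aggregate` helper are all made up for illustration; nothing here comes from the posters' actual schema or scripts.

```python
from collections import defaultdict

# Hypothetical lookup rows, standing in for MySql meta data exported to HDFS: (x, y).
meta_rows = [("x1", "yA"), ("x2", "yA"), ("x3", "yB")]

# Hypothetical event rows, standing in for tuples parsed from the Cassandra JSON: (x, value).
event_rows = [("x1", 3), ("x2", 5), ("x1", 2), ("x3", 7), ("x9", 1)]


def join_and_aggregate(events, meta):
    """Inner-join events to the lookup on x, then sum values per Y and per (Y, X)."""
    x_to_y = dict(meta)  # the lookup is small, so it is simply held in memory
    per_y = defaultdict(int)
    per_yx = defaultdict(int)
    for x, value in events:
        y = x_to_y.get(x)
        if y is None:
            continue  # inner-join semantics: skip events with no meta data row
        per_y[y] += value
        per_yx[(y, x)] += value
    return dict(per_y), dict(per_yx)


per_y, per_yx = join_and_aggregate(event_rows, meta_rows)
print(per_y)   # totals for each grouping Y
print(per_yx)  # totals for each X within its Y
```

In Pig the same shape would presumably be a join of the two relations on X followed by a group on Y (or on Y and X), which is the join/cogroup route Bill suggests above.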
William Oberman 2012-09-12, 14:41