Pig, mail # user - Using Pig for a comparative Study


RE: Using Pig for a comparative Study
Santhosh Srinivasan 2009-10-07, 16:44
Rob,

>> 2. To ask if there is any other solution out there that can be
>> closely compared to the functionality and use of Pig.

Hive (http://hadoop.apache.org/hive/), which provides a SQL interface on
top of Hadoop, and JAQL (http://www.jaql.org/), another query language
that also runs on Hadoop, are two good candidates.

>> 4. Am I off the mark with these questions? If so, please speak now!

Not at all. It would be great if you could share the parameters that form
the basis for the comparison.

Thanks,
Santhosh

-----Original Message-----
From: Rob Stewart [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 07, 2009 6:18 AM
To: [EMAIL PROTECTED]
Subject: Using Pig for a comparative Study

Hello Pig user group !

OK, here are three things about me:
1. I'm new to Pig and Hadoop
2. I'm studying for a Masters in Software Engineering in the UK.
3. I'm looking to do a comparative study on probably two distributed
systems over a cluster network. I have investigated Hadoop, and have
deployed Hadoop across various virtual Linux systems on this PC I'm
using (which was fun!), and my university has given me permission to use
the cluster at university to deploy Hadoop, which I'm excited about.
(They may even use it for future research, or better still, production
processing!).

Anyway... I have had a look at Pig, and have worked through the various
tutorials, which are very well written, and have these tutorials working
on my virtual Hadoop cluster here on this PC, and I assume the same
would be the case on the university cluster.

I need another system, as similar as possible to Pig in function and
use. My supervisor has pointed me in the direction of CouchDB (written in
Erlang) as another tool which could potentially be used for comparison
in my studies. From reading a little about it, however, there seems to be
no formal process for distributing a CouchDB job across a cluster of nodes
for parallel processing. I have contacted the CouchDB mailing list for
clarification about this.

So, I write to you guys for four reasons:
1. To touch base, and say - "hey, I'm hoping to use Pig for a
comparative study for my Masters dissertation - Thanks !!"
2. To ask if there is any other solution out there that can be closely
compared to the functionality and use of Pig.
3. To ask whether CouchDB has been benchmarked against Pig before now and,
if so, where I can find the results or who can help me with this.
4. Am I off the mark with these questions? If so, please speak now!
Thanks,

Rob Stewart