
Markus Resch 2012-03-16, 14:25
Re: Best Practice: Acquire Data from external sources
Markus - you might take a look at Sqoop; it's a tool originally developed
at Cloudera for moving data between Hadoop and relational databases.  Your
workflow might be to kick off a Sqoop job that pulls the data from your
RDBMS and writes the results to HDFS.  From there, you can use standard Pig
to load and process the data further.
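
To make that workflow a bit more concrete, here is a rough sketch in Python that only assembles the two pieces: a `sqoop import` command line and the matching Pig LOAD statement. The JDBC URL, table name, user, target directory and schema are all made-up placeholders, and the helper functions are hypothetical, not part of Sqoop or Pig. Nothing is executed here; you would run the printed command yourself.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username, num_mappers=1):
    """Return the argv list for a `sqoop import` run (hypothetical helper)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the external RDBMS
        "--username", username,
        "-P",                              # prompt for the password on the console
        "--table", table,                  # table to copy into HDFS
        "--target-dir", target_dir,        # HDFS directory for the imported files
        "--fields-terminated-by", "\\t",   # tab-delimited, to match PigStorage below
        "--num-mappers", str(num_mappers),
    ]


def build_pig_load(alias, target_dir, schema):
    """Return a Pig statement that loads the imported files (hypothetical helper)."""
    return "%s = LOAD '%s' USING PigStorage('\\t') AS (%s);" % (alias, target_dir, schema)


# Placeholder values, assumed purely for illustration.
cmd = build_sqoop_import(
    "jdbc:mysql://db.example.com/shop",
    "customers",
    "/data/external/customers",
    "etl_user",
)
pig_stmt = build_pig_load(
    "customers", "/data/external/customers", "id:int, name:chararray"
)

print(shlex.join(cmd))   # command line to run before the Pig script
print(pig_stmt)          # first line of the Pig script
```

Keeping the import as a separate step means the Pig script only ever reads plain files from HDFS, so the external database is queried once per run rather than from inside the Pig job.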


On Fri, Mar 16, 2012 at 10:25 AM, Markus Resch <[EMAIL PROTECTED]> wrote:

> Hey everyone,
> thanks for answering my last question so quickly, simply and completely.
> You guys are awesome!
> But to keep you excited, I'll go on with my next question:
> I need to acquire additional information from external sources to
> process my data properly.
> My idea was to do this by writing a dedicated data store which will
> perform e.g. some SQL statements on some external databases, which might
> contain results from earlier native Pig runs. The results of this
> external query could be stored in Hadoop using the given default data
> stores and returned to the caller of LOAD as a common relation.
> My question about this is: Does this make sense? Especially from an
> optimization point of view?
> I'm curious about your opinions.
> Thanks
> Markus