
Pig, mail # user - Read/Write from RDBMS using PIG/Hadoop


RE: Read/Write from RDBMS using PIG/Hadoop
Olga Natkovich 2009-05-04, 22:12
Hi Ted,

That's definitely an interesting insight. Thanks for sharing!

Olga

> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Monday, May 04, 2009 11:27 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Read/Write from RDBMS using PIG/Hadoop
>
> I have done this with other map-reduce programs with some
> interesting results that were predictable in hindsight:
>
> a) having mappers open database connections is a great way to
> take down your database.  Databases are not usually ready to
> handle the data volumes that map-reduce programs would like
> to take from them.
>
> b) having an output format that puts output from a map-reduce
> program directly into a database is not usually faster than
> producing output in flat files and using special data load
> commands to re-import into the database.
>
> The upshot is that exporting from the database in flat file
> format, processing using map-reduce and then re-importing
> flat files isn't all that bad an alternative.  I was hoping
> for a sexier solution, but the boring answer worked pretty well.
>
> On Mon, May 4, 2009 at 3:13 AM, Nellai
> <[EMAIL PROTECTED]> wrote:
>
> >
> > Is there a way we can use Pig to interact with an RDBMS? Do we have
> > any API to handle such a scenario? Is there a way we can use Hadoop's
> > API (Hadoop 0.19 DBInputFormat/DBOutputFormat) to interact with an
> > RDBMS using Pig?
> >
> > Please let me know if someone has tried this.
> >
> --
> Ted Dunning, CTO
> DeepDyve
>
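The export, process and re-import workflow Ted describes can be sketched in a few lines. This is a minimal, self-contained illustration under loud assumptions: Python's built-in sqlite3 stands in for the production RDBMS, in-memory CSV text stands in for flat files on HDFS, the hypothetical `map_reduce` function stands in for the Pig/MapReduce job, and `executemany` stands in for a database's own bulk-load command (for example MySQL's LOAD DATA INFILE or PostgreSQL's COPY). The table names `orders` and `customer_totals` are made up for the example.

```python
import csv
import io
import sqlite3
from collections import defaultdict


def export_to_flat_file(conn, query):
    """One bulk read from the database, written out as CSV text.

    This replaces having every mapper open its own database connection.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in conn.execute(query):
        writer.writerow(row)
    return buf.getvalue()


def map_reduce(flat_text):
    """Hypothetical stand-in for the Pig/MapReduce step: total amount per customer."""
    totals = defaultdict(float)
    for customer, amount in csv.reader(io.StringIO(flat_text)):
        # "map" emits (customer, amount); "reduce" sums amounts per key
        totals[customer] += float(amount)
    buf = io.StringIO()
    writer = csv.writer(buf)
    for customer in sorted(totals):
        writer.writerow([customer, totals[customer]])
    return buf.getvalue()


def bulk_load(conn, flat_text):
    """Re-import the flat-file output in one batch (stand-in for a native bulk loader)."""
    rows = list(csv.reader(io.StringIO(flat_text)))
    conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", rows)
    conn.commit()


# Usage: build a toy source table, then run export -> process -> re-import.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 2.5), ("alice", 5.0)],
)
conn.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")

flat = export_to_flat_file(conn, "SELECT customer, amount FROM orders")
result = map_reduce(flat)
bulk_load(conn, result)
print(conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())
```

The point of the design is that the database sees exactly two bulk operations, one export and one load, no matter how many mappers the processing step uses.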