Pig >> mail # user >> Read/Write from RDBMS using PIG/Hadoop


RE: Read/Write from RDBMS using PIG/Hadoop
Hi Ted,

That's definitely an interesting insight. Thanks for sharing!

Olga

> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Monday, May 04, 2009 11:27 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Read/Write from RDBMS using PIG/Hadoop
>
> I have done this with other map-reduce programs with some
> interesting results that were predictable in hindsight:
>
> a) having mappers open database connections is a great way to
> take down your database.  Databases are not usually ready to
> handle the data volumes that map-reduce programs would like
> to take from them.
>
> b) having an output format that puts output from a map-reduce
> program directly into a database is not usually faster than
> producing output in flat files and using special data load
> commands to re-import into the database.
>
> The upshot is that exporting from the database in flat file
> format, processing using map-reduce and then re-importing
> flat files isn't all that bad an alternative.  I was hoping
> for a sexier solution, but the boring answer worked pretty well.
>
> On Mon, May 4, 2009 at 3:13 AM, Nellai <[EMAIL PROTECTED]> wrote:
>
> > Is there a way we can use PIG to interact with RDBMS? Do we have any
> > API to handle such a scenario? Is there a way we can use Hadoop's API
> > (Hadoop 0.19 DBInputFormat/DBOutputFormat) to interact with RDBMS
> > using PIG?
> >
> > Please let me know if someone has tried this.
> >
> > --
> Ted Dunning, CTO
> DeepDyve
>
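
For readers following the thread, the export / process / re-import workflow Ted describes could be sketched roughly as below. This is only an illustration and not code from the thread: Python's built-in sqlite3 stands in for the RDBMS, a plain in-process group-by stands in for the map-reduce (or Pig) job, and the table names, column names and file names are all hypothetical. A real database would normally use its own bulk loader (for example MySQL's LOAD DATA INFILE) in step 3 rather than executemany.

```python
import csv
import os
import sqlite3
import tempfile
from collections import defaultdict


def export_table(conn, path):
    """Step 1: dump the (hypothetical) orders table to a tab-separated file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for row in conn.execute("SELECT user_id, amount FROM orders"):
            writer.writerow(row)


def process(in_path, out_path):
    """Step 2: stand-in for the map-reduce job; sums amount per user_id."""
    totals = defaultdict(int)
    with open(in_path, newline="") as f:
        for user_id, amount in csv.reader(f, delimiter="\t"):
            totals[user_id] += int(amount)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for user_id in sorted(totals):
            writer.writerow([user_id, totals[user_id]])


def bulk_load(conn, path):
    """Step 3: re-import the flat file in one transaction.

    Assumption: executemany stands in for a database-specific bulk loader.
    """
    with open(path, newline="") as f:
        rows = [(u, int(t)) for u, t in csv.reader(f, delimiter="\t")]
    with conn:  # commits once after all rows are inserted
        conn.executemany(
            "INSERT INTO totals (user_id, total) VALUES (?, ?)", rows
        )


def run_pipeline():
    """Run all three steps against an in-memory database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id TEXT, amount INTEGER)")
    conn.execute("CREATE TABLE totals (user_id TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [("a", 1), ("b", 2), ("a", 3)]
    )
    with tempfile.TemporaryDirectory() as d:
        exported = os.path.join(d, "orders.tsv")
        processed = os.path.join(d, "totals.tsv")
        export_table(conn, exported)
        process(exported, processed)
        bulk_load(conn, processed)
    return conn.execute(
        "SELECT user_id, total FROM totals ORDER BY user_id"
    ).fetchall()
```

The point of the shape is that the database is touched only twice, once for a sequential export and once for a bulk import, so the parallel processing step never opens database connections of its own.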