HDFS >> mail # user >> Re: Some general questions about DBInputFormat


Re: Some general questions about DBInputFormat
Thanks for the fast response.
Nick, regarding locking a table: as far as I understood from the code, each
mapper opens its own connection to the DB. I didn't see any code such that
the job creates a transaction and passes it to the mapper. Did I
miss something?
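
To make the concern concrete, here is a minimal sketch of how I understand the
work to be divided. It is plain Java, not the Hadoop code itself, and it assumes
the row count is cut into contiguous ranges with each mapper running its own
LIMIT/OFFSET query over its own connection. The names SplitSketch, Split,
computeSplits and queryFor, the table "employees" and the column "id" are all
made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the DBInputFormat implementation.
class SplitSketch {

    /** One mapper's slice of the result set: rows [start, end). */
    static final class Split {
        final long start;
        final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    /** Cut rowCount rows into numSplits contiguous ranges; the last range takes the remainder. */
    static List<Split> computeSplits(long rowCount, int numSplits) {
        long chunkSize = rowCount / numSplits;
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            long end = (i == numSplits - 1) ? rowCount : start + chunkSize;
            splits.add(new Split(start, end));
        }
        return splits;
    }

    /** The query a single mapper would run, independently of all the others. */
    static String queryFor(String table, String orderBy, Split split) {
        return "SELECT * FROM " + table + " ORDER BY " + orderBy
                + " LIMIT " + split.length() + " OFFSET " + split.start;
    }

    public static void main(String[] args) {
        // Assumed numbers: 10 rows read by 3 mappers.
        for (Split split : computeSplits(10, 3)) {
            System.out.println(queryFor("employees", "id", split));
        }
    }
}
```

If that reading is right, the queries run at different times on separate
connections, so a row inserted or deleted between two of them shifts the
offsets, and a row could be read twice or skipped unless something outside the
mappers holds a lock or a snapshot.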
Again, thanks!
On Tue, Sep 11, 2012 at 4:00 PM, Nick Jones <[EMAIL PROTECTED]> wrote:

> Hi Yaron
>
> Replies inline below.
>
>
> On 09/11/2012 07:41 AM, Yaron Gonen wrote:
>
>> Hi,
>> After reviewing the class's (not very complicated) code, I have some
>> questions I hope someone can answer:
>>
>>   * (more general question) Are there many use-cases for using
>>     DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
>>
> Bejoy's right, most jobs utilize data across HDFS or some other
> distributed architecture to feed M/R at a sufficient rate. DBInputFormat
> could be helpful in pulling pointers to other sources of data (e.g. file
> paths for filers where actual binary content is stored).
>
>>
>>   * What happens when the database is updated during the mappers' data
>>     retrieval phase? Is there a way to lock the database before the
>>     data retrieval phase and release it afterwards?
>>
> The whole job creates a transaction against the RDBMS that ensures
> consistent state throughout the job.  Depending on the source and settings,
> this might entirely lock a table or lock the rows selected by the query.
>
>>
>>   * Since all mappers open a connection to the same DBS, one cannot
>>     use hundreds of mappers. Is there a solution to this problem?
>>
> Depends on the connection limits and the number of rows requested. I've
> found that the server ran into other problems before it hit any connection
> count limit.
>
>>
>> Thanks,
>> Yaron
>>
>
>
>