Re: Some general questions about DBInputFormat (HDFS user mailing list)
Thanks for the fast response.
Nick, regarding locking a table: as far as I understand the code, each
mapper opens its own connection to the DB. I didn't see any code where
the job creates a transaction and passes it to the mappers. Did I
miss something?
Again, thanks!
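To make the per-mapper model concrete, here is a small Python sketch of how DBInputFormat appears to divide a table among mappers, based on a reading of its `getSplits` logic (verify against your Hadoop version): the row count is divided into one contiguous range per map task, the last range absorbs any remainder, and each mapper fetches its range with its own LIMIT/OFFSET query over its own connection. The function names and the `users` table are hypothetical; the sketch models the arithmetic only and does not call Hadoop.

```python
# Simplified model (an assumption, not Hadoop's code) of how DBInputFormat
# divides a table among mappers: each split is a contiguous row range that a
# mapper reads over its own JDBC connection with a LIMIT/OFFSET query.

def compute_splits(row_count, num_maps):
    """Return one (start, end) row range per mapper; the last absorbs the remainder."""
    chunk = row_count // num_maps
    splits = []
    for i in range(num_maps):
        start = i * chunk
        end = row_count if i + 1 == num_maps else start + chunk
        splits.append((start, end))
    return splits


def split_query(base_query, start, end):
    """Build the per-mapper query in the generic LIMIT/OFFSET form."""
    return f"{base_query} LIMIT {end - start} OFFSET {start}"


if __name__ == "__main__":
    splits = compute_splits(row_count=10, num_maps=3)
    print(splits)  # [(0, 3), (3, 6), (6, 10)]
    for start, end in splits:
        # One split -> one mapper -> one connection -> one query.
        print(split_query("SELECT id, name FROM users ORDER BY id", start, end))
```

Since each split maps to one task and each task opens its own connection, the number of map tasks running at once is roughly the number of concurrent DB connections the job holds.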
On Tue, Sep 11, 2012 at 4:00 PM, Nick Jones <[EMAIL PROTECTED]> wrote:

> Hi Yaron
>
> Replies inline below.
>
>
> On 09/11/2012 07:41 AM, Yaron Gonen wrote:
>
>> Hi,
>> After reviewing the class's (not very complicated) code, I have some
>> questions I hope someone can answer:
>>
>>   * (More general question) Are there many use cases for
>>     DBInputFormat? Do most Hadoop jobs take their input from files or from DBs?
>>
> Bejoy's right: most jobs read their input from HDFS or some other
> distributed store that can feed M/R at a sufficient rate. DBInputFormat
> could be helpful for pulling pointers to other sources of data (e.g. file
> paths on filers where the actual binary content is stored).
>
>>
>>   * What happens when the database is updated during the mappers' data
>>     retrieval phase? Is there a way to lock the database before the
>>     data retrieval phase and release it afterwards?
>>
> The whole job creates a transaction against the RDBMS that ensures
> consistent state throughout the job. Depending on the source and settings,
> this might lock the entire table or only the rows selected by the query.
>
>>
>>   * Since all mappers open a connection to the same DB, one cannot
>>     use hundreds of mappers. Is there a solution to this problem?
>>
> It depends on the connection limits and the number of rows requested. I've
> found that the server ran into other problems before it hit connection-count
> limits.
>
>>
>> Thanks,
>> Yaron
>>
>
>
>
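If each mapper does run its query in its own connection and transaction, as the follow-up at the top of this message suggests, a write that lands between two mappers' queries is not held back by any job-wide lock. The toy Python model below assumes two splits computed from a row count of 4 and a single INSERT of a row with a smaller id between the two reads; because the OFFSET windows shift, one row is read twice and another is never read. It illustrates the risk under those assumptions and is not a trace of real Hadoop behavior; actual isolation and locking depend on the database and its settings.

```python
# Toy model (an assumption for illustration): two mappers read the same
# "ORDER BY id" result with LIMIT/OFFSET, each at a different moment and in
# its own transaction. A row inserted between the two reads shifts the
# OFFSET windows, so the splits overlap and a row is skipped.

def read_window(table, limit, offset):
    """Emulate 'SELECT ... ORDER BY id LIMIT limit OFFSET offset'."""
    return sorted(table)[offset:offset + limit]


table = [10, 20, 30, 40]                        # ids when splits are computed (count = 4)
first = read_window(table, limit=2, offset=0)   # mapper 1 reads split (0, 2)

table.append(5)                                 # concurrent INSERT with a smaller id

second = read_window(table, limit=2, offset=2)  # mapper 2 reads split (2, 4)

seen = first + second
print(first, second)                   # [10, 20] [20, 30]
print(sorted(set(table) - set(seen)))  # rows never read: [5, 40]
```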