MapReduce, mail # user - Re: Database insertion by Hadoop


Hemanth Yamijala 2013-02-19, 01:14
Mohammad Tariq 2013-02-19, 09:41
Re: Database insertion by Hadoop
Masoud 2013-02-19, 11:04
Dear Tariq,

No, exactly the opposite way: we compute the similarity between
documents and insert the results into the database; each table has
almost 2,000,000 records.

Best Regards
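
Since the bottleneck here is inserting millions of similarity records, batching rows and committing once per batch (rather than issuing one INSERT per record) is usually the first thing to get right, whatever does the inserting. A minimal sketch of that pattern, using SQLite as a stand-in so it runs anywhere; the `similarity` table and its columns are made up for illustration, and against SQL Server 2008 the same `executemany` batching works through a driver such as pyodbc:

```python
# Batched-insert sketch. SQLite stands in for SQL Server here; the
# table name and columns (doc_a, doc_b, score) are illustrative only.
import sqlite3

INSERT_SQL = "INSERT INTO similarity (doc_a, doc_b, score) VALUES (?, ?, ?)"

def bulk_insert(conn, rows, batch_size=10000):
    """Insert rows in batches, committing once per batch instead of per row."""
    cur = conn.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany(INSERT_SQL, batch)
            conn.commit()
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany(INSERT_SQL, batch)
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE similarity (doc_a INT, doc_b INT, score REAL)")
bulk_insert(conn, ((i, i + 1, 0.5) for i in range(25000)), batch_size=10000)
```

With ~2,000,000 rows per table, per-row commits would spend most of the time in transaction overhead; per-batch commits amortize that.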

On 02/19/2013 06:41 PM, Mohammad Tariq wrote:
> Hello Masoud,
>
>       So you want to pull your data from SQL Server to your Hadoop
> cluster first and then do the processing. Please correct me if I am
> wrong. You can do that using Sqoop, as mentioned by Hemanth sir. BTW,
> what exactly is the kind of processing you are planning to do on
> your data?
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> http://cloudfront.blogspot.com
>
>
> On Tue, Feb 19, 2013 at 6:44 AM, Hemanth Yamijala
> <[EMAIL PROTECTED]> wrote:
>
>     Hi,
>
>     You could consider using Sqoop: http://sqoop.apache.org/. There
>     seems to be a SQL Server connector from Microsoft:
>     http://www.microsoft.com/en-gb/download/details.aspx?id=27584
>
>     Thanks
>     Hemanth
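
Since Masoud's reply at the top of the thread clarifies that the data flows from the Hadoop cluster *into* SQL Server, the relevant Sqoop mode would be `sqoop export` rather than import. A rough sketch of such an invocation; the JDBC URL, credentials, table name, and HDFS path are all placeholders, not values from the thread:

```shell
# Hypothetical Sqoop export of tab-separated MapReduce output from
# HDFS into a SQL Server table; every concrete value is a placeholder.
sqoop export \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=papers" \
  --username hadoop \
  --password-file /user/hadoop/.db.pass \
  --table similarity \
  --export-dir /user/hadoop/similarity-output \
  --input-fields-terminated-by '\t' \
  --batch
```

The `--batch` flag asks Sqoop to use batched JDBC statements, which matters at the row counts discussed here.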
>
>     On Tuesday, February 19, 2013, Masoud wrote:
>
>         Hello Tariq,
>
>         Our database is SQL Server 2008,
>         and we don't need to develop a professional app; we just need
>         to develop it fast and get our experiment results soon.
>         Thanks
>
>
>         On 02/18/2013 11:58 PM, Hemanth Yamijala wrote:
>>         What database is this? Was HBase mentioned?
>>
>>         On Monday, February 18, 2013, Mohammad Tariq wrote:
>>
>>             Hello Masoud,
>>                       You can use the Bulk Load feature. You might
>>             find it more efficient than normal client APIs or using
>>             the TableOutputFormat.
>>
>>             The bulk load feature uses a MapReduce job to output
>>             table data
>>             in HBase's internal data format, and then directly loads the
>>             generated StoreFiles into a running cluster. Using bulk
>>             load will use
>>             less CPU and network resources than simply using the
>>             HBase API.
>>
>>             For detailed info you can go here:
>>             http://hbase.apache.org/book/arch.bulk.load.html
>>
>>             Warm Regards,
>>             Tariq
>>             https://mtariq.jux.com/
>>             http://cloudfront.blogspot.com
>>
>>
>>             On Mon, Feb 18, 2013 at 5:00 PM, Masoud
>>             <[EMAIL PROTECTED]> wrote:
>>
>>
>>                 Dear All,
>>
>>                 We are going to run the experiments for a scientific
>>                 paper. We must insert data into our database for
>>                 later analysis: almost 300 tables, each with
>>                 2,000,000 records.
>>                 As you know, it takes a lot of time to do this with a
>>                 single machine, so we are going to use our Hadoop
>>                 cluster (32 machines) and divide the 300 insertion
>>                 tasks among them.
>>                 I need some hints to progress faster:
>>                 1- As far as I know, we don't need a Reducer; just a
>>                 Mapper is enough.
>>                 2- So we just need to implement the Mapper class with
>>                 the needed code.
>>
>>                 Please let me know if there are any other points,
>>
>>                 Best Regards
>>                 Masoud
>>
>>
>>
>>
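
The map-only plan in the message above (a Mapper with no Reducer) can be sketched without writing a Java Mapper class by using Hadoop Streaming; a reducer-free job is requested with `-numReduceTasks 0` (the Java API equivalent is `job.setNumReduceTasks(0)`). The mapper below is a sketch under an assumed input format, one `table_id<TAB>doc_a<TAB>doc_b<TAB>score` line per record, that validates and re-emits rows for a later bulk load into the database; the field layout is hypothetical, not from the thread:

```python
#!/usr/bin/env python
# Map-only Hadoop Streaming mapper sketch; run the job with
# -numReduceTasks 0 so no reduce phase is scheduled. The input format
# "table_id<TAB>doc_a<TAB>doc_b<TAB>score" is an assumption.
import sys

def parse_record(line):
    """Parse one input line; return (table_id, doc_a, doc_b, score) or None."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return None  # skip malformed lines rather than failing the task
    table_id, doc_a, doc_b, score = parts
    return table_id, int(doc_a), int(doc_b), float(score)

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Each of the 32 machines runs some of the map tasks; with 300
    # input splits the 300 per-table jobs are divided automatically.
    for line in stdin:
        rec = parse_record(line)
        if rec is None:
            continue
        stdout.write("%s\t%d\t%d\t%.6f\n" % rec)

if __name__ == "__main__":
    main()
```

Each map task could instead open its own database connection and do batched inserts directly, but emitting clean records and bulk-loading them afterwards keeps the tasks restart-safe if one fails.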
--
Masoud Reyhani Hamedani
Ph.D. Candidate
Department of Electronics and Computer Engineering, Hanyang University
Data Mining and Knowledge Engineering Lab,
Room 803 IT/BT Building 17
Haengdang-dong, Sungdong-gu Seoul, Republic of Korea, 133-791
Tel: +82-2-2220-4567
[EMAIL PROTECTED]
http://agape.hanyang.ac.kr
Hemanth Yamijala 2013-02-19, 15:52