Re: Database insertion by Hadoop
Masoud 2013-02-19, 11:04
Dear Tariq,

No, exactly the opposite: we compute the similarity between documents and then insert the results into the database, almost 2,000,000 records in every table.

Best Regards

On 02/19/2013 06:41 PM, Mohammad Tariq wrote:
> Hello Masoud,
>
> So you want to pull your data from SQL Server to your Hadoop
> cluster first and then do the processing. Please correct me if I am
> wrong. You can do that using Sqoop, as mentioned by Hemanth sir. BTW,
> what exactly is the kind of processing you are planning to do on
> your data?
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com <http://cloudfront.blogspot.com>
>
> On Tue, Feb 19, 2013 at 6:44 AM, Hemanth Yamijala
> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>
> Hi,
>
> You could consider using Sqoop: http://sqoop.apache.org/
> There seems to be a SQL Server connector from Microsoft:
> http://www.microsoft.com/en-gb/download/details.aspx?id=27584
>
> Thanks
> Hemanth
>
> On Tuesday, February 19, 2013, Masoud wrote:
>
> Hello Tariq,
>
> Our database is SQL Server 2008,
> and we don't need to develop a professional app; we just need
> to develop it fast and get our experiment results soon.
> Thanks
>
> On 02/18/2013 11:58 PM, Hemanth Yamijala wrote:
>> What database is this? Was HBase mentioned?
>>
>> On Monday, February 18, 2013, Mohammad Tariq wrote:
>>
>> Hello Masoud,
>> You can use the Bulk Load feature. You might find it more
>> efficient than the normal client APIs or the TableOutputFormat.
>>
>> The bulk load feature uses a MapReduce job to output table data
>> in HBase's internal data format, and then directly loads the
>> generated StoreFiles into a running cluster. Using bulk load will use
>> less CPU and network resources than simply using the HBase API.
>>
>> For detailed info you can go here:
>> http://hbase.apache.org/book/arch.bulk.load.html
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com <http://cloudfront.blogspot.com>
>>
>> On Mon, Feb 18, 2013 at 5:00 PM, Masoud
>> <[EMAIL PROTECTED]> wrote:
>>
>> Dear All,
>>
>> We are running the experiments for a scientific paper.
>> We must insert data into our database for later consideration:
>> almost 300 tables, each with 2,000,000 records.
>> As you know, it takes a lot of time to do this with a single machine,
>> so we are going to use our Hadoop cluster (32 machines) and divide
>> the 300 insertion tasks among them.
>> I need some hints to progress faster:
>> 1- As far as I know, we don't need a Reducer; just a Mapper is enough.
>> 2- So we just need to implement the Mapper class with the needed code.
>>
>> Please let me know if there is anything else to consider,
>>
>> Best Regards
>> Masoud

--
Masoud Reyhani Hamedani
Ph.D. Candidate
Department of Electronics and Computer Engineering, Hanyang University
Data Mining and Knowledge Engineering Lab,
Room 803 IT/BT Building 17
Haengdang-dong, Sungdong-gu Seoul, Republic of Korea, 133-791
Tel: +82-2-2220-4567
[EMAIL PROTECTED]
http://agape.hanyang.ac.kr
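
For anyone reading this thread later, here is a minimal sketch of the plan discussed above: splitting the 300 table-insertion tasks across 32 workers with no reduce step, and writing each table in batches rather than row by row. It is only an illustration under stated assumptions, not the code used in the experiment. The function names (assign_tables, batches, insert_table), the table names, and the three-column schema (doc_a, doc_b, sim) are all hypothetical, and Python's built-in sqlite3 stands in for SQL Server so the example runs on its own.

```python
import sqlite3
from itertools import islice


def assign_tables(num_tables, num_workers):
    """Round-robin split: worker w gets tables w, w + num_workers, ..."""
    return [list(range(w, num_tables, num_workers)) for w in range(num_workers)]


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable of rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_table(conn, table, rows, batch_size=1000):
    """Insert similarity rows into one table, committing once per batch.

    The table name is interpolated into the SQL text, so it must come
    from trusted code, never from user input. The schema is an assumption.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(doc_a INTEGER, doc_b INTEGER, sim REAL)"
    )
    total = 0
    for chunk in batches(rows, batch_size):
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", chunk)
        conn.commit()
        total += len(chunk)
    return total


if __name__ == "__main__":
    plan = assign_tables(300, 32)           # 32 lists of table indices
    conn = sqlite3.connect(":memory:")      # stand-in for the real database
    for t in plan[0][:2]:                   # pretend to be worker 0
        rows = ((i, i + 1, 0.5) for i in range(2500))  # dummy similarity rows
        print(t, insert_table(conn, f"sim_{t:03d}", rows))
```

In a real map-only job each mapper would play the role of one worker in the plan. Whether 32 parallel writers actually speed things up depends on how many concurrent inserts the database server can absorb, so the batch size and the number of workers are worth measuring rather than assuming.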