Re: Database insertion by Hadoop
Nope, HBase wasn't mentioned.
The OP could be talking about using external tables and Hive.

The OP could still be stuck in the RDBMS world and hasn't flattened his data yet.
2 million records? Kinda small dontcha think?
Not Enough Information ...

On Feb 18, 2013, at 8:58 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:

> What database is this? Was HBase mentioned?
> On Monday, February 18, 2013, Mohammad Tariq wrote:
> Hello Masoud,
>           You can use the Bulk Load feature. You might find it more
> efficient than normal client APIs or using the TableOutputFormat.
> The bulk load feature uses a MapReduce job to output table data
> in HBase's internal data format, and then directly loads the
> generated StoreFiles into a running cluster. Using bulk load will use
> less CPU and network resources than simply using the HBase API.
> For detailed info, see:
> http://hbase.apache.org/book/arch.bulk.load.html
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
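The two-step bulk-load flow Tariq describes (a MapReduce job writing HFiles, then loading them into the cluster) could look roughly like this from the command line. This is a hedged sketch: the table name `mytable`, the column mapping, and all paths are placeholder assumptions, not anything from the thread.

```shell
# Step 1: run a MapReduce job that writes HFiles (HBase's internal
# StoreFile format) instead of sending Puts through the client API.
# 'mytable', the column mapping, and the HDFS paths are placeholders.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /user/masoud/input

# Step 2: hand the generated StoreFiles directly to the running
# cluster, bypassing the write path (WAL and MemStore).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /tmp/hfiles mytable
```

Because step 2 just moves already-formatted files into region directories, it is the part that saves the CPU and network that per-row client writes would otherwise cost.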
> On Mon, Feb 18, 2013 at 5:00 PM, Masoud <[EMAIL PROTECTED]> wrote:
> Dear All,
> We are going to run the experiments for a scientific paper.
> We must insert data into our database for later analysis: almost
> 300 tables, each with 2,000,000 records.
> As you know, it takes a lot of time to do this on a single machine,
> so we are going to use our Hadoop cluster (32 machines) and divide
> the 300 insertion tasks between them.
> I need some hints to make progress faster:
> 1- As far as I know, we don't need a Reducer; a Mapper alone is enough.
> 2- So we just need to implement the Mapper class with the needed code.
> Please let me know if there are any points I should consider.
> Best Regards
> Masoud
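The map-only plan in Masoud's question (point 1: no Reducer, each map task handles a slice of the 300 tables) comes down to partitioning tables across workers. Below is a minimal plain-Java sketch of just that partitioning logic; it is an illustration, not the actual job (class and method names are made up, and a real Hadoop job would additionally call `job.setNumReduceTasks(0)` and do the JDBC inserts inside the Mapper).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: dividing 300 table-insertion tasks across 32 workers,
// mirroring a map-only Hadoop job where each map task processes
// one slice of tables. Names and numbers are illustrative.
public class TaskPartitioner {

    // Assign table index t (0..numTables-1) to a worker round-robin.
    static int workerFor(int tableIndex, int numWorkers) {
        return tableIndex % numWorkers;
    }

    // Build the per-worker lists of table indices.
    static List<List<Integer>> partition(int numTables, int numWorkers) {
        List<List<Integer>> slices = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            slices.add(new ArrayList<>());
        }
        for (int t = 0; t < numTables; t++) {
            slices.get(workerFor(t, numWorkers)).add(t);
        }
        return slices;
    }

    public static void main(String[] args) {
        List<List<Integer>> slices = partition(300, 32);
        int min = Integer.MAX_VALUE, max = 0, total = 0;
        for (List<Integer> s : slices) {
            min = Math.min(min, s.size());
            max = Math.max(max, s.size());
            total += s.size();
        }
        // 300 over 32: 12 workers get 10 tables, 20 workers get 9.
        System.out.println("workers=" + slices.size()
                + " total=" + total + " min=" + min + " max=" + max);
    }
}
```

With 300 tables and 32 machines the split is nearly even (9 or 10 tables per worker), which is why a Reducer adds nothing here: there is no aggregation step, only independent insert work per slice.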