HBase, mail # user - A question about HBase MapReduce


Re: A question about HBase MapReduce
Doug Meil 2012-05-25, 17:16

re:  "data from raw data file into hbase table"

One approach is bulk loading:

http://hbase.apache.org/book.html#arch.bulk.load

If he's talking about using an HBase table as the source of an MR job, then
see this:
http://hbase.apache.org/book.html#splitter

On 5/25/12 2:35 AM, "Florin P" <[EMAIL PROTECTED]> wrote:

>Hello!
>
>I've read Lars George's blog
>http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html where
>at the end of the article, he mentioned "In the next post I will show you
>how to import data from a raw data
>file into a HBase table and how you eventually process the data in the
>HBase table. We will address questions like how many mappers and/or
>reducers are needed and how can I improve import and processing
>performance.". I searched the blog for these topics, but there seems to
>be no related article. Do you know if he covered these
>subjects in a different post or book? In particular, I am interested in:
>
>1. How can you set the number of mappers?
>2. Can the number of mappers be set per region server? If yes, how?
>3. How does a large number of mappers affect data locality?
>4. Is this algorithm for computing the number of mappers
>(https://issues.apache.org/jira/browse/HBASE-1172) still applicable?
>"Currently,
>the number of mappers specified when using TableInputFormat is strictly
>followed if less than total regions on the input table. If greater, the
>number of regions is used.
>This will modify the splitting algorithm to do the following:
> * Specify 0 mappers when you want # mappers = # regions
> * If you specify fewer mappers than regions, will use exactly the number
>you specify based on the current algorithm
> * If you specify more mappers than regions, will divide regions up by
>determining [start,X) [X,end). The number of mappers will always be a
>multiple of number of regions. This is so we do not have scanners
>spanning multiple regions.
>There is an additional issue in that the default number of mappers
>in JobConf is set to 1. That means if a user does not explicitly set
>number of map tasks, a single mapper will be used. "
>
>I look forward to your answers. Thank you.
>
>Kind regards, Florin
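
For reference, the mapper-count rules in the HBASE-1172 text quoted above can be sketched in a few lines of plain Java. This is only an illustration of the quoted description, not code from HBase: the class and method names are hypothetical. The description does not say which multiple of the region count is chosen when more mappers than regions are requested, so the sketch assumes the count is rounded down to the nearest multiple.

```java
// Sketch of the mapper-count rules described in the quoted HBASE-1172 text.
// Not HBase code: the class and method names are hypothetical.
final class MapperCountSketch {

    /**
     * @param requested number of mappers asked for (0 means "one per region")
     * @param regions   number of regions in the input table (at least 1)
     * @return the number of mappers the quoted rules would yield
     */
    static int plannedMappers(int requested, int regions) {
        if (requested < 0 || regions < 1) {
            throw new IllegalArgumentException("requested >= 0 and regions >= 1 required");
        }
        if (requested == 0) {
            return regions;          // 0 means # mappers = # regions
        }
        if (requested <= regions) {
            return requested;        // fewer mappers than regions: use exactly that number
        }
        // More mappers than regions: each region is divided into equal sub-ranges,
        // so the total is always a multiple of the region count.
        // ASSUMPTION: round down; the quoted text does not specify the rounding.
        int splitsPerRegion = requested / regions;
        return splitsPerRegion * regions;
    }

    public static void main(String[] args) {
        int regions = 10;
        for (int requested : new int[] {0, 4, 10, 25}) {
            System.out.println(requested + " requested -> "
                    + plannedMappers(requested, regions) + " mappers");
        }
    }
}
```

Whether a given HBase release still behaves this way should be checked against the splitter section of the HBase book linked in the reply.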