Re: importing a large table
Rita 2012-03-31, 10:33
I have close to 9200 regions. Is there an example I can follow, or are
there tools to do this already?

On Fri, Mar 30, 2012 at 10:11 AM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:

>
>
> On 03/30/2012 04:54 AM, Rita wrote:
>
> Thanks for the responses. I am using 0.90.4-cdh3. I exported the table
> using the HBase Export tool. Yes, the previous table still exists, but on a
> different cluster. My region servers are large, close to 12GB in size.
>
>  What is the total number of your regions?
>
>  I want to understand more about HFiles. Do we export the table as a series
> of HFiles and then import them back in?
>
>  Yes. The simplest way to do this is to use TableOutputFormat, but if
> you use HFileOutputFormat instead, the process will be more efficient,
> because this feature (bulk load) uses less CPU and network. With a
> MapReduce job, you prepare your data using HFileOutputFormat
> (Hadoop's TotalOrderPartitioner class is used to partition the map output
> into disjoint ranges of the key space, corresponding to the key ranges of
> the regions in the table).
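>
>  For illustration, a rough sketch of what that job setup could look like
> (the table name, paths, and the mapper class are made-up placeholders; the
> mapper is assumed to emit ImmutableBytesWritable keys and KeyValue values):
>
>    import org.apache.hadoop.conf.Configuration;
>    import org.apache.hadoop.fs.Path;
>    import org.apache.hadoop.hbase.HBaseConfiguration;
>    import org.apache.hadoop.hbase.client.HTable;
>    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
>    import org.apache.hadoop.mapreduce.Job;
>    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
>    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
>    public class HFilePrepareJob {
>      public static void main(String[] args) throws Exception {
>        Configuration conf = HBaseConfiguration.create();
>        Job job = new Job(conf, "mytable-hfile-prepare");
>        job.setJarByClass(HFilePrepareJob.class);
>        job.setInputFormatClass(SequenceFileInputFormat.class);  // Export writes SequenceFiles
>        job.setMapperClass(MyKeyValueMapper.class);               // hypothetical mapper
>        FileInputFormat.setInputPaths(job, new Path("/export/mytable"));
>        FileOutputFormat.setOutputPath(job, new Path("/bulkload/mytable"));
>        // configureIncrementalLoad wires in HFileOutputFormat, the
>        // TotalOrderPartitioner, and a sorting reducer, all keyed off the
>        // target table's current region boundaries.
>        HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "mytable"));
>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>      }
>    }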
>
>
>  What is the difference between that and the
> regular MR export job?
>
>  The main difference from a regular MR job is the output: instead of using
> the classic output formats like TextOutputFormat, MultipleOutputFormat,
> SequenceFileOutputFormat, etc., you use HFileOutputFormat, which writes
> HBase's native data file format (HFile).
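>
>  Once those HFiles are written they still have to be handed off to the
> region servers. A minimal sketch of that last step, reusing the made-up
> output directory from the sketch above (this is what the completebulkload
> tool does for you):
>
>    import org.apache.hadoop.conf.Configuration;
>    import org.apache.hadoop.fs.Path;
>    import org.apache.hadoop.hbase.HBaseConfiguration;
>    import org.apache.hadoop.hbase.client.HTable;
>    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
>
>    public class BulkLoadHFiles {
>      public static void main(String[] args) throws Exception {
>        Configuration conf = HBaseConfiguration.create();
>        // Moves the generated HFiles into the regions of the target table.
>        new LoadIncrementalHFiles(conf)
>            .doBulkLoad(new Path("/bulkload/mytable"), new HTable(conf, "mytable"));
>      }
>    }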
>
>   The idea sounds good because it seems simple on the
> surface :-)
>
>
>
>
> On Fri, Mar 30, 2012 at 12:08 AM, Stack <[EMAIL PROTECTED]> wrote:
>
>
>  On Thu, Mar 29, 2012 at 7:57 PM, Rita <[EMAIL PROTECTED]> wrote:
>
>  Hello,
>
> I am importing a 40+ billion row table which I exported several months ago.
> The data size is close to 18TB on HDFS (3x replication).
>
>
>  Does the table from back then still exist?  Or do you remember what
> the key spread was like?  Could you precreate the old table?
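>
>  (If the old region boundaries can be recovered, a minimal sketch of
> pre-creating the table with explicit split keys; the table, family, and
> keys below are only placeholders:)
>
>    import org.apache.hadoop.conf.Configuration;
>    import org.apache.hadoop.hbase.HBaseConfiguration;
>    import org.apache.hadoop.hbase.HColumnDescriptor;
>    import org.apache.hadoop.hbase.HTableDescriptor;
>    import org.apache.hadoop.hbase.client.HBaseAdmin;
>    import org.apache.hadoop.hbase.util.Bytes;
>
>    public class PrecreateTable {
>      public static void main(String[] args) throws Exception {
>        Configuration conf = HBaseConfiguration.create();
>        HTableDescriptor desc = new HTableDescriptor("mytable");
>        desc.addFamily(new HColumnDescriptor("cf"));
>        // Split keys taken from the old table's region boundaries.
>        byte[][] splits = { Bytes.toBytes("row-0333"), Bytes.toBytes("row-0666") };
>        new HBaseAdmin(conf).createTable(desc, splits);
>      }
>    }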
>
>
>  My problem is that when I try to import it with MapReduce it takes a few
> days -- which is ok -- however, when the job fails for whatever reason, I
> have to restart everything. Is it possible to import the table in chunks,
> like import 1/3, 2/3, and then finally 3/3 of the table?
>
>
>  Yeah.  Funny how the plug gets pulled on the rack when the three-day
> job is 95% done.
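>
>  One way to get that chunking, sketched under the assumption that the
> exported files have been copied into per-chunk directories such as
> /export/mytable/chunk1 (table name and paths are made up); each run
> re-imports only one chunk, so a failure only costs that chunk:
>
>    import org.apache.hadoop.conf.Configuration;
>    import org.apache.hadoop.hbase.HBaseConfiguration;
>    import org.apache.hadoop.hbase.mapreduce.Import;
>    import org.apache.hadoop.mapreduce.Job;
>
>    public class ChunkedImport {
>      public static void main(String[] args) throws Exception {
>        Configuration conf = HBaseConfiguration.create();
>        // Run the stock Import job against a single chunk of the export;
>        // repeat with chunk2 and chunk3 once this one has succeeded.
>        Job job = Import.createSubmittableJob(
>            conf, new String[] { "mytable", "/export/mytable/chunk1" });
>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>      }
>    }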
>
>
>  Btw, the job creates close to 150k mapper tasks; that's a problem waiting
> to happen :-)
>
>
>  Are you running 0.92?  If not, you should, and go for bigger regions.  10G?
>
> St.Ack
>
>
>
> --
> Marcos Luis Ortíz Valmaseda (@marcosluis2186)
>  Data Engineer at UCI
>  http://marcosluis2186.posterous.com
>
>
>
>
--
--- Get your facts first, then you can distort them as you please.--