HBase, mail # user - ExportSnapshot very slow. bug?

Re: ExportSnapshot very slow. bug?
Bryan Beaudreault 2013-11-08, 19:19
OK, thanks. I guess I am paying the cost of too many regions: multiplied
by the store files in each, they add up to many thousands of small files.
Is there any reason I couldn't modify step 1 to parallelize it a little?
On Fri, Nov 8, 2013 at 2:06 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:

> The first copy doesn't resolve the links, so you're copying empty files.
> The data copy happens only in "step 2", with the MR job.
>
> Matteo
>
> On Fri, Nov 8, 2013 at 10:54 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
>
> > Hello all.  I'm trying out the ExportSnapshot tool and it is extremely
> > slow.  I took a look at the code and I think I know why.
> >
> > https://github.com/cloudera/hbase/blob/cdh4-0.94.6_4.4.0/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L635
> >
> > In step 1 it is, for some reason, copying from fs1 to fs2. This
> > basically means that a single-threaded process is copying an entire
> > HBase table to another cluster. I can understand wanting to copy from
> > fs1 to fs1 (i.e. to a different path on the same fs), so as to
> > dereference all the soft links of the snapshots. But why between
> > filesystems?
> >
> > In step 2 you finally run the MR job, which makes much more sense, but
> > as far as I can tell all of the files would already exist, since
> > FileUtil.copy just does a recursive copy of all paths in a tree.
> >
> > Am I missing something?  I appreciate any input.
> >
> > - Bryan
> >
>
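
Bryan's follow-up asks whether step 1 could be parallelized. Per Matteo, that step copies only empty link files, so its cost is dominated by per-file round trips rather than bytes, which is the kind of latency-bound work a thread pool can overlap. What follows is a minimal, hypothetical sketch of that idea, not the ExportSnapshot code: it uses java.nio.file on a local filesystem as a stand-in for Hadoop's FileSystem API, and the class name ParallelTreeCopy, the method copyTree, the directory layout, and the thread count are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: parallel recursive copy of many small files.
// Uses java.nio.file as a local stand-in for Hadoop's FileSystem API;
// this is NOT the ExportSnapshot implementation.
public class ParallelTreeCopy {

    /** Copies every regular file under src to the same relative path under dst. */
    public static int copyTree(Path src, Path dst, int threads)
            throws IOException, InterruptedException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(src)) {
            entries = walk.collect(Collectors.toList());
        }

        // Create the directory skeleton serially so workers only copy files.
        List<Path> files = new ArrayList<>();
        for (Path p : entries) {
            if (Files.isDirectory(p)) {
                Files.createDirectories(dst.resolve(src.relativize(p)));
            } else {
                files.add(p);
            }
        }

        // Each small file is an independent task; the pool overlaps their latency.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> futures = new ArrayList<>();
            for (Path file : files) {
                Path target = dst.resolve(src.relativize(file));
                futures.add(pool.submit(() -> {
                    Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                    return null;
                }));
            }
            // Wait for every copy and surface the first failure as an IOException.
            for (Future<Object> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return files.size();
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempDirectory("snap-src");
        Path dst = Files.createTempDirectory("snap-dst");
        // Made-up layout: 50 "regions", each with 4 empty "store files".
        for (int r = 0; r < 50; r++) {
            Path family = Files.createDirectories(src.resolve("region-" + r).resolve("cf"));
            for (int f = 0; f < 4; f++) {
                Files.write(family.resolve("file-" + f), new byte[0]);
            }
        }
        System.out.println("copied " + copyTree(src, dst, 8) + " files");
    }
}
```

Running main prints `copied 200 files`. Directories are created up front in one thread so that workers never race on mkdir; only the file copies are fanned out. A real patch against ExportSnapshot would presumably swap Files.copy for the Hadoop FileSystem calls, and would likely want to cap the thread count so the target cluster's NameNode is not flooded with create requests, which this sketch does not model.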