HBase, mail # user - ExportSnapshot very slow. bug?


Re: ExportSnapshot very slow. bug?
Matteo Bertozzi 2013-11-08, 19:43
If you have a patch to parallelize that, feel free to post it and it
will probably be integrated.
The idea was to replace the many empty files with a few small manifests
(HBASE-7987), but that work is still in progress.
So, feel free to post a patch with the fix.

Thanks!

Matteo

On Fri, Nov 8, 2013 at 11:19 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:

> OK, thanks. I guess I am paying the cost of too many regions, which, when
> multiplied by store files, results in many thousands of small files.  Is there
> any reason I couldn't modify this to parallelize it a little?
>
>
> On Fri, Nov 8, 2013 at 2:06 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:
>
> > The first copy doesn't resolve the links, so you're copying empty files.
> > The data copy happens only in "step 2", with the MR job.
> >
> > Matteo
> >
> >
> >
> > On Fri, Nov 8, 2013 at 10:54 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> >
> > > Hello all.  I'm trying out the ExportSnapshot tool and it is extremely
> > > slow.  I took a look at the code and I think I know why.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh4-0.94.6_4.4.0/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L635
> > >
> > > > In step 1 it is for some reason copying from fs1 to fs2.  This basically
> > > > means in a single threaded process we are copying an entire hbase table to
> > > > another cluster.  I can understand wanting to copy from fs1 to fs1 (i.e.
> > > > different path on same fs), so as to dereference all the soft links of the
> > > > snapshots.  But why between filesystems?
> > >
> > > > In step 2 you finally do the MR job, which makes much more sense, but as
> > > > far as I can tell all of the files would already exist, as FileUtil.copy
> > > > just does a recursive copy of all paths in a tree.
> > >
> > > Am I missing something?  I appreciate any input.
> > >
> > > - Bryan
> > >
> >
>
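
To make the parallelization idea from this thread concrete, here is a minimal sketch of copying a tree of many small (possibly empty) files with a fixed thread pool. It is only an illustration under stated assumptions: it uses java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API, the class and method names (ParallelTreeCopy, copyTree) are hypothetical, and it is not the ExportSnapshot code or an actual patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: java.nio.file stands in for the Hadoop FileSystem API.
final class ParallelTreeCopy {
    private ParallelTreeCopy() {}

    /** Copies every regular file under src to the same relative path under dst; returns the file count. */
    static int copyTree(Path src, Path dst, int threads)
            throws IOException, InterruptedException {
        // Walk the tree once on the calling thread to list directories and files.
        List<Path> dirs;
        List<Path> files;
        try (Stream<Path> walk = Files.walk(src)) {
            List<Path> all = walk.collect(Collectors.toList());
            dirs = all.stream().filter(Files::isDirectory).collect(Collectors.toList());
            files = all.stream().filter(Files::isRegularFile).collect(Collectors.toList());
        }
        // Create directories up front so the workers never race on directory creation.
        for (Path dir : dirs) {
            Files.createDirectories(dst.resolve(src.relativize(dir)));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One small task per file; the pool bounds how many run at once.
            List<Callable<Path>> tasks = new ArrayList<>();
            for (Path file : files) {
                Path target = dst.resolve(src.relativize(file));
                tasks.add(() -> Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING));
            }
            // invokeAll blocks until every task has finished; get() surfaces any failure.
            for (Future<Path> done : pool.invokeAll(tasks)) {
                try {
                    done.get();
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
            return files.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

With thousands of tiny files the cost is dominated by per-file round trips rather than bytes moved, which is why issuing the copies concurrently could help; the right thread count would depend on the target cluster and is left as a parameter here.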