HBase >> mail # user >> ExportSnapshot very slow. bug?


Re: ExportSnapshot very slow. bug?
If you have a way to parallelize that, feel free to post a patch and it
will probably be integrated.
The idea was to replace the multiple empty files with a few small manifests
(HBASE-7987), but that work is still in progress.

Thanks!

Matteo
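
For reference, the kind of parallelization discussed here could look roughly like the sketch below. It is not the ExportSnapshot code and not a proposed patch: it assumes plain java.nio.file paths instead of the Hadoop FileSystem API, and the class name ParallelTreeCopy, the method copyTree and the thread-count parameter are all hypothetical. It only illustrates fanning many small file copies out over a fixed thread pool instead of copying them one by one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only (not part of HBase or Hadoop).
public class ParallelTreeCopy {

    /**
     * Copies every regular file under src to the same relative path under dst,
     * using a fixed pool of worker threads. Returns the number of files copied.
     * Empty directories are not recreated in this sketch.
     */
    public static int copyTree(Path src, Path dst, int threads)
            throws IOException, InterruptedException {
        // Listing stays single-threaded; only the per-file copies are parallel.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(src)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                Path target = dst.resolve(src.relativize(file).toString());
                // The lambda returns a value, so it is submitted as a Callable
                // and may throw IOException.
                pending.add(pool.submit(() -> {
                    Files.createDirectories(target.getParent());
                    Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                    return null;
                }));
            }
            // Wait for every copy and surface the first failure.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
            return files.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

With many thousands of tiny files, most of the time goes to per-file round trips rather than bandwidth, which is why even a small pool might help. A real patch would have to use the Hadoop FileSystem calls that ExportSnapshot already relies on.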

On Fri, Nov 8, 2013 at 11:19 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:

> OK, thanks. I guess I am paying the cost of too many regions, which when
> multiplied by store files results in many thousands of small files.  Is there
> any reason I couldn't modify this to parallelize it a little?
>
>
> On Fri, Nov 8, 2013 at 2:06 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:
>
> > The first copy doesn't resolve the links, so you're copying empty files.
> > The data copy happens only in "step 2", with the MR job.
> >
> > Matteo
> >
> >
> >
> > On Fri, Nov 8, 2013 at 10:54 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> >
> > > Hello all.  I'm trying out the ExportSnapshot tool and it is extremely
> > > slow.  I took a look at the code and I think I know why.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh4-0.94.6_4.4.0/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L635
> > >
> > > > In step 1 it is for some reason copying from fs1 to fs2.  This basically
> > > > means in a single-threaded process we are copying an entire HBase table to
> > > > another cluster.  I can understand wanting to copy from fs1 to fs1 (i.e. a
> > > > different path on the same fs), so as to dereference all the soft links of
> > > > the snapshots.  But why between filesystems?
> > >
> > > > In step 2 you finally do the MR job, which makes much more sense, but as
> > > > far as I can tell all of the files would already exist, as FileUtil.copy
> > > > just does a recursive copy of all paths in a tree.
> > >
> > > Am I missing something?  I appreciate any input.
> > >
> > > - Bryan
> > >
> >
>