Accumulo >> mail # user >> Wikisearch Performance Question


Re: Wikisearch Performance Question
How large is the cluster, and how does the HDFS block size compare to
the file sizes? I'm wondering whether the blocks of the large file were
disproportionately burdening a small number of datanodes, while the
blocks of the small files were more evenly distributed.
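
One way to check this hypothesis is to count how many block replicas
each datanode holds for a given input. The sketch below is only an
illustration, not part of the Wikisearch example: the function names,
the host names, and the sample block maps are all hypothetical, and it
assumes the block-to-datanode mapping has already been collected by
hand (for instance from the output of
`hadoop fsck <path> -files -blocks -locations`).

```python
from collections import Counter


def replicas_per_datanode(block_locations):
    """Count how many block replicas each datanode holds.

    block_locations maps a block id to the list of datanode hosts
    that store a replica of that block (hypothetical input format).
    """
    counts = Counter()
    for hosts in block_locations.values():
        counts.update(hosts)
    return counts


def skew(counts, cluster_size):
    """Ratio of the busiest datanode's replica count to the mean
    replica count over the whole cluster. 1.0 means perfectly even;
    larger values mean a few datanodes carry most of the load."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mean = total / cluster_size
    return max(counts.values()) / mean


# Hypothetical 4-node cluster, 4 blocks, replication factor 2.
large_file = {
    "blk_1": ["dn1", "dn2"],
    "blk_2": ["dn1", "dn2"],
    "blk_3": ["dn1", "dn2"],
    "blk_4": ["dn1", "dn3"],
}
small_files = {
    "blk_1": ["dn1", "dn2"],
    "blk_2": ["dn3", "dn4"],
    "blk_3": ["dn1", "dn3"],
    "blk_4": ["dn2", "dn4"],
}

large_skew = skew(replicas_per_datanode(large_file), cluster_size=4)
small_skew = skew(replicas_per_datanode(small_files), cluster_size=4)
print(large_skew, small_skew)
```

A skew well above 1.0 for the single large file, and close to 1.0 for
the set of small files, would be consistent with the explanation
suggested above. It would not prove it, since other factors in the
ingest job could also matter.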

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Tue, May 21, 2013 at 1:30 PM, Patrick Lynch <[EMAIL PROTECTED]> wrote:
> user@accumulo,
>
> I was working with the Wikipedia Accumulo ingest examples, and I was trying
> to get the ingest of a single archive file to be as fast as ingesting
> multiple archives through parallelization. I increased the number of ways
> the job split the single archive so that all the servers could work on
> ingesting at the same time. What I noticed, however, was that having all the
> servers work on ingesting the same file was still not nearly as fast as
> using multiple ingest files. I was wondering if I could have some insight
> into the design of the Wikipedia ingest that could explain this phenomenon.
>
> Thank you for your time,
> Patrick Lynch