Accumulo >> mail # user >> Map Reduce on accumulo


Map Reduce on accumulo
NOTE: I am fairly sure this hasn't been asked on here yet. My apologies if
it was already asked, in which case please forward me a link to the
answers. Thank you.

Suppose my environment is set up as follows:
- 64 MB HDFS block size
- 5 tablet servers
- 10 tablets of 1 GB each per tablet server

and I have a table like the one below:
rowA | f1 | q1 | v1
rowA | f1 | q2 | v2

rowB | f1 | q1 | v3

rowC | f1 | q1 | v4
rowC | f2 | q1 | v5
rowC | f3 | q3 | v6
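
To make the row-to-tablet relationship concrete, here is a minimal sketch of how the entries above might be grouped into tablets. It assumes tablets are delimited by split points on the row ID, with each tablet holding rows up to and including its end row. The split point "rowB" and the helper tablet_for are hypothetical and exist only for illustration; this is not Accumulo code.

```python
from bisect import bisect_left

# Entries from the example table: (row, family, qualifier, value)
entries = [
    ("rowA", "f1", "q1", "v1"),
    ("rowA", "f1", "q2", "v2"),
    ("rowB", "f1", "q1", "v3"),
    ("rowC", "f1", "q1", "v4"),
    ("rowC", "f2", "q1", "v5"),
    ("rowC", "f3", "q3", "v6"),
]

# Hypothetical split points (an assumption, not taken from a real table).
# Tablet i is assumed to hold rows in the range (splits[i-1], splits[i]].
splits = ["rowB"]


def tablet_for(row, split_points):
    """Return the index of the tablet whose row range contains `row`."""
    return bisect_left(split_points, row)


# Group every entry by the tablet its row falls into.
tablets = {}
for entry in entries:
    tablets.setdefault(tablet_for(entry[0], splits), []).append(entry)

for index in sorted(tablets):
    rows = sorted({entry[0] for entry in tablets[index]})
    print(f"tablet {index}: rows {rows}")
```

Because the lookup depends only on the row ID, every entry for a given row lands in the same tablet in this sketch, regardless of its family or qualifier.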

From the little documentation I have read, I understand that all data for
rowA will go to one tablet, which may or may not contain data for other
rows, i.e. it's all or none. So my questions are:

How are the tablets mapped to a datanode or HDFS block? Obviously, one
tablet is split into multiple HDFS blocks (16 in this case, since
1 GB / 64 MB = 16), so would they be stored on the same or different
datanode(s), or does it not matter?

In the example above, would all data for rowC (or rowA or rowB) go onto the
same HDFS block or different HDFS blocks?

When executing a MapReduce job, how many mappers would I get? (One per HDFS
block? Per tablet? Per server?)
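
For reference, the numbers behind these questions can be worked out with a short sketch. It assumes each tablet's 1 GB of data occupies exactly 1 GB in HDFS (ignoring compression and replication), and it only computes the mapper count each of the three possibilities would imply; it does not say which one applies. The variable names are illustrative.

```python
# Environment from the setup described above.
BLOCK_MB = 64
TABLET_MB = 1024  # 1 GB per tablet
TABLET_SERVERS = 5
TABLETS_PER_SERVER = 10

# Ceiling division, so a partially filled last block still counts.
blocks_per_tablet = -(-TABLET_MB // BLOCK_MB)
total_tablets = TABLET_SERVERS * TABLETS_PER_SERVER
total_blocks = total_tablets * blocks_per_tablet

# Mapper count implied by each possibility named in the question.
mapper_candidates = {
    "one per HDFS block": total_blocks,
    "one per tablet": total_tablets,
    "one per tablet server": TABLET_SERVERS,
}

print(f"blocks per tablet: {blocks_per_tablet}")
for scheme, count in mapper_candidates.items():
    print(f"{scheme}: {count} mappers")
```

Under these assumptions a tablet spans 16 blocks, and the three possibilities differ by orders of magnitude (800, 50, or 5 mappers), which is why the answer matters for job planning.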

Thank you in advance for any and all suggestions.
John Vines 2012-12-04, 22:45
Aji Janis 2012-12-04, 23:55
John Vines 2012-12-05, 01:36
Aji Janis 2012-12-06, 22:32
Billie Rinaldi 2012-12-07, 14:51
Aji Janis 2012-12-07, 14:56