NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Accumulo user mailing list


Writing an iterator that calculates on compaction
Folks,

I am trying to get organized so I can get my feet wet using Accumulo's
ability to compute near the data. I beg your pardon in advance for the
following exercise in laying out what I have in mind and asking for
some pointers, particularly to examples on the 1.4 branch of the code
that I could warp to achieve my nefarious purposes.

So, start with this data model:
  ROWID   CF          CQ            V
  itemid  'context'   dimension     value
  itemid  something   else          entirely...

In short, for an 'item', there's a sparse feature vector associated
with it (identified by cf='context'), and some other things.

Meanwhile, in another table we have:

  clusterid  'items'  itemid1       -blank-
  clusterid  'items'  itemid2       -blank-

In other words, a cluster is a grouping of items from the first table,
identified by their row IDs.
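A small plain-Python sketch of reading that second table: membership lives entirely in the column qualifier, and the blank value is ignored. The cluster and item IDs below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from the cluster table: (clusterid, 'items', itemid, blank value).
CLUSTER_ENTRIES = [
    ("c1", "items", "item1", ""),
    ("c1", "items", "item2", ""),
    ("c2", "items", "item3", ""),
]


def cluster_members(entries):
    """Return {clusterid: [itemid, ...]}; the qualifier carries the item, the value is unused."""
    members = defaultdict(list)
    for row, cf, cq, _value in entries:
        if cf == "items":
            members[row].append(cq)
    return dict(members)
```

One design consequence worth checking against the Accumulo docs: an iterator only sees the key/value stream of the tablet it runs on, so joining cluster memberships (this table) with item vectors (the other table) probably has to happen outside a compaction iterator, for example in a MapReduce job, before anything is aggregated.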

My initial test of my ability to find my way around a brightly lit
room with a flashlight is to calculate the centroids of these
clusters, and store them as an additional CF:

    CF='centroid' CQ=dimension V=value
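A centroid is a mean, and means cannot be merged directly, but (per-dimension sums, item count) pairs can. Merging them is associative and commutative, which is the property a combining iterator relies on when compactions merge partial results in arbitrary groupings. The plain-Python sketch below shows that shape; it assumes a dimension missing from a sparse vector means 0, so each sum is divided by the cluster size. The cluster ID and dimension names are hypothetical.

```python
from functools import reduce


def partial(vector):
    """One item's contribution: its per-dimension values as sums, plus an item count of 1."""
    return (dict(vector), 1)


def merge(a, b):
    """Combine two partials; order and grouping of merges do not change the result."""
    sums = dict(a[0])
    for dim, value in b[0].items():
        sums[dim] = sums.get(dim, 0.0) + value
    return (sums, a[1] + b[1])


def centroid_entries(clusterid, vectors):
    """Average member vectors and emit (row, 'centroid', dimension, value) tuples.

    Assumes the cluster is non-empty and that absent dimensions are 0, so every
    sum is divided by the number of items, not by how many items had that dimension.
    """
    sums, n = reduce(merge, (partial(v) for v in vectors))
    return [(clusterid, "centroid", dim, str(total / n)) for dim, total in sorted(sums.items())]
```

For example, `centroid_entries("c1", [{"dim_a": 1.0, "dim_c": 3.0}, {"dim_a": 2.0, "dim_b": 4.0}])` averages over two items, giving 1.5, 2.0 and 1.5 for `dim_a`, `dim_b` and `dim_c`.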

My second test is to calculate the distance from each item to the
centroid of its cluster, and store that. Finally, I want to peruse
items in descending order of their distance-from-centroid values.
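Accumulo returns keys in ascending lexicographic order, so one common way to read items by descending distance is to write a separate index whose row key encodes the distance in reverse order. The plain-Python sketch below computes a Euclidean distance between sparse vectors (Euclidean is an assumption; the post does not name a metric) and builds such a key: for non-negative doubles the big-endian IEEE 754 bit pattern grows with the value, so subtracting it from the all-ones 64-bit mask and printing fixed-width hex reverses the order. The index layout (for example a row like `<key>_<itemid>`) is hypothetical.

```python
import math
import struct


def distance(vector, centroid):
    """Euclidean distance between two sparse vectors; missing dimensions count as 0."""
    dims = set(vector) | set(centroid)
    return math.sqrt(sum((vector.get(d, 0.0) - centroid.get(d, 0.0)) ** 2 for d in dims))


def descending_key(dist):
    """16-char hex string whose ascending lexicographic order is descending distance."""
    dist = float(dist) + 0.0  # folds -0.0 into +0.0 so it sorts like zero
    if dist < 0 or math.isnan(dist):
        raise ValueError("distance must be a non-negative number")
    bits = int.from_bytes(struct.pack(">d", dist), "big")  # monotonic for values >= 0
    return format(0xFFFFFFFFFFFFFFFF - bits, "016x")
```

Prefixing an index row with `descending_key(distance(v, c))` means a plain forward scan of that index visits the farthest items first.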

TIA