Accumulo user mailing list: schema examples


Thread:
Arshak Navruzyan 2013-12-26, 20:10
Jeremy Kepner 2013-12-27, 01:33
Jeremy Kepner 2013-12-27, 01:31
Arshak Navruzyan 2013-12-28, 01:01
Kepner, Jeremy - 0553 - MITLL 2013-12-28, 18:36
Arshak Navruzyan 2013-12-29, 16:34
Kepner, Jeremy - 0553 - MITLL 2013-12-29, 16:42
Arshak Navruzyan 2013-12-29, 16:57

FYI, we just insert all the triples into both Tedge and TedgeTranspose using separate batch writers and let Accumulo figure out which ones belong in the same row. This has worked well for us.
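(A minimal sketch of the two-writer approach described above, assuming the standard Accumulo 1.x client API; the Connector setup and error handling are elided, and the row/column values come from Jeremy's example later in the thread. In real ingest you would keep both writers open across many triples rather than create them per entry.)

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

// Write one triple to both tables; Accumulo groups entries that share a
// row key on its own, so the two writers never have to coordinate.
void writeTriple(Connector conn, String row, String col) throws Exception {
  BatchWriter edge = conn.createBatchWriter("Tedge", new BatchWriterConfig());
  BatchWriter transpose = conn.createBatchWriter("TedgeTranspose", new BatchWriterConfig());

  Mutation m = new Mutation(new Text(row));                        // e.g. "0005791918831-neptune"
  m.put(new Text(""), new Text(col), new Value("1".getBytes()));   // e.g. "Machine|neptune"
  edge.addMutation(m);

  Mutation mt = new Mutation(new Text(col));                       // same entry, row and column swapped
  mt.put(new Text(""), new Text(row), new Value("1".getBytes()));
  transpose.addMutation(mt);

  edge.close();
  transpose.close();
}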

On Dec 29, 2013, at 11:57 AM, Arshak Navruzyan <[EMAIL PROTECTED]> wrote:

> Sorry I mixed things up.  It was in the wikisearch example:
>
> http://accumulo.apache.org/example/wikisearch.html
>
> "If the cardinality is small enough, it will track the set of documents by term directly."
>
>
> On Sun, Dec 29, 2013 at 8:42 AM, Kepner, Jeremy - 0553 - MITLL <[EMAIL PROTECTED]> wrote:
> Hi Arshak,
>   See interspersed below.
> Regards.  -Jeremy
>
> On Dec 29, 2013, at 11:34 AM, Arshak Navruzyan <[EMAIL PROTECTED]> wrote:
>
>> Jeremy,
>>
>> Thanks for the detailed explanation.  Just a couple of final questions:
>>
>> 1.  What's your advice on the transpose table: is it better to repeat the indexed term (one entry per matching row id), or to store all matching row ids from Tedge in a single row in TedgeTranspose (using protobuf, for example)?  What's the performance implication of each approach?  In the paper you mentioned that if there are only a few values they should just be stored together.  Was there a cut-off point in your testing?
>
> Can you clarify?  I am not sure what you're asking.
>
>>
>> 2.  You mentioned that the degrees should be calculated beforehand for high ingest rates.  Doesn't this change Accumulo from being a true database to being more of an index?  If changes to the data cause the degree table to get out of sync, it sounds like changes have to be applied elsewhere first and Accumulo has to be reloaded periodically.  Or perhaps letting the degree table get out of sync is OK since it's just an assist...
>
> My point was a very narrow comment on optimization in very high performance situations. I probably shouldn't have mentioned it.  If you ever have performance issues with your degree tables, that would be the time to discuss it.  You may never encounter this issue.
>
>> Thanks,
>>
>> Arshak
>>
>>
>> On Sat, Dec 28, 2013 at 10:36 AM, Kepner, Jeremy - 0553 - MITLL <[EMAIL PROTECTED]> wrote:
>> Hi Arshak,
>>   Here is how you might do it.  We implement everything with batch writers and batch scanners.  Note: if you are doing high ingest rates, the degree table can be tricky and usually requires pre-summing prior to ingestion to reduce the pressure on the accumulator inside of Accumulo.  Feel free to ask further questions, as I imagine there are details that still wouldn't be clear, in particular why we do it this way.
>>
>> Regards.  -Jeremy
>>
>> Original data:
>>
>> Machine,Pool,Load,ReadingTimestamp
>> neptune,west,5,1388191975000
>> neptune,west,9,1388191975010
>> pluto,east,13,1388191975090
>>
>>
>> Tedge table:
>> rowKey,columnQualifier,value
>>
>> 0005791918831-neptune,Machine|neptune,1
>> 0005791918831-neptune,Pool|west,1
>> 0005791918831-neptune,Load|5,1
>> 0005791918831-neptune,ReadingTimestamp|1388191975000,1
>> 0105791918831-neptune,Machine|neptune,1
>> 0105791918831-neptune,Pool|west,1
>> 0105791918831-neptune,Load|9,1
>> 0105791918831-neptune,ReadingTimestamp|1388191975010,1
>> 0905791918831-pluto,Machine|pluto,1
>> 0905791918831-pluto,Pool|east,1
>> 0905791918831-pluto,Load|13,1
>> 0905791918831-pluto,ReadingTimestamp|1388191975090,1
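(The row keys above look like the ReadingTimestamp with its digits reversed, joined to the machine name; a small sketch of that construction, under that assumption, with a hypothetical helper name:)

// Flip the timestamp digits so sequential readings spread across tablets,
// then append the machine name.
static String tedgeRowKey(long readingTimestamp, String machine) {
  String flipped = new StringBuilder(Long.toString(readingTimestamp)).reverse().toString();
  return flipped + "-" + machine;
}
// tedgeRowKey(1388191975000L, "neptune") -> "0005791918831-neptune"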
>>
>>
>> TedgeTranspose table:
>> rowKey,columnQualifier,value
>>
>> Machine|neptune,0005791918831-neptune,1
>> Pool|west,0005791918831-neptune,1
>> Load|5,0005791918831-neptune,1
>> ReadingTimestamp|1388191975000,0005791918831-neptune,1
>> Machine|neptune,0105791918831-neptune,1
>> Pool|west,0105791918831-neptune,1
>> Load|9,0105791918831-neptune,1
>> ReadingTimestamp|1388191975010,0105791918831-neptune,1
>> Machine|pluto,0905791918831-pluto,1
>> Pool|east,0905791918831-pluto,1
>> Load|13,0905791918831-pluto,1
>> ReadingTimestamp|1388191975090,0905791918831-pluto,1
>>
>>
>> TedgeDegree table:
>> rowKey,columnQualifier,value
>>
>> Machine|neptune,Degree,2
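(A minimal sketch of the pre-summing Jeremy mentions for high ingest rates, assuming a SummingCombiner, or a similar summing iterator, is configured on the TedgeDegree table; writeDegrees and its parameters are illustrative names, not from the thread.)

import java.util.HashMap;
import java.util.Map;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

// Sum degree increments locally over a batch of records, then write one
// mutation per distinct column key instead of one per record, so the
// combiner on TedgeDegree has far fewer cells to merge.
void writeDegrees(Iterable<String> columnKeys, BatchWriter degreeWriter) throws Exception {
  Map<String, Long> counts = new HashMap<>();
  for (String col : columnKeys) {                  // e.g. "Machine|neptune", "Pool|west", ...
    counts.merge(col, 1L, Long::sum);
  }
  for (Map.Entry<String, Long> e : counts.entrySet()) {
    Mutation m = new Mutation(new Text(e.getKey()));
    m.put(new Text(""), new Text("Degree"), new Value(e.getValue().toString().getBytes()));
    degreeWriter.addMutation(m);
  }
}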
Thread (continued):
Arshak Navruzyan 2013-12-29, 20:10
Josh Elser 2013-12-29, 20:27
Arshak Navruzyan 2013-12-29, 22:45
Josh Elser 2013-12-29, 23:51
Jeremy Kepner 2013-12-30, 01:23
Dylan Hutchison 2013-12-28, 05:53
Josh Elser 2013-12-28, 15:52
Arshak Navruzyan 2013-12-28, 18:25