Accumulo >> mail # user >> schema examples


Hi Arshak,
  See interspersed below.
Regards.  -Jeremy

On Dec 29, 2013, at 11:34 AM, Arshak Navruzyan <[EMAIL PROTECTED]> wrote:

> Jeremy,
>
> Thanks for the detailed explanation.  Just a couple of final questions:
>
> 1.  What's your advice on the transpose table: should I repeat the indexed term (one entry per matching row id), or try to store all matching row ids from tedge in a single row in tedgetranspose (using protobuf, for example)?  What are the performance implications of each approach?  In the paper you mentioned that if there are only a few values they should just be stored together.  Was there a cut-off point in your testing?

Can you clarify?  I am not sure what you're asking.

>
> 2.  You mentioned that the degrees should be calculated beforehand for high ingest rates.  Doesn't this change Accumulo from being a true database to being more of an index?  If changes to the data cause the degree table to get out of sync, sounds like changes have to be applied elsewhere first and Accumulo has to be reloaded periodically.  Or perhaps letting the degree table get out of sync is ok since it's just an assist...

My point was a very narrow comment on optimization in very high performance situations.  I probably shouldn't have mentioned it.  If you ever have performance issues with your degree tables, that would be the time to discuss it.  You may never encounter this issue.

> Thanks,
>
> Arshak
>
>
> On Sat, Dec 28, 2013 at 10:36 AM, Kepner, Jeremy - 0553 - MITLL <[EMAIL PROTECTED]> wrote:
> Hi Arshak,
>   Here is how you might do it.  We implement everything with batch writers and batch scanners.  Note: if you are doing high ingest rates, the degree table can be tricky and usually requires pre-summing prior to ingestion to reduce the pressure on the accumulator inside of Accumulo.  Feel free to ask further questions, as I would imagine that there are details that still aren't clear, in particular why we do it this way.
>
> Regards.  -Jeremy
>
> Original data:
>
> Machine,Pool,Load,ReadingTimestamp
> neptune,west,5,1388191975000
> neptune,west,9,1388191975010
> pluto,east,13,1388191975090
>
>
> Tedge table:
> rowKey,columnQualifier,value
>
> 0005791918831-neptune,Machine|neptune,1
> 0005791918831-neptune,Pool|west,1
> 0005791918831-neptune,Load|5,1
> 0005791918831-neptune,ReadingTimestamp|1388191975000,1
> 0105791918831-neptune,Machine|neptune,1
> 0105791918831-neptune,Pool|west,1
> 0105791918831-neptune,Load|9,1
> 0105791918831-neptune,ReadingTimestamp|1388191975010,1
> 0905791918831-pluto,Machine|pluto,1
> 0905791918831-pluto,Pool|east,1
> 0905791918831-pluto,Load|13,1
> 0905791918831-pluto,ReadingTimestamp|1388191975090,1
>
>
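The mapping from the original records to the Tedge entries above can be sketched in a few lines of Python. This is only an illustration: the row-key rule (the ReadingTimestamp digits reversed, then "-" and the machine name) is inferred from the sample rows rather than stated in the thread, and the names `RAW` and `tedge_entries` are hypothetical. No Accumulo client calls are involved.

```python
import csv
import io

# The sample records from the message, as CSV text.
RAW = """Machine,Pool,Load,ReadingTimestamp
neptune,west,5,1388191975000
neptune,west,9,1388191975010
pluto,east,13,1388191975090
"""

def tedge_entries(raw_csv):
    """Yield (rowKey, columnQualifier, value) triples for the Tedge table."""
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Assumption inferred from the sample: the row key is the timestamp
        # with its digits reversed, followed by "-" and the machine name.
        row_key = record["ReadingTimestamp"][::-1] + "-" + record["Machine"]
        # One entry per field, with "Field|value" as the column qualifier.
        for field, value in record.items():
            yield (row_key, field + "|" + value, "1")

entries = list(tedge_entries(RAW))
for entry in entries:
    print(",".join(entry))
```

Each input record expands into one entry per field, so the three sample records yield twelve Tedge entries.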
> TedgeTranspose table:
> rowKey,columnQualifier,value
>
> Machine|neptune,0005791918831-neptune,1
> Pool|west,0005791918831-neptune,1
> Load|5,0005791918831-neptune,1
> ReadingTimestamp|1388191975000,0005791918831-neptune,1
> Machine|neptune,0105791918831-neptune,1
> Pool|west,0105791918831-neptune,1
> Load|9,0105791918831-neptune,1
> ReadingTimestamp|1388191975010,0105791918831-neptune,1
> Machine|pluto,0905791918831-pluto,1
> Pool|east,0905791918831-pluto,1
> Load|13,0905791918831-pluto,1
> ReadingTimestamp|1388191975090,0905791918831-pluto,1
>
>
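To show what the transpose table buys you, here is a small in-memory sketch in Python. It swaps the row key and column qualifier of a few Tedge entries and then looks up the row ids for one term. The list of tuples stands in for the table, and `transpose` and `rows_for_term` are hypothetical helper names; in Accumulo the lookup would be a scan over a single row of TedgeTranspose.

```python
# A subset of the Tedge entries from the message: (rowKey, columnQualifier, value).
tedge = [
    ("0005791918831-neptune", "Machine|neptune", "1"),
    ("0005791918831-neptune", "Pool|west", "1"),
    ("0105791918831-neptune", "Machine|neptune", "1"),
    ("0105791918831-neptune", "Pool|west", "1"),
    ("0905791918831-pluto", "Machine|pluto", "1"),
    ("0905791918831-pluto", "Pool|east", "1"),
]

def transpose(entries):
    """Swap row key and column qualifier, keeping the value."""
    return [(cq, row, value) for row, cq, value in entries]

def rows_for_term(transposed, term):
    """Return the sorted Tedge row ids stored under one term."""
    return sorted(cq for row, cq, _ in transposed if row == term)

tedge_transpose = transpose(tedge)
west_rows = rows_for_term(tedge_transpose, "Pool|west")
print(west_rows)
```

Because the term is the row key in the transposed layout, finding every reading for a pool or machine is a lookup by row rather than a filter over the whole Tedge table.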
> TedgeDegree table:
> rowKey,columnQualifier,value
>
> Machine|neptune,Degree,2
> Pool|west,Degree,2
> Load|5,Degree,1
> ReadingTimestamp|1388191975000,Degree,1
> Load|9,Degree,1
> ReadingTimestamp|1388191975010,Degree,1
> Machine|pluto,Degree,1
> Pool|east,Degree,1
> Load|13,Degree,1
> ReadingTimestamp|1388191975090,Degree,1
>
>
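The degree counts above, and the pre-summing mentioned at the start of the message, can be sketched as follows. This is one reading of "pre-summing", not a description of the author's code: the client totals the occurrences of each term within a batch and writes one degree entry per term, instead of one "+1" per occurrence. The names `presum_degrees` and `degree_entries` are hypothetical, and no Accumulo calls are used.

```python
from collections import Counter

# Tedge entries for two batches: (rowKey, columnQualifier, value).
batch_1 = [
    ("0005791918831-neptune", "Machine|neptune", "1"),
    ("0005791918831-neptune", "Pool|west", "1"),
    ("0105791918831-neptune", "Machine|neptune", "1"),
    ("0105791918831-neptune", "Pool|west", "1"),
]
batch_2 = [
    ("0905791918831-pluto", "Machine|pluto", "1"),
    ("0905791918831-pluto", "Pool|east", "1"),
]

def presum_degrees(batch):
    """Count how often each term (column qualifier) occurs in one batch."""
    return Counter(cq for _, cq, _ in batch)

def degree_entries(counts):
    """Turn term counts into (rowKey, columnQualifier, value) degree entries."""
    return [(term, "Degree", str(n)) for term, n in sorted(counts.items())]

# One degree entry per distinct term in the batch, not one per occurrence.
print(degree_entries(presum_degrees(batch_1)))

# Adding the per-batch counts models the totals the degree table should
# hold after both batches have been ingested.
total = presum_degrees(batch_1) + presum_degrees(batch_2)
print(degree_entries(total))
```

With pre-summing, the first batch produces two degree writes (each with value 2) rather than four writes of 1, which is the reduction in server-side summing work that the note on high ingest rates refers to.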
> TedgeText table:
> rowKey,columnQualifier,value
>
> 0005791918831-neptune,Text,< ... raw text of original log ...>
> 0105791918831-neptune,Text,< ... raw text of original log ...>
> 0905791918831-pluto,Text,< ... raw text of original log ...>
>
> On Dec 27, 2013, at 8:01 PM, Arshak Navruzyan <[EMAIL PROTECTED]> wrote:
>
> > Jeremy,
> >
> > Wow, didn't expect to get help from the author :)
> >
> > How about something simple like this: