Stas Maksimov 2013-02-19, 11:07
Re: storing lists in columns
Jean-Marc Spaggiari 2013-02-19, 12:23
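Don't forget that you should always try to keep the number of column
families below 3; otherwise you might run into performance issues (flushes
are done per region, so a busy family forces the other families in the same
region to flush as well, which adds needless I/O).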
2013/2/19, Stas Maksimov <[EMAIL PROTECTED]>:
> Hi Jean-Marc,
> I've validated this and it works perfectly. It's very easy to implement and
> very fast!
> Thankfully, in this project there aren't many lists in each table, so I
> won't have to create too many column families. In other scenarios it could
> be a problem.
> Many thanks,
> On 16 February 2013 02:29, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]> wrote:
>> Hi Stas,
>> A few options come to mind.
>> 1) Why not store the products in separate columns instead of all in the
>> same one? Like:
>> table, rowid1, cf:list, c:aa, value:true
>> table, rowid1, cf:list, c:bb, value:true
>> table, rowid1, cf:list, c:cc, value:true
>> table, rowid2, cf:list, c:aabb, value:true
>> table, rowid2, cf:list, c:cc, value:true
>> That way, when you search, you directly query the right column of the
>> right row. Using the "exists" call will also reduce the amount of data
>> transferred.
>> 2) You can store the data the opposite way. Like:
>> table, aa, cf:products, c:rowid1, value:true
>> table, aabb, cf:products, c:rowid2, value:true
>> table, bb, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid2, value:true
>> Here, you query by your product ID (the row key) and look up the column
>> named after your original rowid.
>> I would say the two solutions are equivalent, but it really depends on
>> your data pattern and your query pattern.
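The two layouts above can be compared with a small in-memory model. This is a sketch only: it stands in for an HBase table with plain Python dicts (row key -> column family -> qualifier -> value) instead of calling a real client, and the helper names `new_table`, `layout_by_row`, `layout_by_item`, `row_has_item` and `rows_with_item` are hypothetical. With the HBase Java client, the option 1 check would roughly correspond to a `Get` narrowed with `Get.addColumn(family, qualifier)` and passed to `exists(Get)`, and the option 2 lookup to a single `Get` on the product's row key.

```python
from collections import defaultdict

# Lists from the thread: each row key owns a list of product items.
LISTS = {
    "rowid1": ["aa", "bb", "cc"],
    "rowid2": ["aabb", "cc"],
}


def new_table():
    # In-memory stand-in for an HBase table: row key -> family -> qualifier -> value.
    return defaultdict(lambda: defaultdict(dict))


def layout_by_row(lists, family="list"):
    """Option 1: keep the original row keys, one column (qualifier) per item."""
    table = new_table()
    for row_key, items in lists.items():
        for item in items:
            table[row_key][family][item] = b"true"
    return table


def layout_by_item(lists, family="products"):
    """Option 2: inverted layout, one row per item, one column per original row key."""
    table = new_table()
    for row_key, items in lists.items():
        for item in items:
            table[item][family][row_key] = b"true"
    return table


def row_has_item(table, row_key, item, family="list"):
    # Option 1 lookup: existence check on a single (row, family, qualifier) cell.
    # .get() avoids creating empty rows in the defaultdict on a miss.
    return item in table.get(row_key, {}).get(family, {})


def rows_with_item(table, item, family="products"):
    # Option 2 lookup: read the one row keyed by the item; its qualifiers
    # are the row keys of the original table.
    return sorted(table.get(item, {}).get(family, {}))


if __name__ == "__main__":
    by_row = layout_by_row(LISTS)
    by_item = layout_by_item(LISTS)
    print(row_has_item(by_row, "rowid1", "aa"))  # True
    print(row_has_item(by_row, "rowid2", "aa"))  # False: "aabb" is a separate qualifier
    print(rows_with_item(by_item, "cc"))         # ['rowid1', 'rowid2']
    print(rows_with_item(by_item, "aa"))         # ['rowid1']
```

The model makes the query-pattern trade-off concrete: option 1 answers "does this row contain item X?" with one cell check, but finding every row that contains X would need a scan across the table, whereas option 2 answers "which rows contain X?" (the requirement in the original question) with a single row read.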
>> 2013/2/15, Stas Maksimov <[EMAIL PROTECTED]>:
>> > Hi all,
>> > I have a requirement to store lists in HBase columns like this:
>> > "table", "rowid1", "f:list", "aa, bb, cc"
>> > "table", "rowid2", "f:list", "aabb, cc"
>> > There is a further requirement to be able to find rows where f:list
>> > contains a particular item, e.g. when I need to find rows having item
>> > only "rowid1" should match, and for item "cc" both "rowid1" and
>> > "rowid2" should match.
>> > For now I decided to use SingleColumnValueFilter with substring
>> > matching.
>> > As a comma-separated list proved difficult to search through, I'm using
>> > pipe symbols to separate items like this: "|aa|bb|cc|", so that I can
>> > pass the search item surrounded by pipes into the filter:
>> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
>> > This works well enough; however, I would prefer to use something more
>> > standard for my list storage (e.g. serialised JSON), or perhaps something
>> > even better optimised for searching, since performance really matters
>> > here.
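Why the surrounding pipes matter can be shown without HBase at all. A minimal sketch, assuming the stored value is the "|aa|bb|cc|" string described above and approximating the filter's `substring:` comparator (HBase's `SubstringComparator`, which compares case-insensitively) with a lower-cased Python `in` check: a bare "aa" also matches "aabb", while "|aa|" does not. The helper names `encode_list`, `substring_match` and `contains_item` are hypothetical.

```python
DELIM = "|"


def encode_list(items):
    """Encode items as a pipe-delimited value with leading and trailing pipes."""
    for item in items:
        # An item containing the delimiter would corrupt the encoding.
        if DELIM in item:
            raise ValueError(f"item {item!r} contains the delimiter {DELIM!r}")
    return DELIM + DELIM.join(items) + DELIM


def substring_match(value, needle):
    # Approximates a case-insensitive substring comparator.
    return needle.lower() in value.lower()


def contains_item(value, item):
    # Wrapping the item in delimiters anchors it to whole-item boundaries.
    return substring_match(value, DELIM + item + DELIM)


if __name__ == "__main__":
    rows = {
        "rowid1": encode_list(["aa", "bb", "cc"]),
        "rowid2": encode_list(["aabb", "cc"]),
    }
    print(rows["rowid2"])  # |aabb|cc|
    # A bare substring search gives a false positive on "aabb":
    print([r for r, v in rows.items() if substring_match(v, "aa")])  # ['rowid1', 'rowid2']
    # The delimited search matches whole items only:
    print([r for r, v in rows.items() if contains_item(v, "aa")])    # ['rowid1']
    print([r for r, v in rows.items() if contains_item(v, "cc")])    # ['rowid1', 'rowid2']
```

Note that a value filter like this is still evaluated against every row the scan visits, because it cannot use the row key to skip data; that is one reason the column-per-item and inverted layouts suggested in the reply tend to scale better.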
>> > Any opinions on this solution and possible enhancements are much
>> > appreciated.
>> > Many thanks,
>> > Stas