Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase, mail # user - Re: storing lists in columns


+
Stas Maksimov 2013-02-19, 11:07
Copy link to this message
-
Re: storing lists in columns
Jean-Marc Spaggiari 2013-02-19, 12:23
Hi Stas,

Don't forget that you should always try to keep the number of columns
families lower than 3, else you might face some performances issues.

JM

2013/2/19, Stas Maksimov <[EMAIL PROTECTED]>:
> Hi Jean-Marc,
>
> I've validated this, it works perfectly. Very easy to implement and it's
> very fast!
>
> Thankfully in this project there isn't a lot of lists in each table, so I
> won't have to create too many column families. In other scenarios it could
> be a problem.
>
> Many thanks,
> Stas
>
>
> On 16 February 2013 02:29, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]>wrote:
>
>> Hi Stas,
>>
>> Few options are coming into my mind.
>>
>> Quickly:
>> 1) Why not storing the products in specif columns instead of in the
>> same one? Like:
>> table, rowid1, cf:list, c:aa, value:true
>> table, rowid1, cf:list, c:bb, value:true
>> table, rowid1, cf:list, c:cc, value:true
>> table, rowid2, cf:list, c:aabb, value:true
>> table, rowid2, cf:list, c:cc, value:true
>> That way when you do a search you query directly the right column for
>> the right row. And using "exist" call with also reduce the size of the
>> data transfered.
>>
>> 2) You can store the data in the oposite way. Like:
>> table, aa, cf:products, c:rowid1, value:true
>> table, aabb, cf:products, c:rowid2, value:true
>> table, bb, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid2, value:true
>> Here, you query by your product ID, and you search the column based on
>> your previous rowid.
>>
>>
>> I will say the 2 solutions are equivalent, but it will really depend
>> on your data pattern and you query pattern.
>>
>> JM
>>
>> 2013/2/15, Stas Maksimov <[EMAIL PROTECTED]>:
>> > Hi all,
>> >
>> > I have a requirement to store lists in HBase columns like this:
>> > "table", "rowid1", "f:list", "aa, bb, cc"
>> > "table", "rowid2", "f:list", "aabb, cc"
>> >
>> > There is a further requirement to be able to find rows where f:list
>> > contains a particular item, e.g. when I need to find rows having item
>> "aa"
>> > only "rowid1" should match, and for item "cc" both "rowid1" and
>> > "rowid2"
>> > should match.
>> >
>> > For now I decided to use SingleColumnValueFilter with substring
>> > matching.
>> > As using comma-separated list proved difficult to search through, I'm
>> using
>> > pipe symbols to separate items like this: "|aa|bb|cc|", so that I could
>> > pass the search item surrounded by pipes into the filter:
>> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
>> >
>> > This proved to work effectively enough, however I would prefer to use
>> > something more standard for my list storage (e.g. serialised JSON), or
>> > perhaps something even more optimised for a search - performance really
>> > does matter here.
>> >
>> > Any opinions on this solution and possible enhancements are much
>> > appreciated.
>> >
>> > Many thanks,
>> > Stas
>> >
>>
>