Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Re: storing lists in columns


Copy link to this message
-
Re: storing lists in columns
Hi Stas,

Don't forget that you should always try to keep the number of columns
families lower than 3, else you might face some performances issues.

JM

2013/2/19, Stas Maksimov <[EMAIL PROTECTED]>:
> Hi Jean-Marc,
>
> I've validated this, it works perfectly. Very easy to implement and it's
> very fast!
>
> Thankfully in this project there isn't a lot of lists in each table, so I
> won't have to create too many column families. In other scenarios it could
> be a problem.
>
> Many thanks,
> Stas
>
>
> On 16 February 2013 02:29, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]>wrote:
>
>> Hi Stas,
>>
>> Few options are coming into my mind.
>>
>> Quickly:
>> 1) Why not storing the products in specif columns instead of in the
>> same one? Like:
>> table, rowid1, cf:list, c:aa, value:true
>> table, rowid1, cf:list, c:bb, value:true
>> table, rowid1, cf:list, c:cc, value:true
>> table, rowid2, cf:list, c:aabb, value:true
>> table, rowid2, cf:list, c:cc, value:true
>> That way when you do a search you query directly the right column for
>> the right row. And using "exist" call with also reduce the size of the
>> data transfered.
>>
>> 2) You can store the data in the oposite way. Like:
>> table, aa, cf:products, c:rowid1, value:true
>> table, aabb, cf:products, c:rowid2, value:true
>> table, bb, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid2, value:true
>> Here, you query by your product ID, and you search the column based on
>> your previous rowid.
>>
>>
>> I will say the 2 solutions are equivalent, but it will really depend
>> on your data pattern and you query pattern.
>>
>> JM
>>
>> 2013/2/15, Stas Maksimov <[EMAIL PROTECTED]>:
>> > Hi all,
>> >
>> > I have a requirement to store lists in HBase columns like this:
>> > "table", "rowid1", "f:list", "aa, bb, cc"
>> > "table", "rowid2", "f:list", "aabb, cc"
>> >
>> > There is a further requirement to be able to find rows where f:list
>> > contains a particular item, e.g. when I need to find rows having item
>> "aa"
>> > only "rowid1" should match, and for item "cc" both "rowid1" and
>> > "rowid2"
>> > should match.
>> >
>> > For now I decided to use SingleColumnValueFilter with substring
>> > matching.
>> > As using comma-separated list proved difficult to search through, I'm
>> using
>> > pipe symbols to separate items like this: "|aa|bb|cc|", so that I could
>> > pass the search item surrounded by pipes into the filter:
>> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
>> >
>> > This proved to work effectively enough, however I would prefer to use
>> > something more standard for my list storage (e.g. serialised JSON), or
>> > perhaps something even more optimised for a search - performance really
>> > does matter here.
>> >
>> > Any opinions on this solution and possible enhancements are much
>> > appreciated.
>> >
>> > Many thanks,
>> > Stas
>> >
>>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB