NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> HBase - Column family


Re: HBase - Column family
If you only want some of the columns, you could return a subset by
using server-side Filters.
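To make the filtering idea concrete, here is a minimal Python sketch that models one HBase row as a dictionary from column qualifier to cell value. The function names (`get_with_filter`, `prefix_filter`, `range_filter`) are hypothetical; in the real Java client you would attach a filter such as `ColumnPrefixFilter` or `ColumnRangeFilter` to a `Get` or `Scan`, and the region server would evaluate it before returning results.

```python
from typing import Callable, Dict

# Hypothetical stand-in for one HBase row in a single column family:
# column qualifier -> cell value (here, doc id -> weight).
Row = Dict[str, int]
Predicate = Callable[[str], bool]


def get_with_filter(row: Row, keep: Predicate) -> Row:
    """Return only the cells whose qualifier satisfies `keep`.

    This models a server-side filter: the predicate is evaluated where the
    data lives, so only the matching subset would be sent to the client.
    """
    return {qualifier: value for qualifier, value in row.items() if keep(qualifier)}


def prefix_filter(prefix: str) -> Predicate:
    # Loosely analogous to HBase's ColumnPrefixFilter.
    return lambda qualifier: qualifier.startswith(prefix)


def range_filter(low: str, high: str) -> Predicate:
    # Loosely analogous to ColumnRangeFilter with an inclusive lower bound and
    # an exclusive upper bound. Comparison is lexicographic, as with HBase's
    # byte-ordered qualifiers, so "10" sorts before "9".
    return lambda qualifier: low <= qualifier < high


row = {"2": 12, "3": 45, "4": 36}
print(get_with_filter(row, range_filter("3", "5")))  # {'3': 45, '4': 36}
print(get_with_filter(row, prefix_filter("2")))      # {'2': 12}
```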

Your schema can be designed in multiple ways - it all depends on what
your access patterns are.
Here's a good thread on various schema design alternatives for
one-to-many relationships. There are many other such threads that you
can search the mailing lists for.
http://search-hadoop.com/m/Yj4TE1g3ZX51
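Two common alternatives for one-to-many data are a "wide" layout (one row per word, one column per document) and a "tall" layout (one row per word/document pair under a composite row key). As a rough illustration using this thread's example, the Python sketch below stores the same postings both ways. The `word|doc_id` key format, the `SEP` constant and all function names are assumptions made for the illustration, not anything HBase prescribes.

```python
import bisect
from typing import Dict, List, Tuple

# The thread's example: word -> {doc id: weight}.
POSTINGS: Dict[str, Dict[str, int]] = {"HELLO": {"2": 12, "3": 45, "4": 36}}
SEP = "|"  # hypothetical separator for composite row keys


def wide_layout(postings: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    # Wide table: one row per word, one column per document.
    return {word: dict(docs) for word, docs in postings.items()}


def tall_layout(postings: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    # Tall table: one row per (word, document) pair, keyed "word|doc_id".
    # HBase keeps rows sorted by key, so the list is sorted too.
    rows = [
        (word + SEP + doc, weight)
        for word, docs in postings.items()
        for doc, weight in docs.items()
    ]
    rows.sort()
    return rows


def tall_prefix_scan(rows: List[Tuple[str, int]], word: str) -> Dict[str, int]:
    # All keys for one word are adjacent: seek to the first key >= "word|"
    # and read forward while the prefix still matches.
    prefix = word + SEP
    keys = [key for key, _ in rows]
    i = bisect.bisect_left(keys, prefix)
    found: Dict[str, int] = {}
    while i < len(rows) and rows[i][0].startswith(prefix):
        found[rows[i][0][len(prefix):]] = rows[i][1]
        i += 1
    return found


wide = wide_layout(POSTINGS)
tall = tall_layout(POSTINGS)
print(wide["HELLO"])                    # {'2': 12, '3': 45, '4': 36}
print(tall_prefix_scan(tall, "HELLO"))  # {'2': 12, '3': 45, '4': 36}
```

Both layouts answer "which documents contain HELLO?". In HBase a single row always lives in one region and row-level operations are atomic, which tends to favor the wide form for moderately sized posting lists, while a very large list may be easier to spread out with the tall form. Which one fits still depends on the access patterns.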

--Suraj

2011/4/23 Panayotis Antonopoulos <[EMAIL PROTECTED]>:
>
> I am also a beginner, so I would like to ask you something about the method you proposed.
> HBase is column-oriented. This means (as far as I know from databases) that it stores its data column by column and not row by row.
> If we use the schema you suggested, then when we want some of the documents for a single word, we will have to access many columns, and I think this will cost us a lot.
> I think that the locality of the data is lost using this schema.
>
> I repeat that I am a beginner so please correct me if I am wrong.
>
> Regards,
> Panagiotis.
>
>> Date: Sat, 23 Apr 2011 11:25:47 +0200
>> Subject: Re: HBase - Column family
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>>
>> That's how I would do it:
>> What's nice in HBase is that you can store all the data for one of
>> your keywords in a single row.
>> Create a column family "doc_id".
>> Now, for each word, you create one row.
>> In this row, for each matching document you create one column (that's
>> the gotcha compared to an RDB design).
>> The name of the column is the doc id. The column's cell content is the weight.
>>
>> So, following your example you'd get:
>>
>> row id | doc_id:2 | doc_id:3 | doc_id:4
>> HELLO  | 12       | 45       | 36
>>
>> HTH,
>>
>>   Bernd
>>
>>
>> On Sat, Apr 23, 2011 at 09:56, JohnJohnGa <[EMAIL PROTECTED]> wrote:
>> > Hi, I'm a beginner in HBase. I need to design my table. I want to play with the
>> > following information:
>> >
>> > At the date XX-XX-XXXX, the word 'HELLO' is in documents 2, 3 and 4, and the
>> > weights of those docs are 12, 45 and 36. My raw data: doc:D title:'i like potatoes',weight:W,date:D
>> >
>> > I created a table with row: word, column: date, value: doc. But I can't store
>> > multiple rows with the same date for the same word.
>> >
>> > Can we create multiple column families for a table? What would be the best way to
>> > design the schema?
>> >
>> > Thanks a lot
>> >
>> >
>
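On Panayotis's question about locality: HBase is usually described as column-oriented, but on disk it groups data by column family rather than by individual column. Each family is kept in its own store files, and within them cells are sorted by row key and then by column qualifier, so all the doc_id columns of one word's row sit next to each other. The Python sketch below is a deliberately simplified model of that ordering (the `Table` class and its methods are hypothetical, and timestamps and versions are ignored); it stores Bernd's example and shows that reading the HELLO row touches one contiguous slice of the store.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified model, not HBase's real file format: each column family has its
# own store, and cells inside a store are kept sorted by (row key, qualifier).
# Timestamps, versions and overwrites are left out.
Cell = Tuple[str, str, int]  # (row key, qualifier, value)


class Table:
    def __init__(self) -> None:
        self.stores: Dict[str, List[Cell]] = defaultdict(list)

    def put(self, row: str, family: str, qualifier: str, value: int) -> None:
        # insort keeps the family's store ordered by (row, qualifier).
        bisect.insort(self.stores[family], (row, qualifier, value))

    def get_row(self, row: str, family: str) -> Tuple[Dict[str, int], int, int]:
        """Return the row's cells in `family` and the [start, end) slice they fill."""
        store = self.stores[family]
        start = bisect.bisect_left(store, (row,))  # (row,) sorts before (row, q, v)
        end = start
        while end < len(store) and store[end][0] == row:
            end += 1
        return {q: v for _, q, v in store[start:end]}, start, end


# Bernd's design: row key = word, family "doc_id", qualifier = doc id, value = weight.
table = Table()
for doc, weight in {"2": 12, "3": 45, "4": 36}.items():
    table.put("HELLO", "doc_id", doc, weight)
table.put("WORLD", "doc_id", "3", 7)  # another word, made up for the example

cells, start, end = table.get_row("HELLO", "doc_id")
print(cells, start, end)  # {'2': 12, '3': 45, '4': 36} 0 3
```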