Konrad Tendera
2012-03-19, 14:51
Manish Bhoge
2012-03-20, 03:44
Jacques
2012-03-20, 06:08
Laxman
2012-03-20, 08:56
Konrad Tendera
2012-03-20, 09:04
Michael Segel
2012-03-20, 09:32
Qian Ye
2012-03-20, 09:32
Konrad Tendera
2012-03-20, 09:40
Michael Segel
2012-03-20, 09:44
-
Rows vs. Columns
Konrad Tendera, 2012-03-19, 14:51
Hello,
I'm designing a schema for my use case and I'm trying to decide which is better: rows or columns. The table will be used for keeping small PDF files, or single pages of a larger document. My schema currently looks like this:

table files:
  family "info":
    "info:pg" - keeps page number
    "info:id" - sender ID
    "info:nm" - PDF name
    ***
  family "data":
    "data:blob" - blob of PDF file

Now let's get back to ***: each user can add any number of additional properties ("name" - "value"), and let's assume that users will be so creative that no two of them use the same name. I don't know how to solve this problem: should each "name" become a new column ("info:name"), or should I do as described at http://hbase.apache.org/book.html#schema.smackdown.rowscols and make a new row for each property?

K.
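To make the two options in the question concrete, here is a minimal sketch that models an HBase-style table as a plain Python dictionary. It is not HBase client code. The row keys, the "/prop/" key layout, the "info:value" column, and the property names are all assumptions made up for illustration.

```python
# Toy in-memory model of a table: {row_key: {"family:qualifier": value}}.
# Row keys, the "/prop/" key layout and all property names are hypothetical.

def put(table, row_key, column, value):
    """Store one cell, creating the row on first use."""
    table.setdefault(row_key, {})[column] = value

# Option A ("columns"): one row per file, one extra column per user property.
wide = {}
put(wide, "file42", "info:pg", 3)
put(wide, "file42", "info:nm", "report.pdf")
put(wide, "file42", "info:color", "red")
put(wide, "file42", "info:reviewer", "anna")

# Option B ("rows"): one extra row per (file, property); the name is in the row key.
tall = {}
put(tall, "file42", "info:pg", 3)
put(tall, "file42", "info:nm", "report.pdf")
put(tall, "file42/prop/color", "info:value", "red")
put(tall, "file42/prop/reviewer", "info:value", "anna")

def properties_wide(table, row_key, fixed=("info:pg", "info:id", "info:nm")):
    """User properties are every 'info' column that is not a fixed one."""
    row = table.get(row_key, {})
    return {col.split(":", 1)[1]: val
            for col, val in row.items()
            if col.startswith("info:") and col not in fixed}

def properties_tall(table, file_key):
    """User properties are the rows whose key starts with '<file>/prop/'."""
    prefix = file_key + "/prop/"
    return {key[len(prefix):]: row["info:value"]
            for key, row in sorted(table.items())
            if key.startswith(prefix)}

print(properties_wide(wide, "file42"))
print(properties_tall(tall, "file42"))
```

Both layouts return the same properties. The difference is where the growth goes: option A widens a single row, while option B adds rows that sort next to the file's own row because they share a key prefix.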
-
Re: Rows vs. Columns
Manish Bhoge, 2012-03-20, 03:44
Konrad,

Do you really want a columnar structure for this kind of problem? I think you can still live with a typical row-level database. I still need to read the link that you provided, but I am sure this is the kind of storage we have used in a typical RDBMS.

One more solution is possible: if your file volume is big and you need to perform text search on it, you can create a row-level table in Hive and connect it through a storage handler to HBase, which is where you would actually store your files.

Thanks
Manish.
-
Re: Rows vs. Columns
Jacques, 2012-03-20, 06:08
As the advice says... Millions of columns are not a good idea. If your user information will be sparse, e.g. only a few hundred users will associate with a particular row, you'll be fine. However, if your matrix is complete, you probably need to store as rows.

Also, you should check out the advice (a JIRA issue covers this) about frequent flushes when using column families of substantially different sizes, if the blob is large and the info is small.
-
RE: Rows vs. Columns
Laxman, 2012-03-20, 08:56
Do we see any problem with the below schema?
family "info":
  "info:pg" - keeps page number
  "info:id" - sender ID
  "info:nm" - pdf name
  "info:prop_name" - column to hold property name
  "info:prop_value" - column to hold property value
family "data":
  "data:blob" - blob of pdf file

--
Regards,
Laxman
-
Re: Rows vs. Columns
Konrad Tendera, 2012-03-20, 09:04
But what about multiple properties? Every user can use any number of properties.
--
Konrad Tendera
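The objection above can be illustrated with a small sketch. It uses a plain dictionary as a stand-in for a table and ignores cell versions, so it only models what a default read of the latest value would see. The row key and property names are hypothetical.

```python
# Toy model: {row_key: {"family:qualifier": value}}; cell versions are ignored.
# "file42" and the property names are made-up example data.

def put(table, row_key, column, value):
    """Store one cell; a second put to the same column replaces the value."""
    table.setdefault(row_key, {})[column] = value

# Fixed "info:prop_name" / "info:prop_value" columns: one slot per row.
fixed = {}
put(fixed, "file42", "info:prop_name", "color")
put(fixed, "file42", "info:prop_value", "red")
put(fixed, "file42", "info:prop_name", "reviewer")   # replaces "color"
put(fixed, "file42", "info:prop_value", "anna")      # replaces "red"

# One column per property name: each property gets its own cell.
per_name = {}
put(per_name, "file42", "info:color", "red")
put(per_name, "file42", "info:reviewer", "anna")

print(fixed["file42"])
print(per_name["file42"])
```

With the fixed pair of columns, only the last property written remains visible in the row. Holding several properties per file needs either one column per property name or several rows per file.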
-
Re: Rows vs. Columns
Michael Segel, 2012-03-20, 09:32
Yes,
Currently, if one of the column families causes a split, then all of the column families get split. So if you are dealing with a large blob, you're going to shoot yourself in the foot.

Are you filtering on any of the values in the 'info' family? If not, you could try creating a serialized record (Avro is an example) for the info data, and then store the data in a single column family where one column contains the info record and the other column contains the blob.

Or you could use two tables with the same row key. But that would mean two get()s... Having said that, if you were doing a table scan, you'd want to scan the info column and, based on the results, fetch back the blob.

HTH

-Mike
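A rough sketch of the serialized-record idea follows. JSON from the Python standard library is used here purely as a stand-in for Avro, and the table is a plain dictionary, not an HBase client. The family name "d", the column names, and the sample values are assumptions for illustration.

```python
import json

# JSON stands in for Avro here; "d:info", "d:blob" and the sample data are hypothetical.

def pack_info(page, sender_id, name, properties):
    """Serialize all 'info' fields, including user properties, into one cell."""
    record = {"pg": page, "id": sender_id, "nm": name, "props": properties}
    return json.dumps(record, sort_keys=True).encode("utf-8")

def unpack_info(cell):
    """Inverse of pack_info."""
    return json.loads(cell.decode("utf-8"))

# A single column family with two columns: the info record and the blob.
table = {
    "file42": {
        "d:info": pack_info(3, "sender7", "report.pdf", {"color": "red"}),
        "d:blob": b"%PDF-1.4 ...",
    }
}

def rows_with_sender(table, sender_id):
    """Filtering on a field means deserializing every record on the client."""
    return [key for key, row in sorted(table.items())
            if unpack_info(row["d:info"])["id"] == sender_id]

print(unpack_info(table["file42"]["d:info"])["props"])
print(rows_with_sender(table, "sender7"))
```

The helper `rows_with_sender` shows why the question about filtering matters: once the fields live inside one serialized cell, selecting rows by a field value requires unpacking each record rather than comparing a column value directly.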
-
Re: Rows vs. Columns
Qian Ye, 2012-03-20, 09:32
I think the average number of properties users would add to a specific page should be estimated. I guess about 99.9% of pages would not be associated with too many properties. The others can be handled with a special solution. Saving properties as columns is a good way to solve this problem, I think.

--
With Regards!
Ye, Qian
-
Re: Rows vs. Columns
Konrad Tendera, 2012-03-20, 09:40
I think that two separate tables can work, because users usually fetch the file info, and the blob of a specific file is rarely fetched.
--
Konrad Tendera
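The two-table access pattern discussed here can be sketched as follows: scan the small info table, then fetch blobs only for the rows that matched. The tables are plain dictionaries, and the table names, row keys, and sample values are assumptions for illustration only.

```python
# Two toy tables that share row keys; all names and values are hypothetical.
info_table = {
    "file41": {"info:pg": 1, "info:id": "sender7", "info:nm": "a.pdf"},
    "file42": {"info:pg": 3, "info:id": "sender9", "info:nm": "b.pdf"},
    "file43": {"info:pg": 2, "info:id": "sender7", "info:nm": "c.pdf"},
}
blob_table = {
    "file41": {"data:blob": b"blob-41"},
    "file42": {"data:blob": b"blob-42"},
    "file43": {"data:blob": b"blob-43"},
}

def scan_info(info_table, predicate):
    """Walk the info rows in key order and yield the keys that match."""
    for row_key in sorted(info_table):
        if predicate(info_table[row_key]):
            yield row_key

def fetch_blobs(info_table, blob_table, predicate):
    """Second step: one lookup in the blob table per matching row key."""
    return {key: blob_table[key]["data:blob"]
            for key in scan_info(info_table, predicate)}

matches = fetch_blobs(info_table, blob_table,
                      lambda row: row["info:id"] == "sender7")
print(sorted(matches))
```

Since the blob is rarely needed in this use case, most requests stop after the first step and never touch the blob table. The cost is the second lookup whenever the blob is wanted.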
-
Re: Rows vs. Columns
Michael Segel, 2012-03-20, 09:44
Why not make your properties a map object?
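One possible reading of the map suggestion is to keep the fixed fields as ordinary columns and store all user properties as one serialized map in a single cell. The sketch below assumes that reading. The column name "info:props" is hypothetical, JSON is an arbitrary choice of encoding, and the row is a plain dictionary rather than an HBase row.

```python
import json

# "info:props" is a hypothetical column; JSON is an assumed encoding for the map.

def read_properties(row, column="info:props"):
    """Decode the property map, treating a missing cell as an empty map."""
    return json.loads(row.get(column, b"{}").decode("utf-8"))

def add_property(row, name, value, column="info:props"):
    """Read the map, add one entry, and write the whole map back."""
    props = read_properties(row, column)
    props[name] = value
    row[column] = json.dumps(props, sort_keys=True).encode("utf-8")
    return props

row = {"info:pg": 3, "info:nm": "report.pdf"}
add_property(row, "color", "red")
add_property(row, "reviewer", "anna")
print(read_properties(row))
```

The row keeps a fixed number of columns no matter how many properties users invent. The trade-off is that adding a property becomes a read-modify-write of the whole map, so concurrent writers to the same row would need some coordination to avoid overwriting each other's changes.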