RowKey design with hashing
Nurettin Şimşek 2013-02-13, 08:35
Hi All,
In our project, email addresses are the row key. Which row key design should we choose?
1) com.yahoo@xxxx (reversed)
2) [EMAIL PROTECTED]
3) md5 hash([EMAIL PROTECTED])
4) Any other solution.
Many thanks.
-- M. Nurettin ŞİMŞEK
Re: RowKey design with hashing
lars hofhansl 2013-02-14, 00:50
It depends on your search pattern. If you never care about scan ordering, i.e. you only do point gets to see whether you've already seen an email address, go with the hash.
I'd prefer #1 over #2, because it would let you do efficient key prefix block encoding (FAST_DIFF).
-- Lars
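A rough way to see why the reversed form suits prefix-style encodings: the sketch below sorts a handful of keys and counts how many leading bytes each key shares with its predecessor. It does not implement FAST_DIFF, and it ignores everything else HBase stores in a cell key (column family, qualifier, timestamp). The helper names and sample addresses are made up for illustration, and option 2 is assumed to be the plain address, since the archive has redacted it.

```python
def reverse_email(email):
    """Turn 'user@yahoo.com' into 'com.yahoo@user' (option 1 in the thread)."""
    local, domain = email.split("@", 1)
    return ".".join(reversed(domain.split("."))) + "@" + local


def common_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_prefix_bytes(keys):
    """Sum, over the sorted keys, of the bytes each key shares with the one before it."""
    ordered = sorted(k.encode("utf-8") for k in keys)
    return sum(common_prefix_len(p, c) for p, c in zip(ordered, ordered[1:]))


# Hypothetical sample data, not from the thread.
emails = ["alice@yahoo.com", "bob@yahoo.com", "carol@gmail.com", "dave@gmail.com"]

plain_shared = shared_prefix_bytes(emails)
reversed_shared = shared_prefix_bytes([reverse_email(e) for e in emails])
print(plain_shared, reversed_shared)
```

With plain addresses the sorted neighbours start with different local parts, so there is nothing to share. With reversed keys, every address in the same domain repeats the domain prefix, which is the redundancy a prefix encoder can drop.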
Re: RowKey design with hashing
Jean-Marc Spaggiari 2013-02-14, 02:09
Hi Lars,
Can you please tell me more about key prefix block encoding, or point me to a blog post or doc? How does it work, what is it, etc.?
Thanks,
JM
Re: RowKey design with hashing
Ted Yu 2013-02-14, 03:18
Jean-Marc:
You can find almost all the details you need in this JIRA:
HBASE-4218 Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
Cheers
Re: RowKey design with hashing
Mehmet Simsek 2013-02-14, 03:41
Thanks Lars
M.Nurettin Şimşek
Re: RowKey design with hashing
Ted Yu 2013-02-14, 03:58
My name is Ted, not Lars :-)
Re: RowKey design with hashing
Jean-Marc Spaggiari 2013-02-25, 02:25
Hi Ted,
Thanks for pointing me to HBASE-4218. I will take a look at it.
JM
Re: RowKey design with hashing
Alexander Ignatov 2013-02-13, 08:40
If all your mail addresses share a single domain such as 'yahoo.com', you can probably use just 'xxxx' as the row key, without the '@yahoo.com' suffix.
-- Regards, Alexander Ignatov
Re: RowKey design with hashing
Amit Sela 2013-02-13, 09:01
If you have a good distribution of domains, use the reversed-domain key. It will let you scan over a domain faster, because all addresses in the same domain sort next to each other.
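To illustrate the scan-by-domain point without a cluster, here is a small sketch that treats a sorted Python list as a stand-in for HBase's lexicographically ordered rows and pulls out one domain as a contiguous range. The function names and sample addresses are hypothetical, and the stop-key trick (incrementing the last character of the prefix) assumes a non-empty ASCII prefix.

```python
import bisect


def reverse_email(email):
    """Turn 'user@yahoo.com' into 'com.yahoo@user'."""
    local, domain = email.split("@", 1)
    return ".".join(reversed(domain.split("."))) + "@" + local


def scan_prefix(sorted_keys, prefix):
    """Return every key starting with prefix, as one contiguous slice of the sorted list."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # Smallest string that sorts after every key carrying this prefix.
    stop_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    stop = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[start:stop]


# Hypothetical sample data, not from the thread.
emails = ["alice@yahoo.com", "bob@yahoo.com", "carol@gmail.com", "dave@yahoo.co.uk"]
keys = sorted(reverse_email(e) for e in emails)

yahoo_com = scan_prefix(keys, "com.yahoo@")
print(yahoo_com)
```

The same range lookup on plain or hashed keys would not work, since addresses of one domain are scattered across the key space there.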
Re: RowKey design with hashing
Nurettin Şimşek 2013-02-13, 09:42
I want to look up email addresses by exact match. There are many domains, not only yahoo.
What are the disadvantages of using hashing?
Re: RowKey design with hashing
Jean-Marc Spaggiari 2013-02-13, 12:06
I don't see any issue with #2, and it might be the simplest one. But it all depends on your read pattern. If you need to scan by domain, #1 is better. If you need to list the emails without knowing them in advance, #2 might be better. If you only access a row given a specific address, #3 can be good.
So I would say it all depends on what you want to do with it...
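For reference, the three candidate keys can be sketched side by side as below. This is only an illustration: the function names are invented, option 2 is assumed to be the plain address (the archive has redacted it), and no normalization such as lowercasing is applied.

```python
import hashlib


def key_reversed(email):
    """Option 1: reversed domain first, e.g. b'com.yahoo@xxxx'."""
    local, domain = email.split("@", 1)
    return (".".join(reversed(domain.split("."))) + "@" + local).encode("utf-8")


def key_plain(email):
    """Option 2 (assumed): the address as-is."""
    return email.encode("utf-8")


def key_md5(email):
    """Option 3: raw 16-byte MD5 digest of the address."""
    return hashlib.md5(email.encode("utf-8")).digest()


email = "xxxx@yahoo.com"
for build in (key_reversed, key_plain, key_md5):
    print(build.__name__, build(email))
```

The hashed key has a fixed length and works for point gets, since the same address always produces the same digest. The cost is that the key no longer sorts by domain or by address, and the address cannot be read back from the key, so it would have to be stored in a column if it is needed later.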
Re: RowKey design with hashing
Nurettin Şimşek 2013-02-13, 20:03
Thanks Jean,
3 can be good for us.