yonghu
2011-11-26, 06:14
Jahangir Mohammed
2011-11-26, 07:11
yonghu
2011-11-26, 07:47
Doug Meil
2011-11-26, 15:33
Jahangir Mohammed
2011-11-26, 17:02
lars hofhansl
2011-11-27, 19:32
yonghu
2011-11-27, 20:34
lars hofhansl
2011-11-27, 23:31
Doug Meil
2011-11-28, 16:08
lars hofhansl
2011-11-28, 19:03
-
How HBase implements delete operations (yonghu, 2011-11-26, 06:14)
Hello,

I read http://hbase.apache.org/book/versions.html and have a question about the delete operation. As the page mentions, a user can delete a whole row or a single data version of a cell. Deleting a data version of a cell just writes a tombstone marker for that version. What happens when a row is deleted? Does HBase delete the row immediately? Does it use the same strategy as for a data version and create a tombstone for the row key? Or does it create a tombstone for every data version belonging to that row?

Regards,
Yong
-
Re: How HBase implements delete operations (Jahangir Mohammed, 2011-11-26, 07:11)
Tombstone. Same as cell.
Thanks,
Jahangir Mohammed.
-
Re: How HBase implements delete operations (yonghu, 2011-11-26, 07:47)
But I was just thinking about efficiency. Why doesn't HBase write a single tombstone for the row key instead of one for each cell?

Regards,
Yong
-
Re: How HBase implements delete operations (Doug Meil, 2011-11-26, 15:33)
This is a good question. I'm actually not sure. According to the Delete Javadoc...

http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Delete.html

"To delete an entire row, instantiate a Delete object with the row to delete. To further define the scope of what to delete, perform additional methods as outlined below."

... it supports an unqualified row-level delete. So the question is what the RegionServer (RS) does with the Delete instance in terms of generating KeyValues...

http://hbase.apache.org/book.html#store

... I need to try this myself to see what happens, but I'd appreciate a comment from other committers on this.
-
Re: How HBase implements delete operations (Jahangir Mohammed, 2011-11-26, 17:02)
Every version is a record for a row key. When a row has to be deleted, all versions of the row have to be deleted. Each version is stored as its own record in the file, so each one has to be marked so that when compaction runs, the merged file doesn't contain the deleted records. I am ready to be proven wrong, but I'd like a committer to comment on this. I am too new to HBase.

Thanks,
Jahangir Mohammed.

    private void prepareDelete(Delete delete) throws IOException {
      // Check to see if this is a deleteRow insert
      if (delete.getFamilyMap().isEmpty()) {
        // No family given: add a delete-family marker for every family in the table
        for (byte[] family : this.htableDescriptor.getFamiliesKeys()) {
          // Don't eat the timestamp
          delete.deleteFamily(family, delete.getTimeStamp());
        }
      } else {
        // Families given explicitly: only validate them
        for (byte[] family : delete.getFamilyMap().keySet()) {
          if (family == null) {
            throw new NoSuchColumnFamilyException("Empty family is invalid");
          }
          checkFamily(family);
        }
      }
    }
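To make the branch logic in prepareDelete() concrete, here is a self-contained toy model of the same expansion step. It is a sketch under stated assumptions, not HBase code: ToyDelete and PrepareDeleteSketch are hypothetical names, families are plain strings instead of byte[], the table's families are passed in as a Set, and a single IllegalArgumentException stands in for both NoSuchColumnFamilyException and checkFamily(). A Delete with an empty family map is treated as a whole-row delete and expanded into one delete-family marker per column family, carrying the Delete's own timestamp.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the client Delete object (not HBase's class).
final class ToyDelete {
    final String row;
    final long timestamp;
    // family -> markers requested for it; an empty map means "delete the whole row"
    final Map<String, List<String>> familyMap = new LinkedHashMap<>();

    ToyDelete(String row, long timestamp) {
        this.row = row;
        this.timestamp = timestamp;
    }

    // Records a delete-family marker for the given family and timestamp.
    void deleteFamily(String family, long ts) {
        familyMap.computeIfAbsent(family, f -> new ArrayList<>())
                 .add("DeleteFamily@" + ts);
    }
}

final class PrepareDeleteSketch {
    // Mirrors the branch structure of the quoted prepareDelete().
    static void prepareDelete(ToyDelete delete, Set<String> tableFamilies) {
        if (delete.familyMap.isEmpty()) {
            // Row-level delete: one delete-family marker per column family,
            // reusing the Delete's own timestamp ("don't eat the timestamp").
            for (String family : tableFamilies) {
                delete.deleteFamily(family, delete.timestamp);
            }
        } else {
            // Explicit families: only check that each one exists in the table.
            for (String family : delete.familyMap.keySet()) {
                if (family == null || !tableFamilies.contains(family)) {
                    throw new IllegalArgumentException("No such column family: " + family);
                }
            }
        }
    }
}
```

For a table with families cf1, cf2 and cf3, running prepareDelete on new ToyDelete("row1", 42L) leaves three entries in the family map, each holding DeleteFamily@42: one tombstone per column family, not one per cell and not one per row key.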
-
Re: How HBase implements delete operations (lars hofhansl, 2011-11-27, 19:32)
There are exactly three different types of delete markers:

1. delete: for a specific version of a column
2. delete column: for all versions of a column
3. delete family: for all columns of a particular column family

In order to delete an entire row, HBase internally places a delete family marker for each column family.

-- Lars
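The effect of the three marker types on reads can be sketched with a small in-memory model. This is a hypothetical illustration (Java 16+ for records), not how HBase stores or evaluates KeyValues: Cell, Marker, MarkerType and TombstoneView are invented names, a row is just a list of cells, and only read-time masking is modeled. In HBase itself the masked cells and their markers are only physically purged later, during a major compaction.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of how the three tombstone types hide cells within one row.
// None of these types are HBase classes.
enum MarkerType { DELETE, DELETE_COLUMN, DELETE_FAMILY }

record Cell(String family, String qualifier, long ts) {}

// qualifier is unused (may be null) for DELETE_FAMILY markers.
record Marker(MarkerType type, String family, String qualifier, long ts) {
    // True if this tombstone hides the given cell.
    boolean masks(Cell c) {
        if (!family.equals(c.family())) return false;
        switch (type) {
            case DELETE:        // exactly one version of one column
                return qualifier.equals(c.qualifier()) && c.ts() == ts;
            case DELETE_COLUMN: // all versions of one column up to and including ts
                return qualifier.equals(c.qualifier()) && c.ts() <= ts;
            case DELETE_FAMILY: // every column of the family up to and including ts
                return c.ts() <= ts;
            default:
                throw new AssertionError(type);
        }
    }
}

final class TombstoneView {
    // Cells a reader would still see after applying all markers.
    static List<Cell> visible(List<Cell> cells, List<Marker> markers) {
        return cells.stream()
                .filter(c -> markers.stream().noneMatch(m -> m.masks(c)))
                .collect(Collectors.toList());
    }
}
```

With the sample cells cf:a@1, cf:a@2, cf:b@1 and other:a@1, a DELETE marker for cf:a@1 hides only that version, a DELETE_COLUMN marker for cf:a at timestamp 2 hides both versions of cf:a, and a DELETE_FAMILY marker for cf at timestamp 5 hides everything in cf while leaving other:a@1 visible. A whole-row delete in this model is simply one DELETE_FAMILY marker per column family.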
-
Re: How HBase implements delete operations (yonghu, 2011-11-27, 20:34)
So this means that if a row contains 3 column families, HBase will create three tombstones to delete the row. Is that right?

Yong
-
Re: How HBase implements delete operations (lars hofhansl, 2011-11-27, 23:31)
That is correct.
-
Re: How HBase implements delete operations (Doug Meil, 2011-11-28, 16:08)
Thanks, Lars. I'll update the docs with this.
-
Re: How HBase implements delete operations (lars hofhansl, 2011-11-28, 19:03)
Cool! Maybe we can relate that to the client API as well...
On the client, this is controlled using the Delete object:

o Creating a Delete object for a row without specifying anything else places a family delete marker for each CF.
o All columns of a specific CF can be deleted using deleteFamily(...), which places a family delete marker.
o All versions of a column are deleted using deleteColumns(...), which places a column delete marker.
o A specific version of a column is deleted using deleteColumn(...), which places a delete marker.

All of these methods/constructors take a timestamp, which indicates removal of all versions up to and including that version (except for deleteColumn, which is always version-specific).
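The client-side mapping and the timestamp rule can be summarized in a few lines. ClientCall and DeleteTimestamps are hypothetical names and are not part of the HBase client API; the enum constants only label the calls listed in the mapping above (ROW_DELETE stands for a Delete created with just a row), and covers() assumes the marker and the cell already refer to the same family and column.

```java
// Hypothetical labels for the client calls described above (not HBase types).
enum ClientCall { ROW_DELETE, DELETE_FAMILY, DELETE_COLUMNS, DELETE_COLUMN }

final class DeleteTimestamps {
    // Which kind of tombstone each client call places.
    static String markerFor(ClientCall call) {
        switch (call) {
            case ROW_DELETE:     // one of these per column family
            case DELETE_FAMILY:
                return "delete family";
            case DELETE_COLUMNS:
                return "delete column";
            case DELETE_COLUMN:
                return "delete";
            default:
                throw new AssertionError(call);
        }
    }

    // Does a marker placed with markerTs cover a cell version at cellTs?
    // deleteColumn is version-specific; every other call covers all
    // versions up to and including markerTs.
    static boolean covers(ClientCall call, long markerTs, long cellTs) {
        return call == ClientCall.DELETE_COLUMN ? cellTs == markerTs : cellTs <= markerTs;
    }
}
```

For example, covers(DELETE_COLUMNS, 5, 3) is true and covers(DELETE_COLUMNS, 5, 7) is false, while covers(DELETE_COLUMN, 5, 3) is false because deleteColumn(...) only targets the version at timestamp 5.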