Re: Hbase performance with HDFS
Got it. I was thinking compaction happened locally on the node, but it
makes sense from what you have explained.

On Thu, Jul 7, 2011 at 3:12 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> 1) When compactions occur on Node A would it also include b2 and b3
>> which are actually redundant copies? My guess is yes.
>
>
> I don't follow your question.
>
> HDFS files are read by opening an input stream. This stream is fed data from block replicas chosen at random. One block replica for each block. The reader doesn't see "redundant copies".
>
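
(For illustration: a minimal sketch of the read path described above, using
the standard Hadoop FileSystem API. The file path is made up. The client gets
one logical stream per file, and HDFS silently picks a replica for each block.)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical HFile path, only for illustration.
        Path hfile = new Path("/hbase/mytable/region/cf/hfile1");
        // One input stream per file; HDFS chooses a replica for each
        // block behind the scenes, so the reader never sees the copies.
        FSDataInputStream in = fs.open(hfile);
        try {
          byte[] buf = new byte[64 * 1024];
          int n;
          while ((n = in.read(buf)) > 0) {
            // process n bytes ...
          }
        } finally {
          in.close();
        }
      }
    }
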
>> 2) Now compaction occurs and creates HFile3 which as you said is
>> replicated. But what happens to HFile1 and HFile2? I am assuming they
>> get deleted.
>
>
> They are deleted.
>
>
> Best regards,
>
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>
>
> ----- Original Message -----
>> From: Mohit Anchlia <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc:
>> Sent: Thursday, July 7, 2011 3:02 PM
>> Subject: Re: Hbase performance with HDFS
>>
>> Thanks! I understand what you mean, however I have a little confusion.
>> Does it mean there are unused blocks sitting around? For example:
>>
>> HFile1 with 3 blocks spread across 3 nodes Node A:(b1),b2,b3 Node
>> B:b1,(b2),b3 and Node C:b1,b2,(b3).
>>
>> HFile2 with 3 blocks spread across 3 nodes Node A:(b1),b2,b3 Node
>> B:b1,(b2),b3 and Node C:b1,b2,(b3)
>>
>> I have 2 questions:
>>
>> 1) When compactions occur on Node A would it also include b2 and b3
>> which are actually redundant copies? My guess is yes.
>> 2) Now compaction occurs and creates HFile3 which as you said is
>> replicated. But what happens to HFile1 and HFile2? I am assuming they
>> get deleted.
>>
>> Thanks for everyone's patience!
>>
>> On Thu, Jul 7, 2011 at 2:43 PM, Buttler, David <[EMAIL PROTECTED]> wrote:
>>>  The nice part of using HDFS as the file system is that the replication
>>>  is taken care of by the file system. So, when the compaction finishes,
>>>  that means the replication has already taken place.
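
(A small illustrative sketch of that point, with a made-up path: by the time
close() returns, the HDFS write pipeline has already streamed each block to
the replicas, so a finished compaction output is already replicated.)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/replication-example");  // made-up path
        FSDataOutputStream out = fs.create(p);          // dfs.replication (default 3) applies
        out.write("example bytes".getBytes("UTF-8"));
        out.close();  // the write pipeline has copied each block to the replicas
        FileStatus st = fs.getFileStatus(p);
        System.out.println("replication factor: " + st.getReplication());
      }
    }
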
>>>
>>>  -----Original Message-----
>>>  From: Mohit Anchlia [mailto:[EMAIL PROTECTED]]
>>>  Sent: Thursday, July 07, 2011 2:02 PM
>>>  To: [EMAIL PROTECTED]; Andrew Purtell
>>>  Subject: Re: Hbase performance with HDFS
>>>
>>>  Thanks Andrew. Really helpful. I think I have one more question right
>>>  now :) Underneath, HDFS replicates blocks 3 times by default. I am not
>>>  sure how that relates to HFiles and compactions. When a compaction
>>>  occurs, is it also happening on the replica blocks on other nodes? If
>>>  not, then how does it work when one node fails?
>>>
>>>  On Thu, Jul 7, 2011 at 1:53 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>>>>  You mentioned about compactions, when do those occur and what
>>>>>  triggers them?
>>>>
>>>>  Compactions are triggered by an algorithm that monitors the number of
>>>>  flush files in a store and their size, and it is configurable in
>>>>  several dimensions.
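
(For example, those knobs are plain HBase configuration properties. The
property names below are real; the values are what I believe were the
defaults of that era, shown here only as a sketch.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CompactionConfigSketch {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Consider a minor compaction once a store has this many flush files.
        conf.setInt("hbase.hstore.compactionThreshold", 3);
        // Block writes to the region once a store accumulates this many files.
        conf.setInt("hbase.hstore.blockingStoreFiles", 7);
        // Time-based major compaction interval, in milliseconds (one day).
        conf.setLong("hbase.hregion.majorcompaction", 24L * 60 * 60 * 1000);
        System.out.println(conf.get("hbase.hstore.compactionThreshold"));
      }
    }
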
>>>>
>>>>>  Does it cause additional space usage when that happens
>>>>
>>>>  Yes.
>>>>
>>>>>  if it
>>>>>  does it would mean you always need to have much more disk than you
>>>>>  really need.
>>>>
>>>>
>>>>  Not all regions are compacted at once. Each region by default is
>>>>  constrained to 256 MB. Not all regions will hold the full amount of
>>>>  data. The result is not a perfect copy (doubling) if some data has been
>>>>  deleted or is associated with TTLs that have expired. The merge-sorted
>>>>  result is moved into place and the old files are deleted as soon as the
>>>>  compaction completes. So how much more is "much more"? You can't write
>>>>  to any kind of data store on a (nearly) full volume anyway, whether it
>>>>  is HBase/HDFS, MySQL, or...
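
(A rough back-of-the-envelope sketch of the transient space cost. The 256 MB
figure is the default region size mentioned above; the assumption that only
one compaction runs at a time on a region server is mine.)

    public class CompactionHeadroomSketch {
      public static void main(String[] args) {
        // Default maximum region size of that era (hbase.hregion.max.filesize).
        long regionSizeBytes = 256L * 1024 * 1024;
        // Assumption: a single compaction running at a time on this region server.
        int concurrentCompactions = 1;
        // Worst case: the rewritten files coexist with the old ones until the
        // compaction completes and the old files are deleted.
        long transientHeadroom = regionSizeBytes * concurrentCompactions;
        System.out.printf("transient headroom: ~%d MB%n",
            transientHeadroom / (1024 * 1024));
      }
    }
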
>>>>
>>>>>  Since HDFS is mostly write once, how are updates/deletes handled?
>>>>
>>>>
>>>>  Not mostly, only write once.
>>>>
>>>>  From the BigTable paper, section 5.3: "A valid read operation is
>>>>  executed on a merged view of the sequence of SSTables and the memtable.
>>>>  Since the SSTables and the memtable are lexicographically sorted data
>>>>  structures, the merged view can be formed efficiently."
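
(To illustrate the merged-view idea in miniature, not HBase's actual classes:
newer sorted structures simply shadow older ones, an update is a newer value,
a delete is a newer tombstone marker, and compaction later rewrites the files
and drops the masked entries.)

    import java.util.TreeMap;

    public class MergedViewSketch {
      public static void main(String[] args) {
        // Oldest to newest: two "SSTable-like" sorted maps plus a memtable.
        TreeMap<String, String> older = new TreeMap<String, String>();
        TreeMap<String, String> newer = new TreeMap<String, String>();
        TreeMap<String, String> memtable = new TreeMap<String, String>();
        older.put("row1", "v1");
        newer.put("row1", "v2");              // an "update" is just a newer entry
        memtable.put("row2", "TOMBSTONE");    // a "delete" is a newer tombstone
        // A read sees the merge of the sorted structures, newest winning.
        TreeMap<String, String> merged = new TreeMap<String, String>();
        merged.putAll(older);
        merged.putAll(newer);
        merged.putAll(memtable);
        System.out.println(merged);  // {row1=v2, row2=TOMBSTONE}
      }
    }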