-
TFile Named Meta Blocks Read Write Related
Dare 2012-07-03, 01:07
Hi Hadoop Team,
I have been working with TFiles in Hadoop and have a few questions about named meta blocks.
1) With TFile.Writer, one can append a <K,V> pair. If I prepare a meta block instead, I get back a DataOutputStream, which lets me write a byte[]; that suits me, since my <K,V> pairs are already serialized objects. Are the two approaches equivalent, or am I missing something? As I understand it, when I append a <K,V> pair, the key index is built at the tail of the TFile. When I write a plain byte[] into a meta block, I am not sure whether any index is built.
2) When reading, is there a way to read a <K,V> entry from the DataInputStream returned by getMetaBlock()?
Thanks DaRe
-
Re: TFile Named Meta Blocks Read Write Related
Harsh J 2012-07-03, 02:26
For both 1 and 2, if your metadata is a Writable, you can simply reuse its readFields() and write() methods to serialize it into and out of the data output/input streams.
For instance, assume dos is the data output stream, dis is the data input stream, and obj1 (K) and obj2 (V) are my writables. Then I do:
To write K and V: obj1.write(dos); obj2.write(dos);
To read K and V in proper order (order is important when deserializing), reconstruct your writable objects and read them in: obj1.readFields(dis); obj2.readFields(dis);
Does this not work for you?
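The write-then-read order described above can be sketched as follows. This is a minimal, self-contained illustration and not Hadoop code: SimpleWritable is a hypothetical stand-in for org.apache.hadoop.io.Writable, TextKey and LongValue are made-up types, and in-memory byte streams stand in for the streams a meta block would provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {
    // Hypothetical stand-in for Hadoop's Writable interface (same two methods).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Made-up key type that serializes a single string.
    static final class TextKey implements SimpleWritable {
        String value = "";
        TextKey() {}
        TextKey(String value) { this.value = value; }
        public void write(DataOutput out) throws IOException { out.writeUTF(value); }
        public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    }

    // Made-up value type that serializes a single long.
    static final class LongValue implements SimpleWritable {
        long value;
        LongValue() {}
        LongValue(long value) { this.value = value; }
        public void write(DataOutput out) throws IOException { out.writeLong(value); }
        public void readFields(DataInput in) throws IOException { value = in.readLong(); }
    }

    // Serialize K first, then V, into one byte stream.
    static byte[] writePair(SimpleWritable k, SimpleWritable v) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            k.write(dos);
            v.write(dos);
        }
        return buf.toByteArray();
    }

    // Deserialize in the same order the fields were written: K, then V.
    static void readPair(byte[] bytes, SimpleWritable k, SimpleWritable v) throws IOException {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            k.readFields(dis);
            v.readFields(dis);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writePair(new TextKey("row-0001"), new LongValue(42L));
        TextKey k = new TextKey();
        LongValue v = new LongValue();
        readPair(bytes, k, v);
        System.out.println(k.value + " -> " + v.value);
    }
}
```

Swapping the two readFields() calls would make the reader interpret the key's bytes as a long, which is why the order matters.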
On Tue, Jul 3, 2012 at 6:37 AM, Dare <[EMAIL PROTECTED]> wrote:
> Hi Hadoop Team,
>
> I have been working with the TFiles in Hadoop. I got a few questions
> regarding Named Meta Blocks.
>
> 1) Using TFile.Writer one can append a <K,V> pair. But, if I prepare a Meta
> Block, then it returns a DataOutputStream which allows to write in byte[],
> since my <K,V> pairs are serialized objects.
> Is this the same way or is there something I am missing. Because, out of
> my understanding, if I write it as a <K,V> pair, the key indexes will be
> prepared at the tail of the TFile.
> But, when I write it as a just byte[], i am not sure if the indexes are
> formed.
>
> 2) While reading, is there a way to read <K,V> entry using the
> DataInputStream got from the getMetaBlock().
>
> Thanks
> DaRe
-- Harsh J
-
Re: TFile Named Meta Blocks Read Write Related
Dare 2012-07-03, 16:56
Hi Harsh,
Yes, that is possible. In my case, however, I have to use a custom-built serialization, so I receive an already-serialized object and must store and return it in the same state. If I just use TFile.Writer, I can write all the <K,V> pairs as they are, and with TFile.Reader I can read them back as they are.
The problem is that the data is huge and I want the retrieval time to be very low. So I thought I would use named meta blocks, in which I could maintain a map of key bounds (the TFiles in my case are sorted) and thereby reduce the search scope. I won't know what kind of objects I will be dealing with. All I am sure of is that they follow the custom serialization and implement equals().
With a TFile.Reader.Scanner, I can scan the entries. Is there a similar approach with a DataInputStream, one that lets me scan entries instead of bytes?
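One way to get entry-level reads out of a plain byte stream is to frame each opaque key and value with a length prefix. The sketch below illustrates that idea only; it is not taken from TFile. The Entry class, the count-then-length layout, and the method names are all assumptions, and in-memory streams stand in for the DataOutputStream and DataInputStream that a meta block would hand back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FramedEntries {
    // Hypothetical holder for one already-serialized <K,V> pair.
    static final class Entry {
        final byte[] key;
        final byte[] value;
        Entry(byte[] key, byte[] value) { this.key = key; this.value = value; }
    }

    // Assumed layout: [int count] then, per entry, [int keyLen][key][int valLen][value].
    static void writeEntries(DataOutputStream out, List<Entry> entries) throws IOException {
        out.writeInt(entries.size());
        for (Entry e : entries) {
            out.writeInt(e.key.length);
            out.write(e.key);
            out.writeInt(e.value.length);
            out.write(e.value);
        }
    }

    // Reads one length-prefixed byte[]; readFully blocks until all bytes arrive.
    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return bytes;
    }

    // Walks the stream entry by entry, returning the payloads untouched.
    static List<Entry> readEntries(DataInputStream in) throws IOException {
        int count = in.readInt();
        List<Entry> entries = new ArrayList<Entry>(count);
        for (int i = 0; i < count; i++) {
            byte[] key = readBytes(in);
            byte[] value = readBytes(in);
            entries.add(new Entry(key, value));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        List<Entry> bounds = new ArrayList<Entry>();
        bounds.add(new Entry("aaa".getBytes(StandardCharsets.UTF_8),
                             "mmm".getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writeEntries(out, bounds);
        }
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.println(readEntries(in).size() + " entry read back");
        }
    }
}
```

Because the payloads are returned byte for byte, the custom serialization stays untouched; the length prefixes only mark where each entry starts and ends.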
Thanks Dayakar
-
Re: TFile Named Meta Blocks Read Write Related
Dare 2012-07-03, 19:21
Hey Harsh,
Never mind the previous reply. I was slightly confused with the MetaBlocks and Blocks. I assumed there can be multiple MetaBlocks per file.
I went through the source code and I am good now.
Thanks for the previous reply.
--DaRe