Keith Wiley 2010-09-27, 07:23
Is there a particularly good reason for why the "hadoop fs" command supports -cat and -tail, but not -head?
________________________________________________________________________________ Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use." -- Galileo Galilei ________________________________________________________________________________
Edward Capriolo 2010-09-27, 14:02
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> Is there a particularly good reason for why the "hadoop fs" command supports
> -cat and -tail, but not -head?
Tail needs to be done efficiently (a built-in -tail can jump to the end of the file instead of streaming all of it to you), but head you can just do yourself. Most people probably use
hadoop dfs -cat file | head -5
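A minimal sketch of the "do it yourself" head, wrapped as a reusable bash function. `hdfs_head` is a hypothetical name, not a Hadoop command. It assumes the `hadoop` launcher is on the PATH and uses `hadoop fs -cat` (the form from the original question) together with `head -n`, the standard spelling of `head -5`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: print the first lines of an HDFS file.
# Usage: hdfs_head <hdfs-path> [lines]   (lines defaults to 10, like head)
hdfs_head() {
  local path="$1"
  local lines="${2:-10}"
  # 'hadoop fs -cat' streams the file to stdout; head exits after $lines lines,
  # and the reader stops once its next write to the closed pipe fails.
  hadoop fs -cat "$path" | head -n "$lines"
}
```

For example, `hdfs_head /user/keith/part-00000 5` would print the first five lines of that file (the path is made up for illustration).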
Keith Wiley 2010-09-27, 15:13
On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
> Tail is needed to be done efficiently but head you can just do
> yourself. Most people probably use
>
> hadoop dfs -cat file | head -5.

I disagree with your use of the word "efficiently". :-) To my understanding (and perhaps that's the source of my error), the approach you suggested reads the entire file over the network from the cluster to your client machine. That file could conceivably be HDFS-scale (hundreds of GBs; even TBs wouldn't be uncommon).
What do you think? Am I wrong in my interpretation of how hadoopCat-pipe-head would work?
Cheers!
Edward Capriolo 2010-09-27, 20:46
On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> I disagree with your use of the word "efficiently". :-) To my
> understanding (and perhaps that's the source of my error), the approach you
> suggested reads the entire file over the net from the cluster to your client
> machine. That file could conceivably be of HDFS scales (100s of GBs, even
> TBs wouldn't be uncommon).
>
> What do you think? Am I wrong in my interpretation of how
> hadoopCat-pipe-head would work?
'hadoop dfs -cat' writes the file to stdout as it reads it. head -5 exits after printing 5 lines, which closes the pipe, so the first half of the pipeline stops as soon as its next write fails. Because of buffering, somewhat more than 5 lines might physically be read, but this invocation does not read the entire HDFS file before piping it to head.
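The buffering point can be seen without a cluster. This minimal bash sketch swaps `hadoop dfs -cat` for a hypothetical `endless_producer` function that would write lines forever, and that records, after each successful write, how many lines it has emitted. Piping it into `head -n 5` still terminates: head prints 5 lines and exits, and the producer stops at its next write to the closed pipe (normally via SIGPIPE). The reported count is usually somewhat above 5 because of pipe buffering, and it varies from run to run.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for 'hadoop dfs -cat': a producer that never finishes
# on its own. After each successful write it records how many lines it has sent.
endless_producer() {
  local i=0
  while :; do
    i=$((i + 1))
    # Fails (or the process is killed by SIGPIPE) once head has exited
    # and closed the read end of the pipe.
    echo "line $i" || return 1
    echo "$i" > "$counter_file"
  done
}

counter_file="$(mktemp)"
endless_producer | head -n 5   # prints exactly 5 lines, then the pipeline ends

# The producer got a little past line 5 because of buffering,
# but it was stopped instead of running forever.
echo "producer wrote $(cat "$counter_file") lines before it stopped" >&2
rm -f "$counter_file"
```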
Keith Wiley 2010-09-28, 00:34
On Sep 27, 2010, at 13:46 , Edward Capriolo wrote:
> 'hadoop dfs -cat' will output the file as it is read. head -5 will
> kill the first half of the pipe after 5 lines. With buffering more
> might be physically read than 5 lines but this invocation does not
> read the entire HDFS file before piping it to head.

Excellent. Thank you.