On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
>> On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>>> Is there a particularly good reason why the "hadoop fs" command
>>> supports -cat and -tail, but not -head?
>> Tail needs to be done efficiently, but head you can just do
>> yourself. Most people probably use
>> hadoop dfs -cat file | head -5.
> I disagree with your use of the word "efficiently". :-) To my
> understanding (and perhaps that's the source of my error), the approach you
> suggested reads the entire file over the net from the cluster to your client
> machine. That file could conceivably be of HDFS scale (100s of GBs; even
> TBs wouldn't be uncommon).
> What do you think? Am I wrong in my interpretation of how
> hadoopCat-pipe-head would work?
> Keith Wiley [EMAIL PROTECTED] keithwiley.com
'hadoop dfs -cat' outputs the file as it is read. head -5 exits after
printing 5 lines; the next write by the first half of the pipe then
fails (normally killing it with SIGPIPE), so it stops reading. Because
of buffering, somewhat more than 5 lines may physically be read, but
this invocation does not read the entire HDFS file before piping it
to head.
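A quick way to see this without a cluster: replace 'hadoop dfs -cat' with
a producer that never ends on its own. The function name `producer` is
just a hypothetical stand-in for the HDFS read; if the pipeline waited for
the whole input, this would hang forever instead of returning at once.

```shell
#!/bin/sh
# Hypothetical stand-in for `hadoop dfs -cat file`: a producer whose
# output is endless, like a file far larger than we want to read.
producer() {
    yes "record"
}

# Same shape as `hadoop dfs -cat file | head -5`. head exits after five
# lines; the producer's next write to the closed pipe then kills it with
# SIGPIPE (or fails with EPIPE if SIGPIPE is ignored), so the pipeline
# returns promptly even though the input never ends.
producer | head -5
```

The same reasoning applies to the real command: the client only pulls
data from the cluster until head goes away, plus whatever was already
buffered.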