Hadoop >> mail # user >> parallel cat


parallel cat
I have many large files, ranging from 2 GB to 800 GB, and I use hadoop fs -cat
a lot to pipe them to various programs.

I was wondering if it's possible to prefetch data for clients that have more
bandwidth than the datanodes. Most of my clients have 10 Gbit/s interfaces,
while the datanodes have 1 Gbit/s.

I was thinking of prefetching x blocks (even though it will cost extra memory)
while reading block y. After block y is read, read the prefetched blocks
and then throw them away.

It could be used like this:
export PREFETCH_BLOCKS=2 # default would be 1
hadoop fs -pcat hdfs://namenode/verylargefile | program
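
To make the idea concrete, here is a rough sketch of the prefetch window in Python. It is not Hadoop code: read_block is a hypothetical callable standing in for whatever fetches one block from a datanode, and prefetch_blocks plays the role of the proposed PREFETCH_BLOCKS setting. The assumption is that each block can be fetched independently, so several 1 Gbit/s transfers can run at once while output stays in file order.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetching_cat(read_block, num_blocks, prefetch_blocks=1):
    """Yield blocks 0..num_blocks-1 in order.

    read_block(i) is a hypothetical function that returns block i as bytes.
    Up to `prefetch_blocks` later blocks are fetched in the background
    while the current block is being consumed.
    """
    window = prefetch_blocks + 1  # the current block plus the prefetched ones
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = deque()
        next_index = 0

        # Prime the window with the first few block fetches.
        while next_index < num_blocks and len(pending) < window:
            pending.append(pool.submit(read_block, next_index))
            next_index += 1

        while pending:
            # Wait for the oldest outstanding block so output stays ordered.
            data = pending.popleft().result()

            # Start fetching one more block before handing this one out.
            if next_index < num_blocks:
                pending.append(pool.submit(read_block, next_index))
                next_index += 1

            # The consumer writes the data out; no reference is kept here,
            # so memory use is bounded by roughly `window` blocks.
            yield data


if __name__ == "__main__":
    import sys

    # In-memory stand-in for blocks stored on datanodes.
    fake_blocks = [b"block-%d\n" % i for i in range(5)]
    for chunk in prefetching_cat(lambda i: fake_blocks[i], len(fake_blocks),
                                 prefetch_blocks=2):
        sys.stdout.buffer.write(chunk)
```

The memory cost mentioned above shows up as the window size: with prefetch_blocks=2, at most about three blocks are held or in flight at any time.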

Any thoughts?


--
--- Get your facts first, then you can distort them as you please.--