Re: copy chunk of hadoop output
It does copy, but it still gives this error. Why is that?
On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha <[EMAIL PROTECTED]> wrote:

> When I try this, I get an error:
> cat: Unable to write to output stream.
>
> Is this a permissions issue?
> How do I resolve this?
> Thanks
>
>
> On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> No problem JM, I was confused as well.
>>
>> AFAIK, there's no shell utility that lets you specify an offset (a
>> number of bytes to skip, similar to dd's skip), but that can be
>> done from the FS API.
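
A minimal sketch, not part of the original thread, of what such an offset read through the HDFS FileSystem API could look like; the class name, path, offset, and byte count below are made-up placeholders:

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRangeCat {
    public static void main(String[] args) throws Exception {
        Path src = new Path("/user/jamal/part-00000"); // hypothetical HDFS path
        long offset = 1024L;                           // byte position to start reading from
        long length = 4096L;                           // number of bytes to copy out

        try (FileSystem fs = FileSystem.get(new Configuration());
             FSDataInputStream in = fs.open(src);
             OutputStream out = Files.newOutputStream(Paths.get("chunk.local"))) {
            in.seek(offset);                           // jump straight to the requested offset
            byte[] buf = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break;                             // hit end of file before 'length' bytes
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }
}

With this approach only the requested range (plus whatever the read buffers pull in) should travel over the wire, which matches the clienttrace observation further down the thread.
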
>>
>> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
>> <[EMAIL PROTECTED]> wrote:
>> > Hi Harsh,
>> >
>> > My bad.
>> >
>> > I read the example quickly and I don't know why I thought you used tail
>> > and not head.
>> >
>> > head will work perfectly. But tail will not, since it will need to read
>> > the entire file. My comment was for tail, not for head, and therefore
>> > not applicable to the example you gave.
>> >
>> >
>> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
>> >
>> > Will have to download the entire file.
>> >
>> > Is there a way to "jump" to a certain position in a file and "cat"
>> > from there?
>> >
>> > JM
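
A rough sketch, not from the original thread, of how the same FS API could answer the "jump to a position" question for the tail case: look up the file length, seek to length minus N, and copy from there to a local file, so only the trailing bytes cross the network instead of the whole file that the -cat | tail pipe pulls down. The class name, path, and byte count are placeholders:

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTail {
    public static void main(String[] args) throws Exception {
        Path src = new Path("/user/jamal/part-00000"); // hypothetical HDFS path
        long tailBytes = 5L;                           // trailing bytes to keep, like tail -c 5

        try (FileSystem fs = FileSystem.get(new Configuration())) {
            long fileLen = fs.getFileStatus(src).getLen();
            long start = Math.max(0L, fileLen - tailBytes);

            try (FSDataInputStream in = fs.open(src);
                 OutputStream out = Files.newOutputStream(Paths.get("5-byte-local-file"))) {
                in.seek(start);                        // skip everything before the tail
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {       // copy from 'start' to end of file
                    out.write(buf, 0, n);
                }
            }
        }
    }
}
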
>> >
>> > 2013/2/20, Harsh J <[EMAIL PROTECTED]>:
>> >> Hi JM,
>> >>
>> >> I am not sure how "dangerous" it is, since we're using a pipe here,
>> >> and as you yourself note, it will only run until the last bytes have
>> >> been received, and then terminate.
>> >>
>> >> The -cat process will terminate because the
>> >> process we're piping to terminates first, once it reaches its goal
>> >> of -c <N bytes>; so the "-cat" program will certainly not fetch the
>> >> whole file down, but it may fetch a few extra bytes over the wire
>> >> due to read buffering (the extra data won't be put into the target
>> >> file; it gets discarded).
>> >>
>> >> We can try it out and observe the "clienttrace" logged
>> >> at the DN at the end of the -cat's read. Here's an example:
>> >>
>> >> I wrote ~1.6 MB of data into a file called "foo.jar"; see "bytes"
>> >> below, it's ~1.58 MB:
>> >>
>> >> 2013-02-20 23:55:19,777 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
>> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 192289000
>> >>
>> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
>> >> store the first 5 bytes into a local file:
>> >>
>> >> Asserting that after the command we get 5 bytes:
>> >> ➜  ~ wc -c foo.xml
>> >>        5 foo.xml
>> >>
>> >> Asserting that the DN didn't IO-read the whole file: see the read op
>> >> below and its "bytes" parameter, it's only about 193 KB, not the whole
>> >> block of ~1.58 MB we wrote earlier:
>> >>
>> >> 2013-02-21 00:01:32,437 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
>> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 19207000
>> >>
>> >> I don't see how this is any more dangerous than doing a
>> >> -copyToLocal/-get, which retrieves the whole file anyway?
>> >>
>> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
>> >> <[EMAIL PROTECTED]> wrote:
>> >>> But be careful.
>> >>>
>> >>> hadoop fs -cat will retrieve the entire file and will only finish
>> >>> once it has retrieved the last bytes you are looking for.
>> >>>
>> >>> If your file is many GB in size, it will take a lot of time for this
>> >>> command to complete and will put some pressure on your network.
>> >>>
>> >>> JM
>> >>>
>> >>> 2013/2/19, jamal sasha <[EMAIL PROTECTED]>: