MapReduce >> mail # user >> FSDataOutputStream hangs in out.close()


Re: FSDataOutputStream hangs in out.close()
Same data does not mean same block IDs across two clusters. I'm
guessing this is caused by some issue in your code when writing to
two different HDFS instances with the same client. Did you do a
low-level mod for HDFS writes as well, or just create two different
FS instances when you want to write to different ones?
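For reference, a minimal sketch of the two-FileSystem approach suggested above, using the Hadoop 1.x API. The NameNode URIs and the output path are placeholders, not taken from this thread; the point is that each cluster gets its own FileSystem instance (and hence its own DFSClient), so block IDs from one cluster are never reused against the other.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TwoClusterWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical NameNode addresses; substitute your own clusters.
        FileSystem fsA = FileSystem.get(URI.create("hdfs://namenode-a:8020"), conf);
        FileSystem fsB = FileSystem.get(URI.create("hdfs://namenode-b:8020"), conf);

        Path out = new Path("/user/pedro/out.txt");
        byte[] payload = "same data, two clusters\n".getBytes("UTF-8");

        // Write the same payload to each cluster through its own FS instance.
        // Block IDs are allocated independently by each cluster's NameNode.
        for (FileSystem fs : new FileSystem[] { fsA, fsB }) {
            FSDataOutputStream stream = fs.create(out, true);
            try {
                stream.write(payload);
            } finally {
                // close() completes the file against that cluster's NN;
                // this is the call that can hang if min replication isn't met.
                stream.close();
            }
        }

        fsA.close();
        fsB.close();
    }
}
```

Sharing one client (or one DFSClient) across both clusters, instead of two separate FileSystem instances as above, is where block-ID confusion like the "Block ... is not valid" error below can creep in.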

On Wed, Mar 27, 2013 at 9:34 PM, Pedro Sá da Costa <[EMAIL PROTECTED]> wrote:
> I can add this information taken from the datanode logs, but it seems to be
> something related to the blocks:
>
> nfoPort=50075, ipcPort=50020):Got exception while serving
> blk_-4664365259588027316_2050 to /XXX.XXX.XXX.123:
> java.io.IOException: Block blk_-4664365259588027316_2050 is not valid.
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>         at java.lang.Thread.run(Thread.java:662)
>
> 2013-03-27 15:44:54,965 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(XXX.XXX.XXX.123:50010,
> storageID=DS-595468034-XXX.XXX.XXX.123-50010-1364122596021, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.io.IOException: Block blk_-4664365259588027316_2050 is not valid.
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>         at java.lang.Thread.run(Thread.java:662)
>
> I still have no idea why this error occurs, if the 2 HDFS instances have the
> same data.
>
>
> On 27 March 2013 15:53, Pedro Sá da Costa <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I'm trying to make the same client talk to different HDFS and JT
>> instances that are in different sites of Amazon EC2. The error that I got
>> is:
>>
>>  java.io.IOException: Got error for OP_READ_BLOCK,
>> self=/XXX.XXX.XXX.123:44734,
>>
>> remote=ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010,
>> for file
>>
>> ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010:-4664365259588027316,
>> for block
>>    -4664365259588027316_2050
>>
>> Does this error mean that it wasn't possible to write on the remote host?
>>
>>
>> On 27 March 2013 12:24, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>> You can try to take a jstack stack trace and see what it's hung on.
>>> I've only ever noticed a close() hang when the NN does not accept the
>>> complete-file call (due to minimum replication not being guaranteed),
>>> but given your changes (which I have no idea about yet) it could be
>>> something else as well. You're essentially trying to make the same
>>> client talk to two different FSes, I think (aside from the JT RPC).
>>>
>>> On Wed, Mar 27, 2013 at 5:50 PM, Pedro Sá da Costa <[EMAIL PROTECTED]>
>>> wrote:
>>> > Hi,
>>> >
>>> > I'm using the Hadoop 1.0.4 API to try to submit a job to a remote
>>> > JobTracker. I modified the JobClient to submit the same job to
>>> > different JTs. E.g., the JobClient is on my PC and it tries to submit
>>> > the same Job in 2 JTs at different sites in Amazon EC2. When I'm
>>> > launching the Job, in the setup phase, the JobClient is trying to
>>> > submit split file info

Harsh J