Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
MapReduce >> mail # user >> one minute delay in running a simple ls command on hadoop (maybe near security groups..): hadoop 0.23.5


+
Gopi Krishna M 2013-03-31, 06:17
+
Daryn Sharp 2013-04-04, 12:59
+
Gopi Krishna M 2013-04-04, 13:09
+
Gopi Krishna M 2013-04-04, 12:33
Copy link to this message
-
Re: one minute delay in running a simple ls command on hadoop (maybe near security groups..): hadoop 0.23.5
Hi Marcos,

executed it from shell.. returns immediately, no delay and the output is
just hduser..

hduser@hmaster-asia:~/installations/hadoop-0.23.5$ bash -c "id -Gn hduser"
hduser

one important thing I noticed on logs is (posted in my first mail), it
always get stuck *twice for exactly 60secs*.  This looks like some timeout.
 Went through settings which are close to 60secs, but no luck yet.
Output of strace shows futex as the last call after which it takes 2 mins
and then print the ls output.  Not clear if it is stuck on futex there as
that seems to be the last system call.  I tried with 0.23.6 but got the
same behavior.
*
*
*strace -t bin/hadoop dfs -ls / ; date*
.......
15:09:29 openat(AT_FDCWD,
"/home/hduser/installations/hadoop-0.23.5/share/hadoop/mapreduce/", O_RDON
LY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
15:09:29 getdents(3, /* 11 entries */, 32768) = 584
15:09:29 getdents(3, /* 0 entries */, 32768) = 0
15:09:29 close(3)                       = 0
15:09:29 mmap(NULL, 1052672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7
ff716300000
15:09:29 mprotect(0x7ff716300000, 4096, PROT_NONE) = 0
15:09:29 clone(child_stack=0x7ff7163ffff0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_T
HREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7ff716400
9d0, tls=0x7ff716400700, child_tidptr=0x7ff7164009d0) = 42720
*15:09:29 futex*(0x7ff7164009d0, FUTEX_WAIT, 42720, NULLFound 5 items
drwxr-xr-x   - hduser supergroup          0 2013-03-26 12:28
/aea8d83f-c0a6-4646-a88d-f9a89be92486
drwxr-xr-x   - hduser supergroup          0 2013-03-06 12:43 /in
drwxr-xr-x   - hduser supergroup          0 2013-03-04 12:57 /out1
drwxrwx---   - hduser supergroup          0 2013-03-04 12:32 /tmp
drwx------   - hduser supergroup          0 2013-03-04 12:49 /user
 <unfinished ... exit status 0>
Thu Apr  4* 15:11:32 UTC 2013*
~/installations/hadoop-0.23.5$

On Thu, Apr 4, 2013 at 6:36 PM, MARCOS MEDRADO RUBINELLI <
[EMAIL PROTECTED]> wrote:

>  Gopi,
>
> The namenode is essentially running bash -c "id -Gn hduser" and waiting
> for the response. You could try executing from the shell it to see if it
> does take a long time in Azure, or if the output is too complex to parse.
>
> Regards,
> Marcos
>
>
> In 04-04-2013 09:33, Gopi Krishna M wrote:
>
> in case anybody has some idea on what I should look for in debugging
> this.. below is a strace of the simple bin/hadoop dfs -ls / command run on
> the machine running namenode
>
>  % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  49.56    0.000853          45        19        14 execve
>  36.20    0.000623           1       925           rt_sigprocmask
>   5.75    0.000099           2        62           brk
>   5.29    0.000091           9        10           getgid
>   3.20    0.000055           1        81           rt_sigaction
>   0.00    0.000000           0        71           read
>   0.00    0.000000           0         3           write
>   0.00    0.000000           0        49        11 open
>   0.00    0.000000           0        87           close
>   0.00    0.000000           0        71        31 stat
>   0.00    0.000000           0        36           fstat
>   0.00    0.000000           0        12           lseek
>   0.00    0.000000           0        72           mmap
>   0.00    0.000000           0        36           mprotect
>   0.00    0.000000           0         9           munmap
>   0.00    0.000000           0        19           rt_sigreturn
>   0.00    0.000000           0         2         2 ioctl
>   0.00    0.000000           0        40        31 access
>   0.00    0.000000           0        19           pipe
>   0.00    0.000000           0         8           dup2
>   0.00    0.000000           0         2           getpid
>   0.00    0.000000           0        20           clone
>   0.00    0.000000           0        38        19 wait4