The HDFS write path is currently CPU-intensive for a couple of reasons:
1) It doesn't yet use the native CRC code
2) It makes several unnecessary copies and byte buffer allocations, both in
the client and in the DataNode
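To illustrate point 2, here is a minimal sketch of the kind of overhead an extra per-chunk copy adds when checksumming data in fixed-size chunks. It is not HDFS code: the class and method names are hypothetical, it assumes 512-byte chunks (the HDFS default bytes-per-checksum), and it uses java.util.zip.CRC32 only as a stand-in checksum, so it says nothing about point 1 (the native CRC path). Both variants produce identical checksums; the second simply avoids allocating and copying a new byte[] for every chunk.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkedChecksums {
    // Assumption: 512-byte chunks, matching HDFS's default bytes-per-checksum.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Copying variant: allocates and fills a new byte[] for every chunk. */
    static long[] checksumWithCopies(byte[] data, int chunkSize) {
        int n = (data.length + chunkSize - 1) / chunkSize;
        long[] sums = new long[n];
        for (int i = 0; i < n; i++) {
            int off = i * chunkSize;
            int len = Math.min(chunkSize, data.length - off);
            byte[] chunk = Arrays.copyOfRange(data, off, off + len); // extra allocation + copy
            CRC32 crc = new CRC32(); // extra object per chunk, too
            crc.update(chunk, 0, chunk.length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** In-place variant: checksums each chunk directly from the original array. */
    static long[] checksumInPlace(byte[] data, int chunkSize) {
        int n = (data.length + chunkSize - 1) / chunkSize;
        long[] sums = new long[n];
        CRC32 crc = new CRC32(); // one instance, reset between chunks
        for (int i = 0; i < n; i++) {
            int off = i * chunkSize;
            int len = Math.min(chunkSize, data.length - off);
            crc.reset();
            crc.update(data, off, len); // reads the slice without copying it
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300]; // 3 chunks: 512 + 512 + 276 bytes
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        long[] a = checksumWithCopies(data, BYTES_PER_CHECKSUM);
        long[] b = checksumInPlace(data, BYTES_PER_CHECKSUM);
        System.out.println(a.length + " chunks, identical checksums: " + Arrays.equals(a, b));
    }
}
```

Running main prints "3 chunks, identical checksums: true". On a real write path the copies also happen at other layers (client packet buffers, DataNode receive buffers), so this only shows the shape of the cost, not its size.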
There are open JIRAs for these, and I have a preliminary patch that helped
a lot, but it hasn't been a high priority: on most clusters, writes become
network-bound before they become CPU-bound. On the other hand, as 10GbE
becomes more common, this will probably matter more soon. I'm hoping to
find time to finish the patches in the next few months.
On Sun, Nov 25, 2012 at 1:41 PM, Radim Kolar <[EMAIL PROTECTED]> wrote:
> anybody tried to profile why HDFS write path is so much CPU intensive?
Software Engineer, Cloudera