If you use the Kudu API and set the session's flush mode to anything other than
AUTO_FLUSH_SYNC, your inserts are accumulated into batches on the
client side and sent to the corresponding tablet servers in chunks.
Consider using the AUTO_FLUSH_BACKGROUND mode when working with the
KuduSession API. MANUAL_FLUSH would require you to flush those
batches yourself before the size of the accumulated data reaches the
maximum allowed buffer size (which is configurable).
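To make the MANUAL_FLUSH bookkeeping concrete, here is a minimal toy model in Python. It is not the Kudu client API: `ToyManualFlushSession`, `apply`, `flush`, `pending` and `max_buffered_ops` are hypothetical names chosen for illustration, and the buffer limit is counted in operations rather than bytes. The point is that with manual flushing the caller must check the buffer and flush before it overflows, which is exactly the work AUTO_FLUSH_BACKGROUND takes off your hands. (In the real Java client the mode is chosen with `KuduSession.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)`.)

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]


class ToyManualFlushSession:
    """Hypothetical stand-in for a client session in MANUAL_FLUSH mode.

    Operations are buffered client-side; applying an operation when the
    buffer is already full is an error, so the caller must flush first.
    """

    def __init__(self, max_buffered_ops: int, send: Callable[[List[Row]], None]):
        self.max_buffered_ops = max_buffered_ops
        self._send = send  # hypothetical "ship one batch to a tablet server"
        self._buffer: List[Row] = []

    def pending(self) -> int:
        return len(self._buffer)

    def apply(self, row: Row) -> None:
        if self.pending() >= self.max_buffered_ops:
            raise RuntimeError("buffer full: flush() must be called first")
        self._buffer.append(row)

    def flush(self) -> None:
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []  # start a fresh batch


def load_manually(rows, session: ToyManualFlushSession) -> None:
    # With manual flushing the caller has to flush before the buffer fills;
    # a background-flushing session would do this bookkeeping itself.
    for row in rows:
        if session.pending() >= session.max_buffered_ops:
            session.flush()
        session.apply(row)
    session.flush()  # send whatever is left over


if __name__ == "__main__":
    batches: List[List[Row]] = []
    s = ToyManualFlushSession(max_buffered_ops=3, send=batches.append)
    load_manually([(i, f"line {i}") for i in range(7)], s)
    print([len(b) for b in batches])  # [3, 3, 1]
```

Forgetting the `pending()` check in `load_manually` makes the fourth `apply` fail, which mirrors the failure mode the reply warns about.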
Also, if the lines in your file(s) contain data for independent rows
(i.e. you are not expecting to perform upserts for some lines), you
could split those lines into ranges (e.g., 0 -- 99999, 100000 --
199999, etc.) and run multiple Kudu sessions in parallel, one per line
range of the file.
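A rough Python sketch of the range-splitting idea follows, assuming the file's lines are already in memory and that the rows really are independent, so the ranges never overlap. `split_ranges`, `load_range` and `parallel_load` are hypothetical helpers. `load_range` only counts lines where a real loader would open its own Kudu session, apply one insert per line and flush.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(total_lines: int, chunk: int) -> List[Tuple[int, int]]:
    """Split line indices [0, total_lines) into non-overlapping inclusive
    ranges of `chunk` lines, e.g. (0, 99999), (100000, 199999), ..."""
    return [(start, min(start + chunk, total_lines) - 1)
            for start in range(0, total_lines, chunk)]


def load_range(lines: List[str], start: int, end: int) -> int:
    # Hypothetical worker: a real loader would open its own session here
    # (e.g. in AUTO_FLUSH_BACKGROUND mode), build one insert per line,
    # then flush and close the session. This placeholder only counts lines.
    applied = 0
    for _line in lines[start:end + 1]:
        applied += 1  # placeholder for "apply an insert built from _line"
    return applied


def parallel_load(lines: List[str], chunk: int, workers: int = 4) -> int:
    ranges = split_ranges(len(lines), chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task (and, in a real loader, one session) per line range.
        return sum(pool.map(lambda r: load_range(lines, *r), ranges))


if __name__ == "__main__":
    print(split_ranges(250000, 100000))
    # [(0, 99999), (100000, 199999), (200000, 249999)]
    print(parallel_load(["row"] * 250, chunk=100))  # 250
```

Threads are used here on the assumption that each worker spends most of its time waiting on network writes; a process pool would also work if per-line parsing turns out to be CPU-bound.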
Hope this helps.
On 7/10/17 7:54 PM, sky wrote: