Pig >> mail # user >> Using own InputSplit


Using own InputSplit
I am new to Hadoop, and from what I understand, Hadoop by default splits
the input into blocks. This might result in a single record line being
split into two pieces that are spread across two maps. For example, the
line "abcd" might get split into "ab" and "cd". How can one prevent this
in Hadoop and Pig? I am looking for examples that show how to specify my
own split so that the input is split logically on the record delimiter
rather than on the block size. For some reason I have not been able to
find the right examples online.
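
To make the concern concrete, here is a minimal Python sketch of the rule that, as far as I understand it, line-oriented readers such as Hadoop's LineRecordReader apply at split boundaries: a reader whose split does not start at byte 0 discards the partial first line, and every reader finishes the line that is still open when its split ends. This is a simulation over an in-memory byte string, not Hadoop code. The names read_split and read_all_splits are made up for illustration, and the sketch assumes a single-byte record delimiter.

```python
from typing import List


def read_split(data: bytes, start: int, length: int,
               delimiter: bytes = b"\n") -> List[bytes]:
    """Return the whole records owned by the byte range [start, start + length).

    Hypothetical helper: assumes a single-byte delimiter.
    """
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: skip ahead to just past the next delimiter.
        # Whatever precedes it is read by the reader of the previous split.
        idx = data.find(delimiter, start)
        if idx == -1:
            return []
        pos = idx + len(delimiter)

    records = []
    # A record that begins at or before `end` belongs to this split,
    # even when its bytes run past the end of the split.
    while pos <= end and pos < len(data):
        idx = data.find(delimiter, pos)
        if idx == -1:
            records.append(data[pos:])  # last record has no trailing delimiter
            break
        records.append(data[pos:idx])
        pos = idx + len(delimiter)
    return records


def read_all_splits(data: bytes, split_size: int) -> List[List[bytes]]:
    """Cut data into fixed-size byte ranges and read each one independently."""
    return [
        read_split(data, start, min(split_size, len(data) - start))
        for start in range(0, len(data), split_size)
    ]


if __name__ == "__main__":
    # The boundaries fall in the middle of "abcd", yet the record stays whole.
    print(read_all_splits(b"abcd\n", 2))
```

Under this rule the byte ranges can still follow the block size, because each record is emitted exactly once by the split in which it begins. A custom split would then only be needed when records cannot be located by scanning for a delimiter.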