Hi,

I am working on graph partitioning algorithms and have chosen Giraph as
the graph processing system to run graph problems. I am very new to both. I
would like to provide external partitioning information (in the form of a txt
file) to Giraph. For this I have created a custom partitioner (something like
HashPartitionFactory), which reads the external file to get the graph partition Id.

While debugging I realized that this partitioning logic is invoked several times
(during the Giraph supersteps), and reading the same external file multiple
times is not time efficient. To handle this I wish to create a
global (across the distributed system) Map variable which holds {vertex Id,
partition Id} as key-value pairs, and I want to populate this variable
from the external file once during a Giraph job run. I have tried several
ways to create and initialize such a global variable, but whether the
variable actually gets populated for a given Giraph job is non-deterministic (i.e.,
sometimes the map is populated with values, sometimes not).
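
To make the goal concrete, here is a minimal sketch in plain Java of the
load-once behaviour I am after. It is only an illustration under assumptions:
PartitionLookup is a hypothetical helper name (not a Giraph class), the file
is assumed to hold one "vertexId partitionId" pair per line, and it is assumed
to be readable from a local path. Note that a static field like this is shared
only within one JVM, so each worker would load its own copy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Giraph): loads the partition file once per JVM. */
final class PartitionLookup {
    // Written once under the lock; volatile so other threads see the finished map.
    private static volatile Map<Long, Integer> cache;

    private PartitionLookup() {}

    /** Returns the cached map, reading the file only on the first call in this JVM. */
    static Map<Long, Integer> get(String path) throws IOException {
        Map<Long, Integer> local = cache;
        if (local == null) {
            synchronized (PartitionLookup.class) {
                local = cache;
                if (local == null) {
                    local = Collections.unmodifiableMap(load(Paths.get(path)));
                    cache = local;
                }
            }
        }
        return local;
    }

    /** Parses whitespace-separated "vertexId partitionId" lines; blank lines are skipped. */
    static Map<Long, Integer> load(Path file) throws IOException {
        Map<Long, Integer> map = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] parts = line.split("\\s+");
                map.put(Long.parseLong(parts[0]), Integer.parseInt(parts[1]));
            }
        }
        return map;
    }
}
```

The partitioning logic would then call PartitionLookup.get(path) each time it
needs a partition Id, and only the first call per JVM would touch the file.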

I think there might be an issue in how I am creating the Map variable and
in ensuring that it is initialized before my custom partitioning logic uses it.
Can somebody please point me to the correct place to plug this piece of
information into a Giraph job, and possibly the correct way of creating a
global variable with respect to Giraph's distributed processing?

Thanks & Regards,
Neha