Pig >> mail # user >> Is it safe to have static methods in Hadoop Framework


RE: Is it safe to have static methods in Hadoop Framework


In Java, the keyword "static" means that a single instance of the member exists
per class loader. Two different class loaders will hold different values for
the same static variable, even within the same JVM running on the same host.
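A minimal sketch of the first point: within one class loader, every instance of a class (and the class itself) sees the one shared copy of a static field. The class and field names here are invented for illustration.

```java
// Illustrative: a static field is shared by all instances of a class
// loaded by the same class loader, no matter how many objects exist.
public class StaticScope {
    static int counter = 0; // one copy per class loader, not per instance

    public static void main(String[] args) {
        new StaticScope();      // creating instances does not copy the field
        new StaticScope();
        counter++;              // a single increment is visible everywhere
        System.out.println(StaticScope.counter); // prints 1
    }
}
```

If a second class loader loaded `StaticScope` again, that copy of the class would have its own independent `counter`.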

Synchronization in Java is based on locks. When the synchronized keyword is
applied to a static method, the lock is the Class object itself. The same
class-loader rules as above apply: each class loader has its own Class object,
and therefore its own lock.
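To make the locking rule concrete, here is a sketch (class and method names invented) showing that a synchronized static method and an explicit `synchronized (ClassName.class)` block acquire the same monitor:

```java
// Sketch: a synchronized static method locks the Class object, so the two
// add methods below are guarded by the exact same lock.
public class ClassLock {
    private static long total = 0;

    public static synchronized void add(long n) { // implicitly locks ClassLock.class
        total += n;
    }

    public static void addExplicit(long n) {
        synchronized (ClassLock.class) {          // same lock as add(...)
            total += n;
        }
    }

    public static long total() {
        synchronized (ClassLock.class) {
            return total;
        }
    }
}
```

Because the lock is tied to the Class object, two class loaders loading this class would have two separate locks, so this guarantees mutual exclusion only within a single class loader.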

The only time you need to synchronize something is when it holds shared state
that must be updated atomically. That isn't going to work across parallel
processes unless you first have a genuinely shared data structure; static only
guarantees sharing within the same class loader (again, see above).

A static method is fine if there is no shared state, i.e. if it is just a
function that takes parameters and returns a value. If you do need to share
state, I would look at writing to HDFS or using an ACID-compliant data store
with transaction semantics (e.g. a relational database).
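A sketch of such a stateless static method, using the `Parser` class from the question below (the `wordCount` method is an invented example): because it touches only its parameters and local variables, any number of tasks or threads can call it concurrently without synchronization.

```java
// Sketch: a stateless static method. It has no static fields and reads
// nothing but its arguments, so concurrent calls cannot interfere.
public class Parser {
    // Pure function: safe to call from any number of parallel tasks
    // without locks, because there is no shared state to corrupt.
    public static int wordCount(String line) {
        if (line == null || line.trim().isEmpty()) {
            return 0;
        }
        return line.trim().split("\\s+").length;
    }
}
```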

You might want to check out this:

https://en.wikipedia.org/wiki/Functional_programming

I would try to avoid shared state unless it's absolutely necessary.
-------- Original Message --------
Subject: Is it safe to have static methods in Hadoop Framework
From: Huy Pham <[EMAIL PROTECTED]>
Date: Thu, July 25, 2013 2:46 pm
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]>

 Hi All,
   I am writing a class (called Parser) with a couple of static
functions, because I don't want millions of instances of this class to be
created during the run.
   However, I realized that Hadoop will eventually spawn parallel
jobs, and if all jobs call the static functions of this Parser class,
would that be safe?
   In other words, will all Hadoop jobs share the same Parser class, or
will each of them have its own? In the former case, if all jobs share the
same class and I make the methods synchronized, the jobs would have to
wait for the locks on those functions to be released, which would hurt
performance. In the latter case, there would be no problem.
Can someone provide some insights?
Thanks
Huy