HDFS >> mail # dev >> A question for txid


Re: A question for txid
Thanks Harsh, Todd.

After 200 million years, spacemen will manage the Earth. They will also know
Hadoop, but they won't be able to restart it: after a hard debug they will
find that the txid overflowed many years earlier.

--Sent from my Sony mobile.
On Jun 25, 2013 10:52 PM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:

> I did some back of the envelope math when implementing txids, and
> determined that overflow is not ever going to happen... A "busy" namenode
> does 1000 write transactions/second (~2^10). MAX_LONG is 2^63, so at that
> rate we can run for 2^(63-10) = 2^53 seconds. A year is about 2^25 seconds.
> So, at 1k tps, you can run your namenode for 2^(63-10-25) = 2^28, about 268
> million years.
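Todd's estimate is easy to sanity-check in code. The sketch below (plain Java; the class and variable names are illustrative, not from HDFS) recomputes the lifetime of the 63-bit txid space at 1000 transactions per second:

```java
public class TxidLifetime {
    public static void main(String[] args) {
        long maxTxid = Long.MAX_VALUE;          // 2^63 - 1 transactions available
        long tps = 1000;                        // "busy" namenode, ~2^10 writes/sec
        long secondsPerYear = 365L * 24 * 3600; // ~3.15e7, roughly 2^25

        long seconds = maxTxid / tps;           // seconds until the txid space is exhausted
        long years = seconds / secondsPerYear;
        // Roughly 292 million years; Todd's 268 million comes from rounding
        // every quantity down to a power of two.
        System.out.println(years + " years");
    }
}
```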
>
> Hadoop is great software and I'm sure it will be around for years to come,
> but if it's still running in 268 million years, that will be a pretty
> depressing rate of technological progress!
>
> -Todd
>
> On Tue, Jun 25, 2013 at 6:14 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
> > Yes, it logically can, if there have been that many transactions (it's
> > a very, very large number to reach, though).
> >
> > Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.
> >
> > I hacked up my local NN's txids manually to go very large (close to
> > max) and tried out whether this causes any harm. I basically bumped up
> > the freshly formatted starting txid to 9223372036854775805 (and ensured
> > the image references the same):
> >
> > ➜  current  ls
> > VERSION
> > fsimage_9223372036854775805.md5
> > fsimage_9223372036854775805
> > seen_txid
> > ➜  current  cat seen_txid
> > 9223372036854775805
> >
> > NameNode started up as expected.
> >
> > 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
> > in 0 seconds.
> > 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
> > 9223372036854775805 from
> > /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> > 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
> > 9223372036854775806
> >
> > I could create a bunch of files and do regular ops. I created over 100
> > files, enough to push the txid counter well past Long.MAX_VALUE.
> >
> > Quitting NameNode and restarting fails though, with the following error:
> >
> > 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> > java.io.IOException: Gap in transactions. Expected to be able to read
> > up until at least txid 9223372036854775806 but unable to find any edit
> > logs containing txid -9223372036854775808
> >
> > So it looks like it cannot currently handle an overflow.
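The negative txid in that error message is the signature of Java's silent two's-complement wrap-around: incrementing a `long` past `Long.MAX_VALUE` lands on `Long.MIN_VALUE` rather than throwing. A minimal demonstration of the primitive behavior the edit log inherits (standalone Java, not HDFS code):

```java
public class LongOverflow {
    public static void main(String[] args) {
        long txid = Long.MAX_VALUE;                  // 9223372036854775807
        long next = txid + 1L;                       // wraps silently, no exception
        System.out.println(next);                    // -9223372036854775808
        System.out.println(next == Long.MIN_VALUE);  // true
    }
}
```

This is exactly why the restart fails: the edit log segment was named with a wrapped, negative txid, so the NameNode sees a "gap" between the expected positive txid and the negative one it finds.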
> >
> > I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
> > this. I don't think this is of immediate concern, though, so we should
> > be able to address it in the future (unless parts of the code already
> > prevent reaching this number in the first place - please do correct me
> > if there is such a part).
> >
> > On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> > > Hi dear All,
> > >
> > > The txid is currently a long,
> > >
> > > FSImage.java:
> > >
> > > boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> > >     throws IOException {
> > >   // ...
> > >   editLog.setNextTxId(lastAppliedTxId + 1L);
> > > }
> > >
> > > Is it possible that (lastAppliedTxId + 1L) exceed Long.MAX_VALUE ?
> >
> >
> >
> > --
> > Harsh J
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
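To answer the bottom-quoted question directly: yes, `lastAppliedTxId + 1L` can exceed `Long.MAX_VALUE`, and Java will wrap it silently rather than raise an error. If the code ever wanted to fail fast instead, `java.lang.Math.addExact` (Java 8+) throws `ArithmeticException` on `long` overflow. A hedged sketch of that defensive pattern (illustrative only; `SafeTxid` and `nextTxId` are invented names, not what FSImage.java does):

```java
public class SafeTxid {
    // Illustrative guard, not HDFS code: fail loudly instead of wrapping.
    static long nextTxId(long lastAppliedTxId) {
        return Math.addExact(lastAppliedTxId, 1L); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        System.out.println(nextTxId(41L)); // prints 42
        try {
            nextTxId(Long.MAX_VALUE);      // would wrap; addExact throws instead
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Failing with an exception at the overflow point would turn the confusing "Gap in transactions" restart error into an immediate, diagnosable failure, at the cost of one extra check per transaction.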