Re: Drill synthetic log generator
Peter,

There is no command line parameter.

In LogGenerator, this line controls how users are invented:

    private LongTail<User> userGenerator = new LongTail<User>(50000, 0) {
        @Override
        protected User createThing() {
            return new User(ipGenerator.sample(), geo, terms);
        }
    };

The two parameters here (50000 and 0) control how the number of users
grows. The first number is called alpha and the second is the discount.
When discount == 0, as in this code, the users are generated using a
Dirichlet process and the number of unique users grows at roughly
alpha * log(n), where n is the number of samples drawn so far. If
discount > 0, then the fraction of users with a single transaction is
asymptotically equal to the discount, and the user population grows
roughly as alpha * n^discount.
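
To get a feel for these growth rates without running the generator, here is a minimal sketch that simulates only the count of unique users. It assumes the standard Pitman-Yor "Chinese restaurant" rule, in which the next sample is a new user with probability (alpha + discount * k) / (alpha + n) when k unique users have been seen in n samples. That rule is an assumption about how LongTail behaves, not something taken from its source, and the class and method names are invented for this example.

```java
import java.util.Random;

// Hypothetical helper for illustration; not part of the log generator.
final class UserGrowthSketch {

    /**
     * Simulates n samples and returns how many distinct users appear.
     * Only the count of unique users is tracked, because under the assumed
     * rule the chance of a new user depends on nothing else.
     */
    static int countUnique(double alpha, double discount, int n, long seed) {
        Random rand = new Random(seed);
        int unique = 0;
        for (int i = 0; i < n; i++) {
            // i samples seen so far, of which `unique` were new users.
            double pNew = (alpha + discount * unique) / (alpha + i);
            if (rand.nextDouble() < pNew) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (double alpha : new double[] {50000, 100000, 200000}) {
            System.out.printf("alpha=%.0f discount=0.0 -> %d unique users%n",
                    alpha, countUnique(alpha, 0.0, n, 42L));
        }
        System.out.printf("alpha=50000 discount=0.5 -> %d unique users%n",
                countUnique(50000, 0.5, n, 42L));
    }
}
```

With discount 0 the expected count under this rule is about alpha * log(1 + n / alpha), so alpha = 50000 and n = 1,000,000 should land near 152,000 unique users.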

There will be a real problem if the number of users increases, however,
because each user requires a lot of memory. This happens because the
language model for each user is cloned from a common base instead of
sharing that base. I have been looking into a better kind of hash table
that allows sharing of mutable tables (an HAMT, or hash array mapped trie,
actually), but this definitely isn't ready. Once (if ever) it is ready, we
should see at least one and possibly three orders of magnitude decrease in
the memory cost of each user after the first few.

This all means that the simplest and safest thing to do is increase the
value of alpha from 50,000 and watch your memory usage.
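
Since there is no command line parameter, one low-effort way to avoid recompiling for every experiment would be to read alpha from a JVM system property. The sketch below only illustrates that idea; the property name loggen.alpha and the class AlphaSetting are invented for this example.

```java
// Hypothetical helper for illustration; not part of LogGenerator.
final class AlphaSetting {
    // Assumed property name; pick whatever fits your setup.
    static final String PROPERTY = "loggen.alpha";
    // Same value as the literal currently hard-coded in LogGenerator.
    static final double DEFAULT_ALPHA = 50000;

    /** Returns alpha from the system property, or the default if it is unset. */
    static double alpha() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_ALPHA;
        }
        double value = Double.parseDouble(raw.trim());
        if (!(value > 0)) {
            throw new IllegalArgumentException(PROPERTY + " must be positive: " + raw);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println("alpha = " + alpha());
    }
}
```

Started with -Dloggen.alpha=200000 on the java command line, AlphaSetting.alpha() could then stand in for the literal 50000 in the LongTail constructor call.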

On Fri, Jul 12, 2013 at 4:42 PM, peter he <[EMAIL PROTECTED]> wrote:

> ...
>
> One quick followup question: is there any way to change the number of users
> generated using a parameter?
>
>