public class EmpiricalDistribution extends Object implements Serializable
An EmpiricalDistribution maintains data structures, called
distribution digests, that describe empirical distributions and
support the operations listed in the method summary: loading data,
reporting bin and sample statistics, and generating random values.
Applications can use EmpiricalDistribution to build grouped
frequency histograms representing the input data or to generate random values
"like" those in the input file, i.e., values that follow the
distribution of the values in the file.
The implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing.
Digesting the input file divides the range of the data into binCount "bins"
and records statistics describing the values in each bin.

USAGE NOTES:
binCount is set by default to 1000. A good rule of thumb
is to set the bin count to approximately the length of the input file divided
by 10.

Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_BIN_COUNT Default bin count |
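The digesting step described in the class comment can be illustrated with a small JDK-only sketch. It is not the Commons Math implementation: the class name `DigestSketch`, the helper `binIndex`, and the choice of equal-width bins between the minimum and the maximum are assumptions made for illustration. It makes one pass to find the range and a second pass to accumulate a count and a mean for each bin.

```java
/** Hypothetical sketch of a two-pass digest; not the library's code. */
final class DigestSketch {
    final double min;
    final double max;
    final long[] counts;   // number of observations that fell in each bin
    final double[] means;  // mean of the observations in each bin (NaN if the bin is empty)

    DigestSketch(double[] data, int binCount) {
        if (data == null || data.length == 0 || binCount < 1) {
            throw new IllegalArgumentException("need data and a positive bin count");
        }
        // Pass 1: find the range of the data.
        double lo = data[0];
        double hi = data[0];
        for (double v : data) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        min = lo;
        max = hi;
        // Assumption: bins have equal width.
        double delta = (max - min) / binCount;

        // Pass 2: accumulate per-bin counts and sums.
        counts = new long[binCount];
        double[] sums = new double[binCount];
        for (double v : data) {
            int bin = binIndex(v, min, delta, binCount);
            counts[bin]++;
            sums[bin] += v;
        }
        means = new double[binCount];
        for (int i = 0; i < binCount; i++) {
            means[i] = counts[i] == 0 ? Double.NaN : sums[i] / counts[i];
        }
    }

    /** Index of the bin holding v; bins are closed on the right, and the first bin includes min. */
    static int binIndex(double v, double min, double delta, int binCount) {
        if (delta == 0) {
            return 0; // all values identical
        }
        int i = (int) Math.ceil((v - min) / delta) - 1;
        return Math.min(Math.max(i, 0), binCount - 1);
    }
}
```

With the rule of thumb above, an input of 10,000 values would use roughly `new DigestSketch(data, 1000)`.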
Constructor and Description |
---|
EmpiricalDistribution() Creates a new EmpiricalDistribution with the default bin count. |
EmpiricalDistribution(int binCount) Creates a new EmpiricalDistribution with the specified bin count. |
EmpiricalDistribution(int binCount, RandomDataImpl randomData) Creates a new EmpiricalDistribution with the specified bin count using the provided RandomDataImpl instance as the source of random data. |
EmpiricalDistribution(int binCount, RandomGenerator generator) Creates a new EmpiricalDistribution with the specified bin count using the provided RandomGenerator as the source of random data. |
EmpiricalDistribution(RandomDataImpl randomData) Creates a new EmpiricalDistribution with default bin count using the provided RandomDataImpl as the source of random data. |
EmpiricalDistribution(RandomGenerator generator) Creates a new EmpiricalDistribution with default bin count using the provided RandomGenerator as the source of random data. |
Modifier and Type | Method and Description |
---|---|
int | getBinCount() Returns the number of bins. |
List<SummaryStatistics> | getBinStats() Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. |
double[] | getGeneratorUpperBounds() Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. |
double | getNextValue() Generates a random value from this distribution. |
StatisticalSummary | getSampleStats() Returns a StatisticalSummary describing this distribution. |
double[] | getUpperBounds() Returns a fresh copy of the array of upper bounds for the bins. |
boolean | isLoaded() Property indicating whether or not the distribution has been loaded. |
void | load(double[] in) Computes the empirical distribution from the provided array of numbers. |
void | load(File file) Computes the empirical distribution from the input file. |
void | load(URL url) Computes the empirical distribution using data read from a URL. |
void | reSeed(long seed) Reseeds the random number generator used by getNextValue(). |

public static final int DEFAULT_BIN_COUNT
Default bin count

public EmpiricalDistribution()
Creates a new EmpiricalDistribution with the default bin count.

public EmpiricalDistribution(int binCount)
Creates a new EmpiricalDistribution with the specified bin count.
binCount - number of bins

public EmpiricalDistribution(int binCount, RandomGenerator generator)
Creates a new EmpiricalDistribution with the specified bin count using the
provided RandomGenerator as the source of random data.
binCount - number of bins
generator - random data generator (may be null, resulting in the default JDK generator)

public EmpiricalDistribution(RandomGenerator generator)
Creates a new EmpiricalDistribution with default bin count using the
provided RandomGenerator as the source of random data.
generator - random data generator (may be null, resulting in the default JDK generator)

public EmpiricalDistribution(int binCount, RandomDataImpl randomData)
Creates a new EmpiricalDistribution with the specified bin count using the
provided RandomDataImpl instance as the source of random data.
binCount - number of bins
randomData - random data generator (may be null, resulting in the default JDK generator)

public EmpiricalDistribution(RandomDataImpl randomData)
Creates a new EmpiricalDistribution with default bin count using the
provided RandomDataImpl as the source of random data.
randomData - random data generator (may be null, resulting in the default JDK generator)

public void load(double[] in) throws NullArgumentException
Computes the empirical distribution from the provided array of numbers.
in - the input data array
NullArgumentException - if in is null

public void load(URL url) throws IOException, NullArgumentException
Computes the empirical distribution using data read from a URL.
url - URL of the input file
IOException - if an IO error occurs
NullArgumentException - if url is null

public void load(File file) throws IOException, NullArgumentException
Computes the empirical distribution from the input file.
file - the input file
IOException - if an IO error occurs
NullArgumentException - if file is null

public double getNextValue() throws MathIllegalStateException
Generates a random value from this distribution.
MathIllegalStateException - if the distribution has not been loaded

public StatisticalSummary getSampleStats()
Returns a StatisticalSummary describing this distribution.
Preconditions: the distribution must be loaded before invoking this method.
IllegalStateException - if the distribution has not been loaded

public int getBinCount()
Returns the number of bins.

public List<SummaryStatistics> getBinStats()
Returns a List of SummaryStatistics instances containing
statistics describing the values in each of the bins. The list is
indexed on the bin number.

public double[] getUpperBounds()
Returns a fresh copy of the array of upper bounds for the bins.
Bins are:
[min,upperBounds[0]],(upperBounds[0],upperBounds[1]],...,
(upperBounds[binCount-2], upperBounds[binCount-1] = max].
Note: In versions 1.0-2.0 of commons-math, this method
incorrectly returned the array of probability generator upper
bounds now returned by getGeneratorUpperBounds().
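
The bin layout listed for getUpperBounds() can be read as a lookup rule: a value belongs to the first bin whose upper bound is greater than or equal to it, so every bin is closed on the right and the first bin also contains min. The sketch below illustrates that reading using only the JDK. The class `BinBoundsSketch` and both method names are hypothetical, and equal-width bins are an assumption; this page does not state how the library spaces its bounds.

```java
/** Hypothetical helper illustrating right-closed bins; not part of Commons Math. */
final class BinBoundsSketch {

    /** Builds upper bounds for binCount equal-width bins (an assumption) covering [min, max]. */
    static double[] upperBounds(double min, double max, int binCount) {
        double delta = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + delta * (i + 1);
        }
        bounds[binCount - 1] = max; // the last upper bound is exactly max
        return bounds;
    }

    /** Returns the index of the first bin whose upper bound is >= value. */
    static int binOf(double value, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (value <= bounds[i]) {
                return i;
            }
        }
        throw new IllegalArgumentException("value is greater than the largest upper bound");
    }
}
```

For example, with bounds built for the range 0 to 10 and five bins, the value 2.0 falls in bin 0 while 2.5 falls in bin 1, because each boundary belongs to the bin on its left.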

public double[] getGeneratorUpperBounds()
Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.
In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned
by getUpperBounds().
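
To make "lengths proportional to bin counts" concrete, the following JDK-only sketch derives such upper bounds as cumulative relative frequencies, then uses them to pick a bin from a uniform draw and return a Gaussian value centered on that bin. The class `GeneratorBoundsSketch`, its method names, and the per-bin `means` and `stdDevs` arrays are hypothetical. The Gaussian draw is one plausible reading of the "Gaussian smoothing" mentioned in the class description, not a transcription of the library's code.

```java
import java.util.Random;

/** Hypothetical sketch of sampling from binned data; not the Commons Math implementation. */
final class GeneratorBoundsSketch {

    /** Cumulative relative frequencies: entry i is the share of observations in bins 0..i. */
    static double[] generatorUpperBounds(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalStateException("no observations loaded");
        }
        double[] bounds = new double[counts.length];
        long cumulative = 0;
        for (int i = 0; i < counts.length; i++) {
            cumulative += counts[i];
            bounds[i] = (double) cumulative / total;
        }
        return bounds;
    }

    /** Picks the bin whose subinterval contains u, for u in [0, 1). Empty bins are never chosen. */
    static int selectBin(double u, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (u < bounds[i]) {
                return i;
            }
        }
        return bounds.length - 1;
    }

    /** Draws a uniform value to choose a bin, then a Gaussian value around that bin's mean. */
    static double nextValue(Random rng, double[] bounds, double[] means, double[] stdDevs) {
        int bin = selectBin(rng.nextDouble(), bounds);
        return means[bin] + stdDevs[bin] * rng.nextGaussian();
    }
}
```

Bin counts of 4, 3 and 3 give subintervals of length 0.4, 0.3 and 0.3, so the first bin is selected about 40% of the time.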
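
public boolean isLoaded()
Property indicating whether or not the distribution has been loaded.

public void reSeed(long seed)
Reseeds the random number generator used by getNextValue().
seed - random generator seed

Copyright © 2003-2012 Apache Software Foundation. All Rights Reserved.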