Analysis
We are usually not interested in the value of observables on each configuration, but wish to compute expectation values.
Thermalization
To get an unbiased estimate from a GrandCanonical ensemble that was generated as part of a MarkovChain, we have to guarantee that the configurations that enter our expectation values do not remember the initial state the chain was started from. That state is generated by some process that is not representative of the actual distribution of interest (in a 'hot' start, for example, it is drawn from the quenched distribution).
Knowing how many configurations to cut is a judgement call, and you may be misled if the observables you consider thermalize very quickly; not every observable need thermalize at once. Moreover, you can never be completely confident that your samples are drawn from the basin of lowest action (in a global sense); perhaps the Markov chain simply has not yet reached a preferable basin of configurations.
To facilitate measuring only on configurations after a certain step in the Markov chain, the GrandCanonical ensemble provides the cut method, which returns another GrandCanonical ensemble without the leading configurations.
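As a rough illustration of why cutting matters, the sketch below drops the leading entries of a toy time series with plain NumPy slicing. It does not use the library: the helper name cut, its start argument, and the synthetic relaxing series are assumptions made for this example only.

```python
import numpy as np

def cut(series, start):
    """Drop the first `start` entries of a Markov-chain time series.

    Hypothetical stand-in for cutting an ensemble; plain slicing only.
    """
    return np.asarray(series)[start:]

# Toy chain: starts far from equilibrium (value 5) and relaxes toward
# fluctuations around 0 with a relaxation time of about 50 steps.
rng = np.random.default_rng(0)
steps = np.arange(1000)
series = 5.0 * np.exp(-steps / 50.0) + rng.normal(scale=0.1, size=steps.size)

# Keep only configurations after step 300, well past the relaxation.
thermalized = cut(series, 300)

print("mean with the initial transient:", series.mean())
print("mean after cutting:             ", thermalized.mean())
```

The mean over the full series is biased toward the starting value, while the mean over the cut series is not. In practice the cut should be chosen by inspecting several observables, since they need not thermalize at the same rate.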
Handling Autocorrelation
For a GrandCanonical ensemble that was generated as part of a MarkovChain there are correlations from one configuration to the next.
This introduces autocorrelation into each observable's time series; a naive estimate that does not account for the autocorrelation will produce underestimated uncertainties.
Good algorithms for estimating the autocorrelation time are known [16].
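To make the idea concrete, here is a minimal sketch of one simple estimator of the integrated autocorrelation time: sum the normalized autocorrelation function until it first becomes non-positive. This is not necessarily the algorithm of the cited reference, and it is not part of the library; the function name, the truncation rule, and the toy AR(1) chain are all assumptions of this example.

```python
import numpy as np

def integrated_autocorrelation_time(series, max_lag=None):
    """Estimate tau_int = 1/2 + sum_{t>=1} rho(t) for a 1-D time series.

    The sum is truncated at the first lag where the estimated
    autocorrelation rho(t) is non-positive (a simple heuristic window).
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = x.size
    if max_lag is None:
        max_lag = n // 10
    variance = np.dot(x, x) / n
    tau = 0.5
    for lag in range(1, max_lag + 1):
        rho = np.dot(x[:-lag], x[lag:]) / ((n - lag) * variance)
        if rho <= 0:
            break
        tau += rho
    return tau

rng = np.random.default_rng(1)

# Uncorrelated samples: tau_int should be close to 1/2.
white = rng.normal(size=20000)
tau_white = integrated_autocorrelation_time(white)

# AR(1) chain x_t = 0.9 x_{t-1} + noise, a toy model of a correlated
# Markov chain; its exact tau_int is 1/2 + 0.9/(1 - 0.9) = 9.5.
noise = rng.normal(size=50000)
ar = np.empty_like(noise)
ar[0] = noise[0]
for t in range(1, ar.size):
    ar[t] = 0.9 * ar[t - 1] + noise[t]
tau_ar = integrated_autocorrelation_time(ar)

print("tau_int (white noise):", tau_white)
print("tau_int (AR(1) chain):", tau_ar)
```

Roughly speaking, a chain of N configurations carries about N / (2 tau_int) independent samples, which is why ignoring autocorrelation underestimates uncertainties.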
One simple way of decreasing autocorrelation is to decimate your Markov chain, keeping only every nᵗʰ configuration. The GrandCanonical ensemble provides the every method, which returns another GrandCanonical ensemble keeping configurations evenly spaced by n.
Another strategy is to bin: average observable values over consecutive configurations. Because bin is a Python built-in, we avoid methods with that name.
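The effect of decimation can be sketched on a toy chain with NumPy slicing. The helper below is a hypothetical stand-in, not the library's method; in particular, starting the decimation at the first configuration is an assumption of this example.

```python
import numpy as np

def lag1_correlation(series):
    """Normalized correlation between neighboring entries of a series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

def every(series, n):
    """Keep entries evenly spaced by n (hypothetical helper; starts at 0)."""
    return np.asarray(series)[::n]

# Toy correlated chain: AR(1) with coefficient 0.9.
rng = np.random.default_rng(2)
noise = rng.normal(size=20000)
chain = np.empty_like(noise)
chain[0] = noise[0]
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + noise[t]

decimated = every(chain, 10)
full_corr = lag1_correlation(chain)
decimated_corr = lag1_correlation(decimated)

print("neighbor correlation, full chain:", full_corr)
print("neighbor correlation, every 10th:", decimated_corr)
```

Decimation trades statistics for independence: the retained samples are less correlated, but there are n times fewer of them.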
- class tdg.analysis.Binning(ensemble, width)[source]
Bases: object
A binning is built from an ensemble, on which observables are computed, and a width, over which observables are averaged.
If the width does not evenly divide the length of the ensemble, some configurations are dropped from the front of the ensemble.
For samples with weights \(w\), the new weights are given by the mean weight of the bin, while the new observable value is given by the weighted mean.
- Parameters
ensemble (GrandCanonical) – The ensemble from which sample observables are drawn.
width (int) – The strides over which to average observables.
Any observable that ensemble supports can be called from the Binning.
Binning uses __getattr__ trickery under the hood to intercept calls and perform the average transparently.
- Ensemble
The ensemble underlying the binning.
- Action
The action underlying the binning.
- width
The width over which to average.
- drop
How many configurations are dropped from the start of the ensemble.
- bins
How many bins are in the binning.
- weights
The weight of each bin.
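The weighted binning rule described above (drop from the front, mean weight per bin, weighted mean per bin) can be sketched in NumPy for a one-dimensional observable. The function name and return values are assumptions for illustration and are not the class's actual implementation.

```python
import numpy as np

def bin_weighted(values, weights, width):
    """Bin a 1-D observable with per-configuration weights.

    Returns (bin_values, bin_weights, drop), where leading entries are
    dropped so that the remaining length is a multiple of `width`.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    drop = values.shape[0] % width      # configurations dropped from the front
    bins = values.shape[0] // width     # number of complete bins
    v = values[drop:].reshape(bins, width)
    w = weights[drop:].reshape(bins, width)
    bin_weights = w.mean(axis=1)                      # mean weight of each bin
    bin_values = (v * w).sum(axis=1) / w.sum(axis=1)  # weighted mean per bin
    return bin_values, bin_weights, drop

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
weights = [1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
bin_values, bin_weights, drop = bin_weighted(values, weights, width=3)

# The weighted mean over bins equals the weighted mean over the
# configurations that were kept.
binned_mean = (bin_values * bin_weights).sum() / bin_weights.sum()
kept_v = np.array(values[drop:])
kept_w = np.array(weights[drop:])
direct_mean = (kept_v * kept_w).sum() / kept_w.sum()
print(bin_values, bin_weights, drop, binned_mean, direct_mean)
```

Choosing the mean weight as the new weight is what makes the two weighted means agree, so binning changes the uncertainty estimate but not the central value on the kept configurations.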
The Bootstrap
Bootstrap resampling, bootstrapping, or “the bootstrap” is a resampling method used for uncertainty estimation. One draws, with replacement, from a sample.
- class tdg.analysis.Bootstrap(ensemble, draws=100)[source]
The bootstrap is a resampling technique for estimating uncertainties.
For samples with weights \(w\) the expectation value of an observable is
\[\left\langle O \right\rangle = \frac{\left\langle O w \right\rangle}{\left\langle w \right\rangle}\]
and an accurate bootstrap estimate of the left-hand side requires tracking the correlations between the numerator and denominator. Moreover, quoting correlated uncertainties requires resampling different observables in the same way.
- Parameters
ensemble (GrandCanonical or Binned) – The ensemble to resample.
draws (int) – The number of times to resample.
Any observable that ensemble supports can be called from the Bootstrap.
Bootstrap uses __getattr__ trickery under the hood to intercept calls and perform the weighted average transparently.
Each observable returns a torch.tensor of the same dimension as the ensemble’s observable. However, rather than configurations first, draws are first.
Each draw is a weighted average over the resampled configurations, as shown above, and is therefore an estimator for the expectation value. For a sufficiently large ensemble these estimators are approximately normally distributed (by the central limit theorem). To get an uncertainty estimate one need only take the .mean() for a central value and .std() for the uncertainty on the mean.
- measure(*observables)[source]
Compute each @observable and @derived quantity in observables; log an error for any unregistered observable or derived quantity.
- Parameters
observables (strings) – Names of observables or derived quantities.
- Return type
Bootstrap; itself, now with some observables and derived quantities bootstrapped.
Note
If no observables are passed, bootstraps every registered @observable and @derived quantity.
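The weighted bootstrap described in this section can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the library's implementation (which works with torch tensors): the function name, its arguments, and the synthetic data are hypothetical. The key point is that each draw uses the same resampled indices for the numerator and the denominator.

```python
import numpy as np

def bootstrap(values, weights, draws=100, rng=None):
    """Weighted bootstrap of <O> = <O w> / <w> for a 1-D observable.

    Returns one estimate per draw, so draws are the leading dimension.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = values.shape[0]
    # Each row is one resampling, with replacement, of the configurations.
    indices = rng.integers(0, n, size=(draws, n))
    # Use the same indices for numerator and denominator so that their
    # correlation is carried through to the ratio.
    numerator = (values[indices] * weights[indices]).mean(axis=1)
    denominator = weights[indices].mean(axis=1)
    return numerator / denominator

# Synthetic data: observable centered on 2 with weights near 1.
rng = np.random.default_rng(3)
values = rng.normal(loc=2.0, scale=1.0, size=400)
weights = rng.uniform(0.5, 1.5, size=400)

estimates = bootstrap(values, weights, draws=200, rng=rng)
central = estimates.mean()      # central value
uncertainty = estimates.std()   # uncertainty on the mean
print(central, "+/-", uncertainty)
```

To quote correlated uncertainties between several observables, the same indices array would be reused for each of them rather than drawn afresh.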