Analysis

We are usually not interested in the value of an observable on each configuration, but wish to compute expectation values.

Thermalization

To get an unbiased estimate from a GrandCanonical ensemble that was generated as part of a MarkovChain, we have to guarantee that the configurations entering our expectation values do not remember the initial state the chain was started from. That state is generated by some process that is not representative of the actual distribution of interest (in a 'hot' start, for example, it is drawn from the quenched distribution).

Knowing how many configurations to cut is a judgement call. You may be misled if the observables you consider thermalize very quickly, because not every observable need thermalize at the same rate. Moreover, you can never be completely confident that your samples are drawn from the basin of lowest action (in a global sense); perhaps the Markov chain simply has not yet reached a preferable basin of configurations.

To facilitate measuring only on configurations after a certain step in the Markov chain, the GrandCanonical ensemble provides the cut method, which returns another GrandCanonical ensemble without the leading configurations.
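As a rough illustration of why the cut matters, the sketch below uses a synthetic time series that relaxes from a biased starting value toward its equilibrium mean. Plain Python lists stand in for the ensemble, and the function cut here is a hypothetical helper that only drops leading samples; the signature of the GrandCanonical cut method is not shown in this section and is not assumed.

```python
import math
import random
import statistics

def cut(series, start):
    """Hypothetical stand-in: drop the first `start` samples of a series."""
    return series[start:]

random.seed(0)

# Synthetic observable: starts far from equilibrium and relaxes
# exponentially toward a true mean of 1, with Gaussian noise on top.
true_mean = 1.0
series = [true_mean + 4.0 * math.exp(-step / 50.0) + random.gauss(0.0, 0.1)
          for step in range(2000)]

biased = statistics.fmean(series)                 # includes the transient
thermalized = statistics.fmean(cut(series, 500))  # transient removed
```

The mean over the full series is pulled toward the starting value, while the mean over the cut series is not. In practice the cut has to be chosen by inspecting the time series of several observables, as discussed above.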

Handling Autocorrelation

For a GrandCanonical ensemble that was generated as part of a MarkovChain there are correlations from one configuration to the next. This introduces autocorrelation into each observable's time series; a naive estimate that ignores the autocorrelation will underestimate the uncertainties. Good algorithms for estimating the autocorrelation time are known [16].
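To make the effect concrete, here is a minimal sketch of one common estimator: the integrated autocorrelation time with a self-consistent summation window. It is not necessarily the algorithm of [16], and it operates on a plain list of numbers rather than on a tdg ensemble. The function name, the window constant c, and the synthetic AR(1) test series are all assumptions made for illustration.

```python
import math
import random
import statistics

def integrated_autocorrelation_time(series, c=5.0):
    """Estimate tau_int = 1/2 + sum_{t>=1} rho(t).

    The sum is truncated at the first lag W with W >= c * tau_int(W),
    which keeps noise from large lags out of the estimate.
    """
    n = len(series)
    mean = statistics.fmean(series)
    centered = [x - mean for x in series]
    variance = sum(d * d for d in centered) / n
    tau = 0.5
    for lag in range(1, n // 2):
        # Autocovariance at this lag: pairs (x_i, x_{i+lag}).
        covariance = sum(a * b for a, b in zip(centered, centered[lag:])) / (n - lag)
        tau += covariance / variance
        if lag >= c * tau:
            break
    return tau

# Synthetic correlated series: an AR(1) process, whose exact integrated
# autocorrelation time is 1/2 + rho / (1 - rho) = 9.5 for rho = 0.9.
random.seed(1)
rho = 0.9
x, correlated = 0.0, []
for _ in range(20000):
    x = rho * x + random.gauss(0.0, 1.0)
    correlated.append(x)

tau = integrated_autocorrelation_time(correlated)
naive_error = statistics.stdev(correlated) / math.sqrt(len(correlated))
corrected_error = naive_error * math.sqrt(2.0 * tau)  # accounts for autocorrelation
```

The corrected error exceeds the naive one by a factor of sqrt(2 * tau), which is the sense in which ignoring autocorrelation underestimates uncertainties.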

One simple way of decreasing autocorrelation is to decimate your Markov chain, keeping only every nᵗʰ configuration. The GrandCanonical ensemble provides the every method, which returns another GrandCanonical ensemble keeping configurations evenly spaced by n.
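Decimation amounts to strided selection. The sketch below shows the idea on a plain list of Markov chain step indices; the helper every is a hypothetical stand-in, and which configuration the GrandCanonical every method keeps first is an assumption here.

```python
def every(series, n):
    """Hypothetical stand-in: keep every n-th sample, starting from the first."""
    return series[::n]

steps = list(range(20))
decimated = every(steps, 5)  # keeps steps 0, 5, 10, 15
```

Decimation discards information, so it reduces autocorrelation at the cost of fewer samples.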

Another strategy is to bin: average observable values over consecutive configurations. Because bin is a Python built-in we avoid methods with that name.

class tdg.analysis.Binning(ensemble, width)[source]

Bases: object

A binning is built from an ensemble, on which observables are computed, and a width, over which observables are averaged.

If the width does not evenly divide the length of the ensemble, some configurations are dropped from the front of the ensemble.

For samples with weights \(w\), the new weights are given by the mean weight of the bin, while the new observable value is given by the weighted mean.
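The rules above (drop from the front, mean weight per bin, weighted mean per bin) can be sketched on plain lists as follows. This is only an illustration of the arithmetic, not the Binning implementation; the function name bin_weighted and its return convention are assumptions.

```python
def bin_weighted(values, weights, width):
    """Average consecutive samples in bins of `width`.

    Leftover samples are dropped from the front.
    Returns (binned_values, binned_weights, drop).
    """
    drop = len(values) % width
    values, weights = values[drop:], weights[drop:]
    binned_values, binned_weights = [], []
    for start in range(0, len(values), width):
        v = values[start:start + width]
        w = weights[start:start + width]
        total = sum(w)
        binned_weights.append(total / width)  # mean weight of the bin
        binned_values.append(sum(wi * vi for wi, vi in zip(w, v)) / total)  # weighted mean
    return binned_values, binned_weights, drop

values = [10.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights = [1.0, 1.0, 3.0, 1.0, 1.0, 2.0, 2.0]
binned_values, binned_weights, drop = bin_weighted(values, weights, width=3)

# The weighted mean over bins reproduces the weighted mean over kept samples.
binned_mean = (sum(w * v for w, v in zip(binned_weights, binned_values))
               / sum(binned_weights))
direct_mean = (sum(w * v for w, v in zip(weights[drop:], values[drop:]))
               / sum(weights[drop:]))
```

With these definitions the reweighted expectation value is unchanged by binning, which is why the bin weight is the mean weight rather than, say, the weight of the first sample in the bin.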

Parameters

ensemble (GrandCanonical) – The ensemble from which sample observables are drawn.
width (int) – The stride over which to average observables.

Any observable that ensemble supports can be called from the Binning. Binning uses __getattr__ trickery under the hood to intercept calls and perform the average transparently.

Ensemble: The ensemble underlying the binning.

Action: The action underlying the binning.

width: The width over which to average

drop: How many configurations are dropped from the start of the ensemble.

bins: How many bins are in the binning.

weights: The weight of each bin.

bootstrapped(draws=100)[source]

Parameters: draws (int) – Resamples for uncertainty estimation; see Bootstrap for details.
Return type: Bootstrap built from the ensemble, with the draws specified.

The Bootstrap

Bootstrap resampling, bootstrapping, or “the bootstrap” is a resampling method used for uncertainty estimation. One repeatedly draws, with replacement, a new sample of the same size from the original sample and recomputes the estimator on each draw.

class tdg.analysis.Bootstrap(ensemble, draws=100)[source]

The bootstrap is a resampling technique for estimating uncertainties.

For samples with weights \(w\) the expectation value of an observable is

\[\left\langle O \right\rangle = \frac{\left\langle O w \right\rangle}{\left\langle w \right\rangle}\]

and an accurate bootstrap estimate of the left-hand side requires tracking the correlations between the numerator and denominator. Moreover, quoting correlated uncertainties requires resampling different observables in the same way.

Parameters

ensemble (GrandCanonical or Binning) – The ensemble to resample.
draws (int) – The number of times to resample.

Any observable that ensemble supports can be called from the Bootstrap. Bootstrap uses __getattr__ trickery under the hood to intercept calls and perform the weighted average transparently.

Each observable returns a torch.Tensor with the same shape as the ensemble’s observable, except that the leading dimension indexes draws rather than configurations.

Each draw is a weighted average over the resampled configurations, as shown above, and is therefore an estimator for the expectation value. For a sufficiently large ensemble these estimators are approximately normally distributed (by the central limit theorem). To get an uncertainty estimate one need only take the .mean() over draws for a central value and the .std() over draws for the uncertainty on the mean.
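The resampling described above can be sketched on plain lists as follows. This is an illustration under stated assumptions rather than the Bootstrap implementation: the function name, the seed argument, and the synthetic data are hypothetical, and the standard library stands in for torch. The key point is that each draw uses the same resampled indices in the numerator and the denominator, so their correlation is tracked.

```python
import random
import statistics

def bootstrap_weighted_mean(values, weights, draws=100, seed=0):
    """Return one weighted-mean estimate <O w> / <w> per bootstrap draw."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(draws):
        # Resample configurations with replacement; reuse the same indices
        # for numerator and denominator to keep their correlation.
        indices = [rng.randrange(n) for _ in range(n)]
        numerator = sum(weights[i] * values[i] for i in indices)
        denominator = sum(weights[i] for i in indices)
        estimates.append(numerator / denominator)
    return estimates

# Synthetic, uncorrelated samples with a true mean of 2 and positive weights.
data_rng = random.Random(42)
values = [data_rng.gauss(2.0, 1.0) for _ in range(400)]
weights = [data_rng.uniform(0.5, 1.5) for _ in range(400)]

estimates = bootstrap_weighted_mean(values, weights, draws=200)
central = statistics.fmean(estimates)      # central value
uncertainty = statistics.stdev(estimates)  # uncertainty on the mean
```

To quote correlated uncertainties between several observables, the same indices would have to be reused across observables within each draw, as noted above.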

measure(*observables)[source]

Compute each @observable and @derived quantity in observables; log an error for any unregistered observable or derived quantity.

Parameters: observables (strings) – Names of observables or derived quantities.
Return type: Bootstrap; itself, now with some observables and derived quantities bootstrapped.

Note

If no observables are passed, bootstraps every registered @observable and @derived quantity.