Helpful Code Snippets

Unlike notebooks, these are not self-contained examples, but are meant to be a quick reference for specific tasks. These are not necessarily exhaustive examples; see full class documentation for all possible arguments and options. For a more full description of the SEPIA work flow, see General workflow guide.

SepiaData inputs

SepiaData operations

Model setup

Customize and run MCMC

Extract MCMC samples

Make predictions

Hierarchical or shared theta models

SepiaData inputs

SepiaData objects are used to hold various types of data inputs to a model. The types and sizes of each input to SepiaData helps determine which kind of model is to be set up.

A SEPIA model may contain only simulation data (an emulator-only model) or both simulation and observed data.

It is always good to call print(data) on your SepiaData object to verify your setup is as intended.

Examples of data setup for different kinds of SEPIA models (see SepiaData for fuller explanation of inputs):

Univariate-output emulator-only data

data = SepiaData(t_sim=t, y_sim=y)           # No controllable input
data = SepiaData(x_sim=x, y_sim=y)           # Only controllable input
data = SepiaData(x_sim=x, t_sim=t, y_sim=y)  # Controllable and other inputs
# Indicate that third column of x is categorical with 5 categories (takes values in [1, 2, 3, 4, 5])
data = SepiaData(x_sim=x, y_sim=y, x_cat_ind=[0, 0, 5])

Multivariate-output emulator-only data

data = SepiaData(t_sim=t, y_sim=y, y_ind_sim=y_ind)           # No controllable input
data = SepiaData(x_sim=x, y_sim=y, y_ind_sim=y_ind)           # Only controllable input
data = SepiaData(x_sim=x, t_sim=t, y_sim=y, y_ind_sim=y_ind)  # Controllable and other inputs

Univariate-output simulation and observed data

data = SepiaData(t_sim=t, y_sim=y, y_obs=y_obs)                        # No controllable input
data = SepiaData(x_sim=x, t_sim=t, y_sim=y, x_obs=x_obs, y_obs=y_obs)  # Controllable and other inputs

Multivariate-output simulation and observed data

data = SepiaData(t_sim=t, y_sim=y, y_ind_sim=y_ind,
                 y_obs=y_obs, y_ind_obs=y_ind_obs)               # No controllable input
data = SepiaData(x_sim=x, t_sim=t, y_sim=y, y_ind_sim=y_ind,
                 x_obs=x_obs, y_obs=y_obs, y_ind_obs=y_ind_obs)  # Controllable and other inputs

Multivariate-output simulation and observed data with ragged observations

Ragged observations means the observed data indices vary across observation data instances. In this case, y_obs and y_ind_obs are now lists instead of numpy arrays:

y_obs = [np.array([[0.3, 0.5, 0.7]]), np.array([[0.1, 0.4, 0.6, 0.9]])
y_ind_obs = [np.array([1, 2, 3]), np.array([0.5, 2.5, 4, 6])]
data = SepiaData(t_sim=t, y_sim=y, y_ind_sim=y_ind, y_obs=y_obs, y_ind_obs=y_ind_obs) # No controllable input

SepiaData operations

Regardless of the inputs given to SepiaData, there are a few key methods which generally should be called before setting up the model.

Transformations

First, we want to transform x and t inputs to the unit hypercube:

data.transform_xt()

Next, we want to standardize the y outputs:

data.standardize_y()

Basis functions

If the outputs are multivariate, we want to set up a principal component (PC) basis and optionally, a discrepancy basis:

# PC basis
data.create_K_basis(n_pc=5)     # With 5 PCs
data.create_K_basis(n_pc=0.99)  # Enough PCs for at least 99 pct variance explained

# Discrepancy basis -- optional
data.create_D_basis(D_type='linear')  # Default linear discrepancy
data.create_D_basis(D=D)              # Pass in custom D basis

Set up model

Once the data structure is set up correctly, the inputs are in the unit hypercube, the outputs are standardized, and basis vectors are created (for multivariate output), we are ready to set up the Sepia model:

model = SepiaModel(data)

The model parses the SepiaData structure to understand what kind of model is being set up and does a lot of precomputation of various quantities to prepare for likelihood evaluations. It also sets up default priors, MCMC step types and step sizes, and default starting values for MCMC.

To see information about the default setup, you can use:

model.print_prior_info()
model.print_value_info()
model.print_mcmc_info()

Customize and run MCMC

After instantiating the SepiaModel object, various aspects of the MCMC can be customized prior to doing MCMC.

Custom start values

Each parameter in the model has attribute val which holds the start values (or, during MCMC, the current values). Prior to running MCMC, these can be directly modified using the set_val method:

# Single scalar applies to all thetas
model.params.theta.set_val(0.7)
# Or pass an array of shape model.params.theta.val_shape
model.params.theta.set_val(np.array([[0.7, 0.5, 0.1]]))

Fixing subsets of parameters

It may sometimes be desirable to fix the values of certain parameters. The fixed attribute of SepiaParam is a boolean array of size val_shape (all False by default):

model.params.lamWOs.fixed = np.array([[True, False]])

Change prior distribution and prior parameters

Currently, there are only four distributions supported for priors: Normal, Gamma, Beta, and Uniform. After instantiating the SepiaModel, we can modify priors as follows:

prior_dist_name = 'Normal'
prior_mu = 0.1
prior_sd = 2.0
prior_bounds = [0, 1]
model.params.theta.prior = SepiaPrior(model.params.theta, dist=prior_dist_name, params=[prior_mu, prior_sd],
                                      bounds=prior_bounds)

Change MCMC step sizes or step types

You can manually change MCMC step types or step sizes:

model.params.theta.mcmc.stepType = 'Uniform'
model.params.theta.mcmc.stepParam = np.array([[0.5, 0.1, 0.3]])

Automatic MCMC step size tuning

Automatic step size tuning based on YADAS:

model.tune_step_sizes(n_burn, n_levels)

Run MCMC or add more samples

To do MCMC sampling, call do_mcmc:

model.do_mcmc(500)

To append more samples to the current samples, you can call it again:

model.do_mcmc(500) # Now has 1000 total samples

Saving MCMC chains or model info

We will build some functions to handle this more smoothly, but for now you could do something like:

import pickle
for chunk in range(10):
    model.do_mcmc(500)
    with open('samples%d.pkl' % chunk, 'wb') as f:
        pickle.dump(model.get_samples(numsamples=500), f)

See also SepiaModel.save_model_info() and SepiaModel.restore_model_info().

Extract MCMC samples

To extract MCMC samples to a dictionary format:

# Select a fixed set of samples
model.get_samples(nburn=0, sampleset=np.arange(100), flat=True, includelogpost=True)

# Select a fixed number of samples
model.get_samples(nburn=0, numsamples=200, flat=True, includelogpost=True)

# Discarding nburn samples
model.get_samples(nburn=50, numsamples=200, flat=True, includelogpost=True)

# Keep samples in array format rather than flattening along parameter dimensions
model.get_samples(nburn=50, numsamples=200, flat=False)

# Returns only a set of "effective samples" determined by effective sample size
samples = model.get_samples(effectivesamples=True)

MCMC diagnostics

Several graphical diagnostics are available:

plot_acf(model, nlags=30) # Autocorrelation function and effective sample size calculation
mcmc_trace(samples)       # Trace plots
theta_pairs(samples)      # Pairs plots of theta variables
rho_box_plots(model)      # Box plots of GP lengthscale parameters
param_stats(samples)      # Summary statistics of parameters

Each returns a matplotlib figure object that you can save using plt.savefig() or show using plt.show().

Make predictions

To make predictions, use the SepiaPredict class. There are different types of predictions, and predictions can be made in the model space (w, u, v) or output space (y), with or without standardization.

Emulator predictions

To predict from the emulator (eta portion of model), first set up the SepiaEmulatorPrediction object:

# Provide input settings to predict at
x_pred = np.linspace(0,1,9).reshape((9,1))
t_pred = np.tile(np.array([1,0,1]).reshape(1,3),(9,1))
pred_samples = model.get_samples(numsamples=10)
pred = SepiaEmulatorPrediction(x_pred=x_pred, samples=pred_samples, model=model, t_pred=t_pred)

To get w:

predw = pred.get_w()

To get y on the standardized scale:

predystd = pred.get_y(std=True)

To get y on the native (original) scale:

predystd = pred.get_y()

Full model predictions

To predict from full model (including observation noise, and discrepancy, if applicable):

x_pred = np.linspace(0,1,9).reshape(9,1)
pred_samples = model.get_samples(numsamples=7)
pred = SepiaFullPrediction(x_pred, pred_samples, model)

To get u, v:

u_pred, v_pred = pred.get_u_v()

To get discrepancy:

preddstd = pred.get_discrepancy(std=True) # Standardized scale
predd = pred.get_discrepancy()            # Native/original scale

To get simulated y:

predysimstd = pred.get_ysim(std=True) # Standardized scale
predysim = pred.get_ysim()            # Native/original scale

To get y (simulator+discrepancy):

predy = pred.get_yobs()                            # Native/original scale
predystd = pred.get_yobs(std=True)                 # Standardized scale
predystdobs = pred.get_yobs(std=True, as_obs=True) # Standardized scale at only observed data locations x_obs

Cross-validation

By default, leave-one-out cross validation is done on the emulator model:

pred_samples = model.get_samples(numsamples=10)
CV_pred = SepiaXvalEmulatorPrediction(samples=pred_samples, model=model)

CV_pred_y = CV_pred.get_y()

You can also provide custom sets of indices to leave out in turn, such as leaving out chunks of 10 examples at a time, and you can add residual variance to the predictions:

leave_out_inds = np.array_split(np.arange(m), 5)
pred_samples = model.get_samples(numsamples=7)
CV_pred = SepiaXvalEmulatorPrediction(samples=pred_samples, model=model, leave_out_inds=leave_out_inds, addResidVar=True)

Sensitivity analysis

Run sensitivity analysis:

model.do_mcmc(1000)
samples = model.get_samples(20)
sens = sensitivity(model, samples)

Hierarchical or shared theta models

The syntax for both cases is similar. First, set up each model, then put them in a list:

m1 = SepiaModel(d1)
m2 = SepiaModel(d2)
m3 = SepiaModel(d3)
model_list = [m1, m2, m3]

Then we need to specify which thetas are shared or modeled hierarchically. The way to do this is with a numpy array of size (j, n_models) where each row represents one of the shared/hierarchical theta variables, and each column gives the index of the shared/hierarchical theta in the respective model. For instance:

theta_inds = np.array([[0, 0, 0], [1, 1, 2], [-1, 2, 3]])

This means that the first shared/hierarchical theta is theta_0 in model 1, theta_0 in model 2, and theta_0 in model 3. The second shared/hierarchical theta is theta_1 in model 1, theta_1 in model 2, and theta_2 in model 3. The third shared/hierarchical theta is not in model 1, is theta_2 in model 2, and is theta_3 in model 3. The index -1 is used to indicate that a particular shared/hierarchical theta is not in a particular model.

Then the model setup is:

shared_model = SepiaSharedThetaModels(model_list, theta_inds)     # Shared version
hier_model = SepiaHierarchicalThetaModels(model_list, theta_inds) # Hierarchical version

MCMC is done similarly to regular models:

shared_model.do_mcmc()

Step size tuning is not supported on shared or hierarchical theta models.