SepiaData

SepiaData is the main data container. It holds all simulation data and, if applicable, observed data. It also handles standardization, rescaling, and the creation of PCA and discrepancy bases (with interpolation to the observed grid if needed).

The DataContainer class is used by SepiaData and not usually directly by users, but some of its attributes may be useful to access.

class sepia.SepiaData(x_sim=None, t_sim=None, y_sim=None, y_ind_sim=None, x_obs=None, y_obs=None, Sigy=None, y_ind_obs=None, x_cat_ind=None, t_cat_ind=None, xt_sim_sep=None)

Data object used for SepiaModel, containing potentially both sim_data and obs_data objects of type sepia.DataContainer.

Variables
  • x_sim (numpy.ndarray/NoneType) – controllable inputs/experimental conditions, shape (n, p) or None

  • t_sim (numpy.ndarray/NoneType) – non-controllable inputs, shape (n, q) or None

  • y_sim (numpy.ndarray) – simulation outputs, shape (n, ell_sim)

  • y_ind_sim (numpy.ndarray/NoneType) – indices for multivariate y, shape (ell_sim, ), required if ell_sim > 1

  • x_obs (numpy.ndarray/NoneType) – controllable inputs for observation data, shape (m, p) or None

  • y_obs (numpy.ndarray/list/NoneType) – observed outputs, shape (m, ell_obs), or list length m of 1D arrays (for ragged y_ind_obs), or None

  • y_ind_obs (numpy.ndarray/list/NoneType) – vector of indices for multivariate y, shape (ell_obs, ), or list length m of 1D arrays (for ragged y_ind_obs), or None

  • sim_only (bool) – is it simulation-only data?

  • scalar_out (bool) – is the output y scalar?

  • ragged_obs (bool) – do the observations have ragged (non-shared) multivariate indices across instances?

  • x_cat_ind (numpy.ndarray/list) – indices of x that are categorical (0 = not cat, int > 0 = how many categories)

  • t_cat_ind (numpy.ndarray/list) – indices of t that are categorical (0 = not cat, int > 0 = how many categories)

  • xt_sim_sep (numpy.ndarray/list/NoneType) – for separable design, list of Kronecker-composable matrices

  • dummy_x (bool) – is there a dummy x? (used in problems where no x is provided)

  • sep_design (bool) – is there a Kronecker separable design?

Create SepiaData object. Many arguments are optional depending on the type of model. Users should instantiate with all data needed for the desired model. See documentation pages for more detail.

Parameters
  • x_sim (numpy.ndarray/NoneType) – controllable inputs/experimental conditions, shape (n, p), or None

  • t_sim (numpy.ndarray/NoneType) – non-controllable inputs, shape (n, q), or None

  • y_sim (numpy.ndarray) – simulation outputs, shape (n, ell_sim)

  • y_ind_sim (numpy.ndarray/NoneType) – indices for multivariate y, shape (ell_sim, ), required if ell_sim > 1

  • x_obs (numpy.ndarray/NoneType) – controllable inputs for observation data, shape (m, p) or None

  • y_obs (numpy.ndarray/list/NoneType) – observed outputs, shape (m, ell_obs), or list length m of 1D arrays (for ragged y_ind_obs), or None

  • y_ind_obs (numpy.ndarray/list/NoneType) – vector of indices for multivariate y, shape (ell_obs, ), or list length m of 1D arrays (for ragged y_ind_obs), or None

  • Sigy (numpy.ndarray/NoneType) – optional observation covariance matrix (default is identity)

  • x_cat_ind (numpy.ndarray/list/NoneType) – indices of x that are categorical (0 = not cat, int > 0 = how many categories), or None

  • t_cat_ind (numpy.ndarray/list/NoneType) – indices of t that are categorical (0 = not cat, int > 0 = how many categories), or None

  • xt_sim_sep (numpy.ndarray/list/NoneType) – for separable design, a list of two or more Kronecker-composable design component matrices that, through Kronecker expansion, produce the full input space (x and t) for the simulations.
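To make the idea of a separable design concrete, the sketch below expands a list of component matrices into a full design in which every row of one component is paired with every row of the others. The helper name `expand_separable` and the row ordering are assumptions made for illustration; the library's internal ordering may differ.

```python
import numpy as np

def expand_separable(components):
    """Pair every row of each component with every row of the others.

    Hypothetical helper: the row ordering is an illustrative assumption.
    """
    full = np.asarray(components[0], dtype=float)
    for comp in components[1:]:
        comp = np.asarray(comp, dtype=float)
        n_full, n_comp = full.shape[0], comp.shape[0]
        # Repeat the rows built so far, tile the new component, join columns.
        full = np.hstack([np.repeat(full, n_comp, axis=0),
                          np.tile(comp, (n_full, 1))])
    return full

A = np.array([[0.0], [1.0]])                                # 2 rows, 1 column
B = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])    # 3 rows, 2 columns
full_design = expand_separable([A, B])                      # 6 rows, 3 columns
```

With n1 and n2 rows in the components, the full design has n1 * n2 rows, which is why a separable design can be stored far more compactly than the expanded inputs.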

Raises

TypeError if shapes are not conformable or required data is missing.
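As a rough guide to the shapes documented above, the following sketch builds NumPy arrays for a multivariate-output problem with ragged observations and checks them with a small helper. The helper `check_shapes` is hypothetical and only mirrors the documented shape rules; it is not the validation SepiaData performs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, ell_sim = 30, 1, 2, 10      # sim runs, x columns, t columns, output length
m = 3                                # number of observations

x_sim = rng.uniform(size=(n, p))
t_sim = rng.uniform(size=(n, q))
y_ind_sim = np.linspace(0, 1, ell_sim)
y_sim = rng.normal(size=(n, ell_sim))

x_obs = rng.uniform(size=(m, p))
# Ragged observations: each instance has its own index vector and outputs.
y_ind_obs = [np.linspace(0, 1, k) for k in (4, 6, 5)]
y_obs = [rng.normal(size=ind.shape) for ind in y_ind_obs]

def check_shapes(x_sim, t_sim, y_sim, y_ind_sim, x_obs, y_obs, y_ind_obs):
    """Hypothetical check of the documented shape rules (illustration only)."""
    n_runs, ell = y_sim.shape
    if x_sim is not None and x_sim.shape[0] != n_runs:
        raise TypeError("x_sim and y_sim must have the same number of rows")
    if t_sim is not None and t_sim.shape[0] != n_runs:
        raise TypeError("t_sim and y_sim must have the same number of rows")
    if ell > 1 and (y_ind_sim is None or y_ind_sim.shape[0] != ell):
        raise TypeError("y_ind_sim of length ell_sim is required if ell_sim > 1")
    if y_obs is not None:
        if len(y_obs) != x_obs.shape[0]:
            raise TypeError("x_obs and y_obs must describe the same m observations")
        for y, ind in zip(y_obs, y_ind_obs):
            if len(y) != len(ind):
                raise TypeError("each y_obs entry must match its y_ind_obs entry")
    return True

shapes_ok = check_shapes(x_sim, t_sim, y_sim, y_ind_sim, x_obs, y_obs, y_ind_obs)
# Arrays like these would then be passed by keyword to the constructor, e.g.
# SepiaData(x_sim=x_sim, t_sim=t_sim, y_sim=y_sim, y_ind_sim=y_ind_sim,
#           x_obs=x_obs, y_obs=y_obs, y_ind_obs=y_ind_obs)
```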

create_D_basis(D_type='constant', D_obs=None, D_sim=None, norm=True)

Create D_obs, D_sim discrepancy bases. Can specify a type of default basis (constant/linear) or provide matrices.

Parameters
  • D_type (string) – ‘constant’ or ‘linear’ to set up constant or linear D_sim and D_obs

  • D_obs (numpy.ndarray/list/NoneType) – a basis matrix on obs indices of shape (n_basis_elements, ell_obs), or list of matrices for ragged observations.

  • D_sim (numpy.ndarray/NoneType) – a basis matrix on sim indices of shape (n_basis_elements, ell_sim).

  • norm (bool) – normalize D basis?

Note

D_type parameter is ignored if D_obs and D_sim are provided.
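For intuition about the default bases, here is a minimal sketch of what constant and linear discrepancy bases could look like when evaluated on the sim and obs index grids. The helper `default_basis` and the normalization rule (one shared scalar computed from D_sim) are assumptions for illustration and may not match the library's exact behavior.

```python
import numpy as np

def default_basis(y_ind, d_type="constant"):
    """Hypothetical helper: basis of shape (n_basis_elements, len(y_ind))."""
    y_ind = np.asarray(y_ind, dtype=float)
    if d_type == "constant":
        return np.ones((1, y_ind.size))
    if d_type == "linear":
        # Intercept row plus a row that is linear in the index variable.
        return np.vstack([np.ones(y_ind.size), y_ind])
    raise ValueError("d_type must be 'constant' or 'linear'")

y_ind_sim = np.linspace(0, 1, 11)
y_ind_obs = np.array([0.1, 0.5, 0.9])

D_sim = default_basis(y_ind_sim, "linear")   # shape (2, 11)
D_obs = default_basis(y_ind_obs, "linear")   # shape (2, 3)

# Assumed normalization: one scalar from D_sim, applied to both bases so
# that they remain on a common scale.
scale = np.sqrt(np.max(D_sim @ D_sim.T))
D_sim_n, D_obs_n = D_sim / scale, D_obs / scale
```

A user-supplied D_obs and D_sim would follow the same shape convention, with one row per basis element and one column per index.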

create_K_basis(n_pc=0.995, K=None)

Creates K_sim and K_obs basis functions using PCA on sim_data.y_std, or using given K_sim matrix.

Parameters
  • n_pc (float/int) – proportion in [0, 1] of variance, or an integer number of components

  • K (numpy.ndarray/None) – a basis matrix on sim indices of shape (n_basis_elements, ell_sim) or None

Note

If the standardize_y() method has not been called first, this method calls it automatically.
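The two ways of specifying n_pc can be illustrated with a small SVD-based sketch: a float selects the smallest number of components reaching that proportion of variance, while an integer selects a fixed count. The function `pca_basis` is a hypothetical stand-in, and returning unscaled right singular vectors is an assumption; the library may scale its basis differently.

```python
import numpy as np

def pca_basis(y_std, n_pc=0.995):
    """Hypothetical sketch: return a basis of shape (n_basis_elements, ell_sim)."""
    _, s, vt = np.linalg.svd(y_std, full_matrices=False)
    if isinstance(n_pc, (int, np.integer)):
        k = int(n_pc)                         # fixed number of components
    else:
        frac = np.cumsum(s**2) / np.sum(s**2)
        # Smallest k whose cumulative variance fraction reaches n_pc.
        k = int(np.searchsorted(frac, n_pc)) + 1
    k = max(1, min(k, s.size))
    return vt[:k]

rng = np.random.default_rng(0)
# Stand-in for standardized outputs: one dominant and one weak direction.
directions, _ = np.linalg.qr(rng.normal(size=(5, 2)))
scores = rng.normal(size=(20, 2)) * np.array([10.0, 0.1])
y_std = scores @ directions.T                 # shape (20, 5)

K_auto = pca_basis(y_std, 0.995)              # proportion of variance
K_two = pca_basis(y_std, 2)                   # integer number of components
```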

set_mean_basis(basis_type='linear')

Sets a mean basis (H) for a scalar response model.

Parameters

basis_type (str/None) – name of basis to be used

standardize_y(center=True, scale='scalar', y_mean=None, y_sd=None)

Standardizes both sim_data and obs_data outputs y based on sim_data.y mean/SD.

Parameters
  • center (bool) – subtract simulation mean (across observations)?

  • scale (string/bool) – how to rescale: ‘scalar’: single SD over all demeaned data, ‘columnwise’: SD for each column of demeaned data, False: no rescaling

  • y_mean (numpy.ndarray/float/NoneType) – y_mean for sim; optional, should match length of y_ind_sim or be scalar

  • y_sd (numpy.ndarray/float/NoneType) – y_sd for sim; optional, should match length of y_ind_sim or be scalar
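The difference between the 'scalar' and 'columnwise' options can be sketched as follows, assuming sim and obs outputs share the same indices so that no interpolation is needed. The function `standardize` and the use of the sample standard deviation (ddof=1) are illustrative assumptions, not the library's implementation.

```python
import numpy as np

def standardize(y_sim, y_obs=None, center=True, scale="scalar"):
    """Hypothetical sketch: standardize sim and obs outputs using sim statistics."""
    y_mean = y_sim.mean(axis=0) if center else 0.0
    resid = y_sim - y_mean
    if scale == "scalar":
        y_sd = resid.std(ddof=1)              # one SD over all demeaned data
    elif scale == "columnwise":
        y_sd = resid.std(axis=0, ddof=1)      # one SD per column
    else:
        y_sd = 1.0                            # scale=False: no rescaling
    y_sim_std = resid / y_sd
    # Observations are shifted and scaled with the simulation statistics.
    y_obs_std = None if y_obs is None else (y_obs - y_mean) / y_sd
    return y_sim_std, y_obs_std, y_mean, y_sd

rng = np.random.default_rng(1)
y_sim = rng.normal(loc=5.0, scale=3.0, size=(50, 4))
y_obs = rng.normal(loc=5.0, scale=3.0, size=(2, 4))

sim_scalar, obs_scalar, mean_s, sd_s = standardize(y_sim, y_obs, scale="scalar")
sim_col, obs_col, mean_c, sd_c = standardize(y_sim, y_obs, scale="columnwise")
```

Here sd_s is a single float, while sd_c has one entry per output column, matching the documented shapes of y_sd.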

transform_xt(x_notrans=None, t_notrans=None, x_range=None, t_range=None, x=None, t=None, native=False)

Transforms sim_data x and t and obs_data x to lie in [0, 1], columnwise, or applies same transformation to new x and t.

Parameters
  • x_notrans (list/NoneType) – column indices of x that should not be transformed or None

  • t_notrans (list/NoneType) – column indices of t that should not be transformed or None

  • x (numpy.ndarray/NoneType) – new x values to transform to [0, 1] using same rules as original x data or None

  • t (numpy.ndarray/NoneType) – new t values to transform to [0, 1] using same rules as original t data or None

  • x_range (numpy.ndarray/NoneType) – user specified data ranges, first row is min, second row is max for each variable

  • t_range (numpy.ndarray/NoneType) – user specified data ranges, first row is min, second row is max for each variable

  • native (bool) – boolean for reverse transformation on x,t from [0, 1] to native scale

Returns

tuple of x_trans, t_trans if x and t arguments provided; otherwise returns (None, None)

Note

A column is not transformed if min/max of the column values are equal, if the column is categorical, or if the user specifies no transformation using x_notrans or t_notrans arguments.
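The columnwise rule in the note above can be sketched with a plain min/max rescaling that leaves constant columns and user-excluded columns untouched, together with the reverse mapping back to the native scale. The helper `unit_scaling` is hypothetical, and categorical columns are not handled here; this is an illustration of the rule, not the library's code.

```python
import numpy as np

def unit_scaling(x, notrans=()):
    """Hypothetical helper: per-column shift and scale mapping x into [0, 1]."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    skip = x_max == x_min                     # constant columns are left alone
    for j in notrans:                         # user-excluded columns
        skip[j] = True
    shift = np.where(skip, 0.0, x_min)
    scale = np.where(skip, 1.0, x_max - x_min)
    return shift, scale

x = np.array([[1.0, 5.0, 2.0],
              [3.0, 5.0, 4.0],
              [2.0, 5.0, 8.0]])

shift, scale = unit_scaling(x)
x_trans = (x - shift) / scale                 # forward: native scale to [0, 1]
x_back = x_trans * scale + shift              # reverse: [0, 1] to native scale

# Excluding column 2 leaves it on its native scale.
shift_nt, scale_nt = unit_scaling(x, notrans=[2])
x_trans_nt = (x - shift_nt) / scale_nt
```

New x or t values would be transformed with the same shift and scale, so they can fall outside [0, 1] if they lie outside the original data range.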

class sepia.DataContainer(x, y, t=None, y_ind=None, xt_sep_design=None, Sigy=None)

DataContainer serves to contain all data structures for a single data source (simulation or observation data).

Variables
  • x (numpy.ndarray/NoneType) – x values, controllable inputs/experimental variables, shape (n, p)

  • y (numpy.ndarray/NoneType) – y values, shape (n, ell)

  • t (numpy.ndarray/NoneType) – t values, non-controllable inputs, shape (n, q)

  • y_ind (numpy.ndarray/NoneType) – indices for multivariate y outputs, shape (ell, )

  • K (numpy.ndarray/list/NoneType) – PCA basis, shape (pu, ell), or list of K matrices for each observation (for ragged observations)

  • D (numpy.ndarray/list/NoneType) – discrepancy basis, shape (pv, ell), or list of D matrices (for ragged observations)

  • orig_y_sd (numpy.ndarray/float/NoneType) – standard deviation of original simulation y values (may be scalar or array, length ell)

  • orig_y_mean (numpy.ndarray/float/NoneType) – mean of original simulation y values (may be scalar or array, length ell)

  • y_std (numpy.ndarray/NoneType) – standardized y values, shape (n, ell)

  • x_trans (numpy.ndarray/NoneType) – x values transformed to unit hypercube, shape (n, p)

  • t_trans (numpy.ndarray/NoneType) – t values transformed to unit hypercube, shape (n, q)

  • orig_t_min (numpy.ndarray/NoneType) – minimum values (columnwise) of original t values

  • orig_t_max (numpy.ndarray/NoneType) – maximum values (columnwise) of original t values

  • orig_x_min (numpy.ndarray/NoneType) – minimum values (columnwise) of original x values

  • orig_x_max (numpy.ndarray/NoneType) – maximum values (columnwise) of original x values

  • xt_sep_design (list/NoneType) – list of separable design component matrices

Initialize DataContainer object.

Parameters
  • x (numpy.ndarray) – GP inputs (controllable/experimental conditions, would be known for both sim and obs), shape (n, p)

  • y (numpy.ndarray/list) – GP outputs, shape (n, ell), or list of 1D arrays for ragged observations

  • t (numpy.ndarray/NoneType) – optional GP inputs (not controllable, would be known only for sim), shape (n, q)

  • y_ind (numpy.ndarray/list/NoneType) – optional y indices (needed if ell > 1) or list of 1D arrays for ragged observations

  • xt_sep_design (list/NoneType) – separable Kronecker design

Note

DataContainer objects are constructed when you instantiate SepiaData and generally won’t be instantiated directly.