SepiaData
The main data container is SepiaData. It should contain all simulation data and, if applicable, observed data. It also standardizes and rescales the data and creates the PCA and discrepancy bases (interpolating to the observed grid if needed).
The DataContainer class is used by SepiaData and not usually directly by users, but some of its attributes may be useful to access.
- class sepia.SepiaData(x_sim=None, t_sim=None, y_sim=None, y_ind_sim=None, x_obs=None, y_obs=None, Sigy=None, y_ind_obs=None, x_cat_ind=None, t_cat_ind=None, xt_sim_sep=None)
Data object used for SepiaModel, containing potentially both sim_data and obs_data objects of type sepia.DataContainer.
- Variables
x_sim (numpy.ndarray/NoneType) – controllable inputs/experimental conditions, shape (n, p) or None
t_sim (numpy.ndarray/NoneType) – non-controllable inputs, shape (n, q) or None
y_sim (numpy.ndarray) – simulation outputs, shape (n, ell_sim)
y_ind_sim (numpy.ndarray/NoneType) – indices for multivariate y, shape (ell_sim, ), required if ell_sim > 1
x_obs (numpy.ndarray/NoneType) – controllable inputs for observation data, shape (m, p) or None
y_obs (numpy.ndarray/list/NoneType) – observed outputs, shape (m, ell_obs), or list length m of 1D arrays (for ragged y_ind_obs), or None
y_ind_obs (numpy.ndarray/list/NoneType) – vector of indices for multivariate y, shape (ell_obs, ), or list length m of 1D arrays (for ragged y_ind_obs), or None
sim_only (bool) – is it simulation-only data?
scalar_out (bool) – is the output y scalar?
ragged_obs (bool) – do the observations have ragged (non-shared) multivariate indices across instances?
x_cat_ind (numpy.ndarray/list) – one entry per column of x flagging categorical columns (0 = not categorical, int > 0 = number of categories)
t_cat_ind (numpy.ndarray/list) – one entry per column of t flagging categorical columns (0 = not categorical, int > 0 = number of categories)
xt_sim_sep (numpy.ndarray/list/NoneType) – for separable design, list of kronecker composable matrices
dummy_x (bool) – is there a dummy x? (used in problems where no x is provided)
sep_design (bool) – is there a Kronecker separable design?
Create SepiaData object. Many arguments are optional depending on the type of model. Users should instantiate with all data needed for the desired model. See documentation pages for more detail.
- Parameters
x_sim (numpy.ndarray/NoneType) – controllable inputs/experimental conditions, shape (n, p), or None
t_sim (numpy.ndarray/NoneType) – non-controllable inputs, shape (n, q), or None
y_sim (numpy.ndarray) – simulation outputs, shape (n, ell_sim)
y_ind_sim (numpy.ndarray/NoneType) – indices for multivariate y, shape (ell_sim, ), required if ell_sim > 1
x_obs (numpy.ndarray/NoneType) – controllable inputs for observation data, shape (m, p) or None
y_obs (numpy.ndarray/list/NoneType) – observed outputs, shape (m, ell_obs), or list length m of 1D arrays (for ragged y_ind_obs), or None
y_ind_obs (numpy.ndarray/list/NoneType) – vector of indices for multivariate y, shape (ell_obs, ), or list length m of 1D arrays (for ragged y_ind_obs), or None
Sigy (numpy.ndarray/NoneType) – optional observation covariance matrix (default is identity)
x_cat_ind (numpy.ndarray/list/NoneType) – one entry per column of x flagging categorical columns (0 = not categorical, int > 0 = number of categories), or None
t_cat_ind (numpy.ndarray/list/NoneType) – one entry per column of t flagging categorical columns (0 = not categorical, int > 0 = number of categories), or None
xt_sim_sep (numpy.ndarray/list/NoneType) – for separable design, list of kronecker composable matrices; it is a list of 2 or more design components that, through Kronecker expansion, produce the full input space (x and t) for the simulations.
- Raises
TypeError – if shapes are not conformable or required data are missing.
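The xt_sim_sep argument describes a separable design as a list of component matrices that expand into the full (x, t) input space. The following is a minimal NumPy sketch of such an expansion. It assumes the full design holds every combination of component rows, with the first component varying slowest. The helper name `expand_separable_design` is hypothetical, and this page does not confirm the row ordering SEPIA uses internally.

```python
import numpy as np

def expand_separable_design(components):
    """Hypothetical helper: combine design components into one full design."""
    full = components[0]
    for comp in components[1:]:
        n_full, n_comp = full.shape[0], comp.shape[0]
        # Repeat each existing row n_comp times and tile the new component
        # n_full times, so every pairing of rows appears exactly once.
        full = np.hstack([np.repeat(full, n_comp, axis=0),
                          np.tile(comp, (n_full, 1))])
    return full

# Illustrative components: 2 settings of one input, 3 settings of two inputs.
x_part = np.array([[0.0], [1.0]])
t_part = np.array([[10.0, 0.1], [20.0, 0.2], [30.0, 0.3]])

design = expand_separable_design([x_part, t_part])
print(design.shape)  # 2 * 3 rows, 1 + 2 columns
```

Only the small components need to be stored, while the implied full design grows as the product of the component row counts.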
- create_D_basis(D_type='constant', D_obs=None, D_sim=None, norm=True)
Create D_obs, D_sim discrepancy bases. Can specify a type of default basis (constant/linear) or provide matrices.
- Parameters
D_type (string) – ‘constant’ or ‘linear’ to set up constant or linear D_sim and D_obs
D_obs (numpy.ndarray/list/NoneType) – a basis matrix on obs indices of shape (n_basis_elements, ell_obs), or list of matrices for ragged observations.
D_sim (numpy.ndarray/NoneType) – a basis matrix on sim indices of shape (n_basis_elements, ell_sim).
norm (bool) – normalize D basis?
Note
D_type parameter is ignored if D_obs and D_sim are provided.
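To make the shapes of the default bases concrete, here is a rough NumPy sketch of a constant and a linear basis built on a set of output indices. The helper `make_default_d_basis` is hypothetical, and the normalization shown (each basis row scaled to unit Euclidean norm) is an assumption; the normalization applied when norm=True in SEPIA may differ.

```python
import numpy as np

def make_default_d_basis(y_ind, d_type="constant", norm=True):
    """Hypothetical helper: build a basis of shape (n_basis_elements, ell)."""
    y_ind = np.asarray(y_ind, dtype=float)
    if d_type == "constant":
        D = np.ones((1, y_ind.shape[0]))
    elif d_type == "linear":
        # One constant row plus one row that is linear in the index values.
        D = np.vstack([np.ones(y_ind.shape[0]), y_ind])
    else:
        raise ValueError("d_type must be 'constant' or 'linear'")
    if norm:
        # Assumed normalization: scale each basis row to unit Euclidean norm.
        D = D / np.linalg.norm(D, axis=1, keepdims=True)
    return D

y_ind_obs = np.linspace(0.0, 1.0, 5)
D_const = make_default_d_basis(y_ind_obs, "constant")
D_lin = make_default_d_basis(y_ind_obs, "linear")
print(D_const.shape, D_lin.shape)
```

A user-supplied D_obs or D_sim would follow the same (n_basis_elements, ell) layout.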
- create_K_basis(n_pc=0.995, K=None)
Creates K_sim and K_obs basis functions using PCA on sim_data.y_std, or using given K_sim matrix.
- Parameters
n_pc (float/int) – proportion in [0, 1] of variance, or an integer number of components
K (numpy.ndarray/None) – a basis matrix on sim indices of shape (n_basis_elements, ell_sim) or None
Note
If the standardize_y() method has not been called first, this method calls it automatically.
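The two ways of specifying n_pc can be illustrated with a small SVD-based sketch. This is not SEPIA's implementation: the helper `pca_basis` is hypothetical, the scaling of each basis row by its singular value over sqrt(n) is an assumption, and the interpolation that produces K_obs on the observed grid is not shown.

```python
import numpy as np

def pca_basis(y_std, n_pc=0.995):
    """Hypothetical helper: return a basis of shape (n_basis_elements, ell_sim)."""
    n = y_std.shape[0]
    # Rows of vt are principal directions over the output indices.
    _, s, vt = np.linalg.svd(y_std, full_matrices=False)
    if n_pc < 1:
        # Proportion of variance: keep the fewest components that reach it.
        var_frac = np.cumsum(s**2) / np.sum(s**2)
        n_keep = min(int(np.searchsorted(var_frac, n_pc)) + 1, len(s))
    else:
        n_keep = int(n_pc)
    # Assumed scaling: weight each direction by singular value / sqrt(n).
    return (s[:n_keep, None] * vt[:n_keep]) / np.sqrt(n)

# Illustrative standardized outputs built from two underlying patterns.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
patterns = np.vstack([np.sin(2 * np.pi * grid), grid**2])
y = rng.normal(size=(20, 2)) @ patterns
y_std = y - y.mean(axis=0)

K_prop = pca_basis(y_std, n_pc=0.995)  # chosen by variance proportion
K_int = pca_basis(y_std, n_pc=3)       # chosen by component count
print(K_prop.shape, K_int.shape)
```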
- set_mean_basis(basis_type='linear')
Sets a mean basis (H) for a scalar response model.
- Parameters
basis_type (str/None) – name of basis to be used
- standardize_y(center=True, scale='scalar', y_mean=None, y_sd=None)
Standardizes both sim_data and obs_data outputs y based on sim_data.y mean/SD.
- Parameters
center (bool) – subtract simulation mean (across observations)?
scale (string/bool) – how to rescale: ‘scalar’: single SD over all demeaned data, ‘columnwise’: SD for each column of demeaned data, False: no rescaling
y_mean (numpy.ndarray/float/NoneType) – y_mean for sim; optional, should match length of y_ind_sim or be scalar
y_sd (numpy.ndarray/float/NoneType) – y_sd for sim; optional, should match length of y_ind_sim or be scalar
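The difference between the 'scalar' and 'columnwise' scale options can be sketched in plain NumPy as follows. The helper `standardize` is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption rather than something this page states. The point illustrated is that observed outputs reuse the simulation mean and SD rather than their own.

```python
import numpy as np

def standardize(y_sim, center=True, scale="scalar"):
    """Hypothetical helper mirroring the center/scale options described above."""
    y_mean = y_sim.mean(axis=0) if center else 0.0
    resid = y_sim - y_mean
    if scale == "scalar":
        y_sd = resid.std(ddof=1)          # one SD over all demeaned values
    elif scale == "columnwise":
        y_sd = resid.std(axis=0, ddof=1)  # one SD per output index
    else:
        y_sd = 1.0                        # scale=False: no rescaling
    return resid / y_sd, y_mean, y_sd

rng = np.random.default_rng(0)
y_sim = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(200, 3))

y_std, y_mean, y_sd = standardize(y_sim, scale="columnwise")
_, _, y_sd_scalar = standardize(y_sim, scale="scalar")

# Observed outputs are standardized with the simulation statistics.
y_obs = np.array([[5.5, 3.0, -20.0]])
y_obs_std = (y_obs - y_mean) / y_sd
print(y_std.std(axis=0, ddof=1), np.ndim(y_sd_scalar))
```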
- transform_xt(x_notrans=None, t_notrans=None, x_range=None, t_range=None, x=None, t=None, native=False)
Transforms sim_data x and t and obs_data x to lie in [0, 1], columnwise, or applies same transformation to new x and t.
- Parameters
x_notrans (list/NoneType) – column indices of x that should not be transformed or None
t_notrans (list/NoneType) – column indices of t that should not be transformed or None
x (numpy.ndarray/NoneType) – new x values to transform to [0, 1] using same rules as original x data or None
t (numpy.ndarray/NoneType) – new t values to transform to [0, 1] using same rules as original t data or None
x_range (numpy.ndarray/NoneType) – user specified data ranges, first row is min, second row is max for each variable
t_range (numpy.ndarray/NoneType) – user specified data ranges, first row is min, second row is max for each variable
native (bool) – boolean for reverse transformation on x,t from [0, 1] to native scale
- Returns
tuple of x_trans, t_trans if x and t arguments provided; otherwise returns (None, None)
Note
A column is not transformed if min/max of the column values are equal, if the column is categorical, or if the user specifies no transformation using x_notrans or t_notrans arguments.
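The columnwise scaling and its reverse can be sketched in NumPy as below. The helpers `to_unit_cube` and `to_native` are hypothetical and are not SEPIA functions. They follow the range layout described above (first row min, second row max) and skip constant or excluded columns; categorical columns, which are also left untransformed, are not modeled here.

```python
import numpy as np

def to_unit_cube(x, x_range=None, notrans=()):
    """Hypothetical helper: scale columns of x to [0, 1] using a (2, p) range."""
    x = np.asarray(x, dtype=float)
    if x_range is None:
        # First row is the columnwise min, second row the columnwise max.
        x_range = np.vstack([x.min(axis=0), x.max(axis=0)])
    lo, span = x_range[0], x_range[1] - x_range[0]
    # Skip constant columns and any columns the caller excluded.
    skip = span == 0
    for j in notrans:
        skip[j] = True
    x_trans = x.copy()
    x_trans[:, ~skip] = (x[:, ~skip] - lo[~skip]) / span[~skip]
    return x_trans, x_range

def to_native(x_trans, x_range, notrans=()):
    """Hypothetical inverse of to_unit_cube: map [0, 1] back to native scale."""
    x_trans = np.asarray(x_trans, dtype=float)
    lo, span = x_range[0], x_range[1] - x_range[0]
    skip = span == 0
    for j in notrans:
        skip[j] = True
    x = x_trans.copy()
    x[:, ~skip] = x_trans[:, ~skip] * span[~skip] + lo[~skip]
    return x

x_sim = np.array([[1.0, 5.0, 7.0], [3.0, 5.0, 9.0], [2.0, 5.0, 8.0]])
x_sim_trans, x_range = to_unit_cube(x_sim)  # middle column is constant
# New points are scaled with the ranges learned from the original data.
x_new_trans, _ = to_unit_cube(np.array([[2.0, 5.0, 7.5]]), x_range=x_range)
x_new_back = to_native(x_new_trans, x_range)
print(x_new_trans, x_new_back)
```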
- class sepia.DataContainer(x, y, t=None, y_ind=None, xt_sep_design=None, Sigy=None)
DataContainer serves to contain all data structures for a single data source (simulation or observation data).
- Variables
x (numpy.ndarray/NoneType) – x values, controllable inputs/experimental variables, shape (n, p)
y (numpy.ndarray/NoneType) – y values, shape (n, ell)
t (numpy.ndarray/NoneType) – t values, non-controllable inputs, shape (n, q)
y_ind (numpy.ndarray/NoneType) – indices for multivariate y outputs, shape (ell, )
K (numpy.ndarray/list/NoneType) – PCA basis, shape (pu, ell), or list of K matrices for each observation (for ragged observations)
D (numpy.ndarray/list/NoneType) – discrepancy basis, shape (pv, ell), or list of D matrices (for ragged observations)
orig_y_sd (numpy.ndarray/float/NoneType) – standard deviation of original simulation y values (may be scalar or array, length ell)
orig_y_mean (numpy.ndarray/float/NoneType) – mean of original simulation y values (may be scalar or array, length ell)
y_std (numpy.ndarray/NoneType) – standardized y values, shape (n, ell)
x_trans (numpy.ndarray/NoneType) – x values transformed to unit hypercube, shape (n, p)
t_trans (numpy.ndarray/NoneType) – t values transformed to unit hypercube, shape (n, q)
orig_t_min (numpy.ndarray/NoneType) – minimum values (columnwise) of original t values
orig_t_max (numpy.ndarray/NoneType) – maximum values (columnwise) of original t values
orig_x_min (numpy.ndarray/NoneType) – minimum values (columnwise) of original x values
orig_x_max (numpy.ndarray/NoneType) – maximum values (columnwise) of original x values
xt_sep_design (list/NoneType) – list of separable design component matrices
Initialize DataContainer object.
- Parameters
x (numpy.ndarray) – GP inputs (controllable/experimental conditions, would be known for both sim and obs), shape (n, p)
y (numpy.ndarray/list) – GP outputs, shape (n, ell), or list of 1D arrays for ragged observations
t (numpy.ndarray/NoneType) – optional GP inputs (not controllable, would be known only for sim), shape (n, q)
y_ind (numpy.ndarray/list/NoneType) – optional y indices (needed if ell > 1) or list of 1D arrays for ragged observations
xt_sep_design (list/NoneType) – separable Kronecker design, as a list of design component matrices
Note
DataContainer objects are constructed when you instantiate SepiaData and generally won’t be instantiated directly.