twinify.napsu_mq package
- class twinify.napsu_mq.napsu_mq.NapsuMQModel(required_marginals: Iterable[FrozenSet[str]] = (), use_laplace_approximation: bool = True)
Implementation of the NAPSU-MQ algorithm, a differentially private synthetic data generation method for discrete sensitive data.
Reference: Ossi Räisä, Joonas Jälkö, Samuel Kaski & Antti Honkela, “Noise-Aware Statistical Inference with Differentially Private Synthetic Data”, arXiv:2205.14485.
- fit(data: pandas.core.frame.DataFrame, rng: chacha.defs.ChaChaState, epsilon: float, delta: float, query_sets: Optional[Iterable] = None, **kwargs) → twinify.napsu_mq.napsu_mq.NapsuMQResult
Fit a differentially private NAPSU-MQ model to the data.
- Parameters
data (pd.DataFrame) – pandas DataFrame containing discrete categorical data
rng (d3p.random.PRNGState) – d3p PRNG key
epsilon (float) – Epsilon for the differential privacy mechanism
delta (float) – Delta for the differential privacy mechanism
- Returns
Result object containing the learned probabilistic model with posterior values
- Return type
twinify.napsu_mq.napsu_mq.NapsuMQResult
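As a rough illustration of how the constructor and fit arguments fit together, the following sketch wraps a fit call in a helper. The helper name, the column names in the marginals, and the default values for seed, epsilon and delta are hypothetical. It also assumes that d3p.random.PRNGKey(seed) produces the PRNG state that fit expects.

```python
from typing import FrozenSet, List

# Hypothetical column names, chosen only for illustration. Each frozenset
# names the columns of one marginal that the model is required to cover.
REQUIRED_MARGINALS: List[FrozenSet[str]] = [
    frozenset({"age_group", "education"}),
    frozenset({"education", "income_bracket"}),
]


def fit_napsu_mq(data, seed: int = 0, epsilon: float = 1.0, delta: float = 1e-5):
    """Fit NAPSU-MQ on a DataFrame of discrete categorical columns.

    The imports are local so that this sketch can be loaded without
    twinify and d3p installed.
    """
    # Assumption: d3p.random.PRNGKey(seed) creates the PRNG state for fit.
    import d3p.random
    from twinify.napsu_mq.napsu_mq import NapsuMQModel

    rng = d3p.random.PRNGKey(seed)
    model = NapsuMQModel(
        required_marginals=REQUIRED_MARGINALS,
        use_laplace_approximation=True,
    )
    # Per the signature above, fit returns a NapsuMQResult.
    return model.fit(data, rng, epsilon, delta)
```

The columns named in the marginals are expected to exist in the data frame passed to the helper.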
- class twinify.napsu_mq.napsu_mq.NapsuMQResult(dataframe_domain: Dict[str, List[int]], queries: twinify.napsu_mq.marginal_query.FullMarginalQuerySet, posterior_values: jax.Array, data_description: twinify.dataframe_data.DataDescription)
NAPSU-MQ result class containing the differentially private probabilistic model learned from data. Provides methods for generating differentially private synthetic datasets based on the original dataset.
- generate(rng: chacha.defs.ChaChaState, num_parameter_samples: int, num_data_per_parameter_sample: int = 1, single_dataframe: bool = True) → Union[Iterable[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]
Draws the given number of samples from the parameter posterior (approximation) and generates the given number of data points per parameter sample.
By default, returns a single data frame sampled from the posterior predictive distribution, i.e., for each data record, a parameter value is first drawn from the parameter posterior distribution, then the data record is sampled from the model conditioned on that parameter value. In this case, num_parameter_samples determines the number of data records included in the returned data frame.
This behavior can be customized to sample more than one data record per parameter sample by setting argument num_data_per_parameter_sample to a value larger than 1, in which case the total number of records returned is num_parameter_samples * num_data_per_parameter_sample.
Setting single_dataframe = False causes the method to return an iterable collection of data frames, each of which contains all data records sampled for a single parameter sample, i.e., in this case the method returns num_parameter_samples data frames, each containing num_data_per_parameter_sample records.
Each of the data frames “looks” like the original data this InferenceResult was obtained from, i.e., it has identical column names and categorical labels (if any).
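The sampling scheme described above can be sketched with a toy Bernoulli model. This is not the twinify implementation: the function toy_generate, the list of posterior draws, and the use of Python's standard (non-secure) random module are all assumptions made for illustration. It mirrors only how the three arguments determine the shape of the output.

```python
import random
from typing import List, Union

Record = int  # toy data: each record is a single 0/1 value


def toy_generate(
    posterior_draws: List[float],
    rng: random.Random,
    num_parameter_samples: int,
    num_data_per_parameter_sample: int = 1,
    single_dataframe: bool = True,
) -> Union[List[Record], List[List[Record]]]:
    """Mimic the sampling scheme with a Bernoulli model (not the twinify code)."""
    groups: List[List[Record]] = []
    for _ in range(num_parameter_samples):
        # Step 1: draw one parameter value from the (approximate) posterior.
        theta = rng.choice(posterior_draws)
        # Step 2: sample records from the model conditioned on that parameter.
        groups.append(
            [int(rng.random() < theta) for _ in range(num_data_per_parameter_sample)]
        )
    if single_dataframe:
        # Combine all records into one flat collection.
        return [record for group in groups for record in group]
    # Otherwise keep one group of records per parameter sample.
    return groups


rng = random.Random(0)
posterior_draws = [0.2, 0.25, 0.3]  # hypothetical posterior samples

combined = toy_generate(posterior_draws, rng, 5, 3)
separate = toy_generate(posterior_draws, rng, 5, 3, single_dataframe=False)
print(len(combined))                    # 15 records in total (5 * 3)
print(len(separate), len(separate[0]))  # 5 groups of 3 records each
```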
- Parameters
rng (d3p.random.PRNGState) – A seeded state for the d3p.random secure random number generator.
num_parameter_samples (int) – How often to sample from the parameter posterior approximation.
num_data_per_parameter_sample (int) – How many data points to generate for each parameter sample.
single_dataframe (bool) – Whether to combine data samples into a single data frame or return separate data frames.
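One possible way to request separate data frames, one per parameter sample, is sketched below. The helper name and its parameters are hypothetical. result is expected to be a NapsuMQResult and rng a seeded d3p.random state, both created by the caller.

```python
from typing import Any, List


def generate_separate_datasets(
    result: Any,
    rng: Any,
    num_datasets: int,
    records_per_dataset: int,
) -> List[Any]:
    """Return one synthetic data set per parameter sample."""
    datasets = list(
        result.generate(
            rng,
            num_parameter_samples=num_datasets,
            num_data_per_parameter_sample=records_per_dataset,
            single_dataframe=False,
        )
    )
    # With single_dataframe=False, one data frame per parameter sample
    # is expected, so the count is checked before returning.
    if len(datasets) != num_datasets:
        raise ValueError(f"expected {num_datasets} data sets, got {len(datasets)}")
    return datasets
```

Keeping the data sets separate preserves the grouping of records by parameter sample, which is lost when they are combined into a single data frame.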