twinify.base package
- class twinify.base.InferenceModel
A statistical model to generate privacy-preserving synthetic twins data sets from sensitive data.
- abstract fit(data: pandas.core.frame.DataFrame, rng: chacha.defs.ChaChaState, epsilon: float, delta: float, **kwargs) -> twinify.base.InferenceResult
Computes the parameter posterior (approximation) for a given data set, hyperparameters and privacy bounds, and returns it as an InferenceResult.
- Parameters
data – A pandas.DataFrame containing (sensitive) data.
rng – A seeded state for the d3p.random secure random number generator.
epsilon – Privacy bound ε.
delta – Privacy bound δ.
kwargs – Optional (model specific) hyperparameters.
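To illustrate the fit contract only, the following sketch mirrors the abstract method with a hypothetical ToyInferenceModel base class and a hypothetical NoisyMeanModel subclass; neither is part of twinify. It assumes numpy's np.random.Generator in place of the chacha.defs.ChaChaState / d3p.random state, and it swaps in a deliberately simple mechanism (the Gaussian mechanism applied to per-column means of data clipped to [0, 1], with the record count treated as public) rather than the inference any real twinify model performs. ToyResult stands in for an InferenceResult and merely holds the released values.

```python
import math
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class ToyResult:
    """Hypothetical stand-in for an InferenceResult; it only stores released values."""

    def __init__(self, noisy_means: pd.Series, noise_std: float):
        self.noisy_means = noisy_means
        self.noise_std = noise_std


class ToyInferenceModel(ABC):
    """Hypothetical mirror of the InferenceModel.fit signature (not twinify's class)."""

    @abstractmethod
    def fit(self, data: pd.DataFrame, rng: np.random.Generator,
            epsilon: float, delta: float, **kwargs) -> ToyResult:
        ...


class NoisyMeanModel(ToyInferenceModel):
    """Toy model: releases per-column means via the Gaussian mechanism."""

    def fit(self, data, rng, epsilon, delta, **kwargs):
        # The classic Gaussian-mechanism calibration used below is stated for 0 < epsilon < 1.
        if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
            raise ValueError("this sketch assumes 0 < epsilon < 1 and 0 < delta < 1")
        clipped = data.clip(lower=0.0, upper=1.0)  # bound every value to [0, 1]
        n, d = clipped.shape
        # Replacing one record moves each column mean by at most 1/n,
        # so the L2 sensitivity of the d-dimensional mean vector is sqrt(d) / n.
        sensitivity = math.sqrt(d) / n
        sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
        noisy_means = clipped.mean() + rng.normal(0.0, sigma, size=d)
        return ToyResult(noisy_means, sigma)
```

Accepting **kwargs in the shared signature lets each model receive its own hyperparameters without changing the common interface.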
- class twinify.base.InferenceResult
A parameter posterior (approximation) resulting from privacy-preserving inference on a data set with a particular InferenceModel.
- abstract generate(rng: chacha.defs.ChaChaState, num_parameter_samples: int, num_data_per_parameter_sample: int = 1, single_dataframe: bool = True) -> Union[Iterable[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]
Draws samples from the parameter posterior (approximation) and generates the given number of data points per parameter sample.
By default, returns a single data frame of samples from the posterior predictive distribution, i.e., for each data record a parameter value is first drawn from the parameter posterior distribution, and the data record is then sampled from the model conditioned on that parameter value. In this case, num_parameter_samples determines the number of data records in the returned data frame.
This behavior can be customized to sample more than one data record per parameter sample by setting num_data_per_parameter_sample to a value larger than 1, in which case the total number of records returned is num_parameter_samples * num_data_per_parameter_sample.
Setting single_dataframe = False causes the method to return an iterable collection of data frames, each of which contains all data records sampled for a single parameter sample, i.e., in this case the method returns num_parameter_samples data frames, each containing num_data_per_parameter_sample records.
Each of the data frames “looks” like the original data this InferenceResult was obtained from, i.e., it has identical column names and categorical labels (if any).
- Parameters
rng – A seeded state for the d3p.random secure random number generator.
num_parameter_samples – How often to sample from the parameter posterior approximation.
num_data_per_parameter_sample – How many data points to generate for each parameter sample.
single_dataframe – Whether to combine data samples into a single data frame or return separate data frames.
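The sampling scheme described for generate can be sketched as follows. generate_sketch, sample_parameter and sample_records are hypothetical names standing in for a model's posterior and likelihood, and numpy's Generator replaces the d3p.random state; this is not twinify's implementation, only an illustration of how the three arguments shape the output.

```python
from typing import Callable, List, Union

import numpy as np
import pandas as pd


def generate_sketch(
    sample_parameter: Callable[[np.random.Generator], float],
    sample_records: Callable[[np.random.Generator, float, int], pd.DataFrame],
    rng: np.random.Generator,
    num_parameter_samples: int,
    num_data_per_parameter_sample: int = 1,
    single_dataframe: bool = True,
) -> Union[pd.DataFrame, List[pd.DataFrame]]:
    """Hypothetical helper mimicking the documented semantics of generate."""
    frames = []
    for _ in range(num_parameter_samples):
        theta = sample_parameter(rng)  # one draw from the parameter posterior
        # All records for this draw are sampled conditioned on the same theta.
        frames.append(sample_records(rng, theta, num_data_per_parameter_sample))
    if single_dataframe:
        # num_parameter_samples * num_data_per_parameter_sample rows in total
        return pd.concat(frames, ignore_index=True)
    return frames  # one data frame per parameter sample


# Toy model (assumption): posterior theta ~ N(2, 0.1^2), records x ~ N(theta, 1).
def sample_parameter(rng: np.random.Generator) -> float:
    return float(rng.normal(2.0, 0.1))


def sample_records(rng: np.random.Generator, theta: float, n: int) -> pd.DataFrame:
    return pd.DataFrame({"x": rng.normal(theta, 1.0, size=n)})


rng = np.random.default_rng(0)
combined = generate_sketch(sample_parameter, sample_records, rng, 10, 3)
separate = generate_sketch(sample_parameter, sample_records, rng, 10, 3,
                           single_dataframe=False)
```

Here combined holds 10 * 3 = 30 records in one frame, while separate is a list of 10 frames with 3 records each.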
- classmethod is_file_stored_result(file_path_or_io: Union[str, BinaryIO], **kwargs) -> bool
Checks whether a file stores data representing the specific inference result type represented by this class.
The file can be specified either as a path or an opened file, i.e., both of the following are possible:
`InferenceResult.is_file_stored_result('my_inference_result.out')`
or
```
with open('my_inference_result.out', 'rb') as f:
    InferenceResult.is_file_stored_result(f)
```
If a BinaryIO instance is passed, the cursor position will remain the same after this method returns.
- Parameters
file_path_or_io – The file path (as a string) or a BinaryIO instance representing the file to check.
kwargs – Optional (model specific) arguments.
- Note for subclass implementation:
Subclasses need only implement an override for _is_file_stored_result_from_io.
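One way to meet the cursor guarantee is for the public class method to handle the path-versus-stream dispatch and restore the stream position with tell/seek, leaving the _is_file_stored_result_from_io hook free to read. The sketch below assumes a hypothetical format that starts with an 8-byte magic header; ToyStoredResult and MAGIC are illustrative names, not part of twinify.

```python
from typing import BinaryIO, Union

MAGIC = b"TOYRES1\n"  # hypothetical 8-byte header marking the toy result format


class ToyStoredResult:
    """Hypothetical class illustrating path-or-stream dispatch and cursor handling."""

    @classmethod
    def _is_file_stored_result_from_io(cls, f: BinaryIO, **kwargs) -> bool:
        # Subclass hook: may read freely from the current position.
        return f.read(len(MAGIC)) == MAGIC

    @classmethod
    def is_file_stored_result(cls, file_path_or_io: Union[str, BinaryIO], **kwargs) -> bool:
        if isinstance(file_path_or_io, str):
            # Open read-only so that checking never modifies the file.
            with open(file_path_or_io, "rb") as f:
                return cls._is_file_stored_result_from_io(f, **kwargs)
        start = file_path_or_io.tell()
        try:
            return cls._is_file_stored_result_from_io(file_path_or_io, **kwargs)
        finally:
            file_path_or_io.seek(start)  # restore the caller's cursor position
```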
- classmethod load(file_path_or_io: Union[str, BinaryIO], **kwargs) -> twinify.base.InferenceResult
Loads an inference result from a file.
The file can be specified either as a path or an opened file, i.e., both of the following are possible:
`InferenceResult.load('my_inference_result.out')`
or
```
with open('my_inference_result.out', 'rb') as f:
    InferenceResult.load(f)
```
- Parameters
file_path_or_io – The file path (as a string) or a BinaryIO instance representing the file from which to load.
kwargs – Optional (model specific) arguments for loading.
If a BinaryIO instance is passed, the cursor position is advanced to just past the data representing the inference result.
- Exceptions:
Raises an InvalidFileFormatException if the data in the file is not a valid representation of the inference result type represented by this class.
- Note for subclass implementation:
Subclasses need only implement an override for _load_from_io.
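A corresponding sketch for load, under a hypothetical format (magic header, 8-byte little-endian payload length, JSON payload): the public method opens paths in binary read mode, and the _load_from_io hook reads exactly one record, so a stream's cursor ends up just past it. InvalidFileFormatException is defined locally as a stand-in for the exception named above; ToyLoadableResult and the byte layout are illustrative assumptions, not twinify's format.

```python
import json
import struct
from typing import BinaryIO, Union

MAGIC = b"TOYRES1\n"  # hypothetical header for the toy format


class InvalidFileFormatException(Exception):
    """Local stand-in for the exception named in the documentation."""


class ToyLoadableResult:
    """Hypothetical result type with a load method following the documented contract."""

    def __init__(self, params: dict):
        self.params = params

    @classmethod
    def _load_from_io(cls, f: BinaryIO, **kwargs) -> "ToyLoadableResult":
        if f.read(len(MAGIC)) != MAGIC:
            raise InvalidFileFormatException("missing toy result header")
        length_field = f.read(8)
        if len(length_field) != 8:
            raise InvalidFileFormatException("truncated length field")
        (length,) = struct.unpack("<Q", length_field)
        payload = f.read(length)  # consumes exactly one record
        if len(payload) != length:
            raise InvalidFileFormatException("truncated payload")
        try:
            params = json.loads(payload.decode("utf-8"))
        except ValueError as exc:  # covers JSON and UTF-8 decoding errors
            raise InvalidFileFormatException("payload is not valid JSON") from exc
        return cls(params)

    @classmethod
    def load(cls, file_path_or_io: Union[str, BinaryIO], **kwargs) -> "ToyLoadableResult":
        if isinstance(file_path_or_io, str):
            with open(file_path_or_io, "rb") as f:
                return cls._load_from_io(f, **kwargs)
        return cls._load_from_io(file_path_or_io, **kwargs)
```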
- store(file_path_or_io: Union[str, BinaryIO]) -> None
Writes the inference result to a file.
The file can be specified either as a path or an opened file, i.e., both of the following are possible:
`inference_result.store('my_inference_result.out')`
or
```
with open('my_inference_result.out', 'wb') as f:
    inference_result.store(f)
```
- Parameters
file_path_or_io – The file path (as a string) or a BinaryIO instance representing the file to write data into.
If a BinaryIO instance is passed, the cursor position is advanced to just past the data representing the inference result.
- Note for subclass implementation:
Subclasses only need to implement an override for _store_to_io.
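The store direction can be sketched in the same way, again for a hypothetical byte layout (magic header, 8-byte little-endian length, JSON payload): the public method opens paths with mode 'wb', while the _store_to_io hook writes one record at the current position, so consecutive calls on one stream place records back to back. ToyStorableResult and the layout are assumptions for illustration only.

```python
import json
import struct
from typing import BinaryIO, Union

MAGIC = b"TOYRES1\n"  # hypothetical header for the toy format


class ToyStorableResult:
    """Hypothetical result type with a store method following the documented contract."""

    def __init__(self, params: dict):
        self.params = params

    def _store_to_io(self, f: BinaryIO) -> None:
        # Subclass hook: write one complete record at the current position.
        payload = json.dumps(self.params, sort_keys=True).encode("utf-8")
        f.write(MAGIC)
        f.write(struct.pack("<Q", len(payload)))
        f.write(payload)

    def store(self, file_path_or_io: Union[str, BinaryIO]) -> None:
        if isinstance(file_path_or_io, str):
            with open(file_path_or_io, "wb") as f:
                self._store_to_io(f)
        else:
            # The stream's cursor ends up just past the written record.
            self._store_to_io(file_path_or_io)
```

Keeping the path handling in the public method is what allows subclasses to override only the _from_io and _to_io hooks.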