
(Center of) Data for Public Good Anonkit

CDPG Anonkit is a toolkit for preprocessing, anonymising, and post-processing data. It was originally written as an application to run inside a Trusted Execution Environment (TEE) and was later developed into a Python package so that anyone can use it on any dataset.

pip install cdpg-anonkit --extra-index-url=https://test.pypi.org/simple/

 import cdpg_anonkit
 import pandas as pd

 # Quick example
 from cdpg_anonkit import SanitiseData as sanitisation

 example_data = pd.DataFrame({
      'age': [25, 40, 15, 60, 18, 90, 22, 45, 50, 55],
      'income': [50000, 80000, 65000, 120000, 20000, 90000, 55000, 75000, 85000, 95000],
      'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
      'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix',
               'New York', 'Chicago', 'Los Angeles', 'Dallas', 'Dallas']})

 sanitisation_rules = {
      'age': {'method': 'clip', 'params': {'min_value': 25, 'max_value': 70}},
      'name': {'method': 'hash', 'params': {'salt': 'md5'}},
 }

 sanitised_data = sanitisation.sanitise_data(df=example_data,
                                            columns_to_sanitise=['age', 'name'],
                                            sanitisation_rules=sanitisation_rules)
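
To make the two rules above concrete, the sketch below shows what a 'clip' rule and a salted 'hash' rule could do to a column, using only pandas and the standard library. It is an illustration under stated assumptions, not the package's implementation: the helper names clip_column and hash_column, the choice of SHA-256, and the salt value 'example-salt' are all hypothetical, and cdpg_anonkit may hash or clip differently.

```python
import hashlib

import pandas as pd


def clip_column(series: pd.Series, min_value: float, max_value: float) -> pd.Series:
    """Bound every value to the closed interval [min_value, max_value]."""
    return series.clip(lower=min_value, upper=max_value)


def hash_column(series: pd.Series, salt: str) -> pd.Series:
    """Replace each value with the SHA-256 hex digest of salt + value.

    Hypothetical helper: SHA-256 is an assumption, not necessarily what
    cdpg_anonkit uses for its 'hash' method.
    """
    return series.map(
        lambda value: hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    )


example_data = pd.DataFrame({
    "age": [25, 40, 15, 60, 18, 90],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
})

# Work on a copy so the original frame stays untouched.
sketch = example_data.copy()
sketch["age"] = clip_column(sketch["age"], min_value=25, max_value=70)
sketch["name"] = hash_column(sketch["name"], salt="example-salt")

print(sketch["age"].tolist())  # [25, 40, 25, 60, 25, 70]
```

Ages below 25 are raised to 25 and ages above 70 are lowered to 70, while each name becomes a fixed-length digest that is stable for a given salt. Salted hashing of this kind is pseudonymisation rather than full anonymisation, since anyone who knows the salt can re-derive the digest for a guessed name.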
