Data Anonymization
- Hash Function Requirements
- Data Anonymization requirements
- Limitations
- Canadian data governance frameworks from information providers

Hash Function Requirements

- The hashing function must handle variability in input data, such as common misspellings or variations in name recording, especially relevant in homelessness data collection.
- It should produce unique IDs for each person, avoiding the situation where two different people result in the same computed ID.
- The function must prevent recovery of the original input values, including through statistical attacks, which may require the use of multiple keys.
- The hashed ID should be computed at the source before transmission, meaning organizations cannot simply send data files but need to run a process first.
- Address secure computation methods to ensure that the protection of pseudonymous data is upheld and no direct access to sensitive data is possible during analytical processes.
- Manage the cryptographic keys under both basic and strong pseudonymization, including storage, handling, and potential destruction protocols to ensure irreversibility when required.
- The data sent to DASH should be encrypted and transmitted using a secure protocol.
- Data stored in the database should be encrypted.
- The comparison between two IDs must be efficient enough for practical use.
- The proposed input to the hash function is the concatenation NAME + LAST NAME + Month_Year of Birth.
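
The requirements above can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) over normalized inputs; the source does not name a specific algorithm, so that choice, the function names `normalize` and `linkage_id`, and the hard-coded key are all assumptions made for illustration. Normalization only absorbs differences in case, accents, spacing, and punctuation. It does not correct genuine misspellings, which would need an additional technique such as phonetic encoding.

```python
import hashlib
import hmac
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and drop everything except letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_accents.lower() if c.isalnum())


def linkage_id(first_name: str, last_name: str, birth_month: int,
               birth_year: int, secret_key: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the normalized identifying fields."""
    # The separator keeps ("ann", "alee") distinct from ("anna", "lee").
    message = "|".join([
        normalize(first_name),
        normalize(last_name),
        f"{birth_month:02d}_{birth_year:04d}",
    ])
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real key would come from managed storage.
key = b"example-key-for-illustration-only"
id_a = linkage_id("Zoë", "O'Brien", 3, 1985, key)
id_b = linkage_id("  zoe ", "OBRIEN", 3, 1985, key)

# Comparing two IDs is a fixed-length string comparison.
print(hmac.compare_digest(id_a, id_b))
```

Because the key is required to compute an ID, someone holding only the hashed IDs cannot rebuild them by hashing a list of candidate names, which is the property the key management requirement protects.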

Data Anonymization requirements

- Use one-way hashing, format-preserving encryption, or privacy-preserving record linkage to link cross-source data, with anonymization applied at an intermediate step at the source.
- Data minimization should be observed throughout to collect only relevant data that could be transformed.
- Follow privacy laws such as PIPEDA, PHIPA, HIA, or any other local privacy laws that are applicable.
- Carry out frequent assessments of re-identification risk and adapt methods and policies accordingly.
- Provide a holistic approach to anonymization that informs the statistical estimators of identifiability and data transformations directly applied to data.
- Secure data in transit: protocols such as HTTPS, TLS, or a VPN should be used to preserve the integrity and confidentiality of information during transmission.
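- Use strong encryption standards such as AES to encrypt data stored in a database. Triple DES, although historically common, has been deprecated by NIST, and RSA is typically used to protect keys rather than bulk data.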
- Keep detailed audit trails and carry out continuous monitoring to identify any unauthorized access or discrepancies in data management.
- Ensure stringent access control, implement the principle of least privilege, and establish an adequate incident response for possible data breaches.
- Apply a risk-based anonymization procedure: estimate the identifiability of the data set, then apply the data transformations needed to bring it below the identifiability threshold before it is accessed or published, and only for secondary purposes.
- The data governance framework should establish the ethical use of data, taking into account societal perspectives and data subject rights.
- Follow the applicable laws when transferring data across national or provincial borders.
- Use homomorphic encryption-based secure computation techniques for analyses while maintaining privacy by leaving underlying source data undecrypted.
- Delete fields or convert them into tokens, pseudonyms, or fake data. Unique fields can be eliminated using data transformations based on publicly known traits.
- In the case of cryptographic keys, the storage and usage must comply with basic pseudonymization, while strong pseudonymization requires irreversible destruction of keys in a legal context involving secondary usage.
- Apply k-anonymity to the data so that each individual is indistinguishable from at least k-1 other individuals in the dataset. The value k is the smallest group size in which a data subject's information can be "hidden" to preserve anonymity. A higher k is recommended to ensure robust privacy.
- Follow all ethical and legal standards.
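
The k-anonymity requirement above can be checked mechanically. The following is a minimal sketch, assuming records are plain dictionaries and that the quasi-identifier columns have already been chosen. The function name `k_anonymity`, the field names, and the sample records are hypothetical and not taken from DASH.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, made-up records for illustration.
records = [
    {"age_band": "20-29", "region": "Ottawa", "service": "shelter"},
    {"age_band": "20-29", "region": "Ottawa", "service": "outreach"},
    {"age_band": "30-39", "region": "Ottawa", "service": "shelter"},
    {"age_band": "30-39", "region": "Ottawa", "service": "shelter"},
    {"age_band": "30-39", "region": "Gatineau", "service": "outreach"},
]

k = k_anonymity(records, ["age_band", "region"])
print(k)  # 1: one person is alone in the ("30-39", "Gatineau") group
```

A result of 1 means at least one individual is unique on the chosen columns, so further generalization or suppression would be needed before release.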

Limitations

- Power BI performance could be slow on the encrypted database.
- If data is to be transferred across provincial or national borders, the system must ensure such transfers comply with all applicable laws and maintain the same level of protection as within Canada.
- There is a lack of real-time data and of real-time analysis of that data.
- There is a likelihood of small errors and typos in the collected information, and the proposed hash function should be able to handle this.
- If DASH operates across all provinces, it needs to comply with the regulations of every province and territory listed in this document.
- Right now we have fake data, so it is really difficult to train the model.
- We need to fake cross-table constraints - DB forth, and synthetic data does not make sense.
- People outside of the University of Ottawa do not have access.
- Power BI cannot present a single report in both languages at the same time.
- Information needed by the hash function may be missing, e.g. not everyone can provide their month or year of birth, so the field can be blank.
- The Power BI dashboard does not have the indicators (the targets against which metrics are assessed) for specific stakeholders.
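
One way to handle the missing-input limitation noted above is to detect incomplete records before hashing, so that blank fields do not silently produce colliding IDs. This is a minimal sketch under that assumption. The field names in `REQUIRED_FIELDS` and the function `missing_hash_inputs` are hypothetical, and how flagged records are then treated is a policy decision the source does not specify.

```python
# Hypothetical field names for the hash inputs.
REQUIRED_FIELDS = ("first_name", "last_name", "birth_month", "birth_year")


def missing_hash_inputs(record: dict) -> list:
    """List the required fields that are absent, None, or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


complete = {"first_name": "Ann", "last_name": "Lee", "birth_month": 3, "birth_year": 1985}
partial = {"first_name": "Ann", "last_name": "Lee", "birth_month": "", "birth_year": None}

print(missing_hash_inputs(complete))  # []
print(missing_hash_inputs(partial))   # ['birth_month', 'birth_year']
```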

Canadian data governance frameworks from information providers

- Personal Information Protection and Electronic Documents Act (PIPEDA): the federal privacy law for private-sector organizations. It sets out the ground rules for how businesses must handle personal information in the course of their commercial activities. The Consumer Privacy Protection Act (CPPA) has been proposed as its replacement.
- DND/CAF Data Governance Framework: a structured approach designed to manage and utilize large amounts of data. It focuses on transitioning data access from a "need to know" to a "need to share" basis, emphasizing data integration, orchestration, and interoperability.
- Canadian Cybersecurity Framework: developed by the Canadian Centre for Cyber Security, this framework provides standards and guidelines for organizations to follow in order to protect their systems and data from cyber threats.
- Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS 2): important for researchers handling personal data. It provides guidelines for the ethical conduct of research involving human participants, including data governance.
- CANASA: a national not-for-profit organization dedicated to advancing the security industry.
- CSE: Communications Security Establishment.
- Personal Health Information Protection Act:
  - Ontario: Personal Health Information Protection Act (PHIPA).
  - British Columbia: Personal Information Protection Act (PIPA) for the private sector and the Freedom of Information and Protection of Privacy Act (FIPPA) for the public sector.
  - Manitoba: The Personal Health Information Act (PHIA).
  - Newfoundland and Labrador: The Personal Health Information Act (PHIA).
- Health Information Act:
  - Alberta: Health Information Act (HIA).
  - Saskatchewan: The Health Information Protection Act (HIPA).
  - Prince Edward Island: The Health Information Act (HIA).
  - Northwest Territories: The Health Information Act (HIA).
  - Nunavut: The Health Information Act.
  - New Brunswick: The Personal Health Information Privacy and Access Act (PHIPAA).
  - Yukon: The Health Information Privacy and Management Act (HIPMA).