How Buckhill Uses Its In-House Machine Learning Data Scrubber for Comprehensive Database Anonymization and Compliance



In a landscape of interconnected systems and databases, protecting and anonymizing Personally Identifiable Information (PII) has never been more critical.

At Buckhill, we take a proactive approach to data protection that not only prioritizes privacy but also upholds the integrity of information. We put this approach into practice with our Machine Learning-powered Data Scrubber toolset. This Python 3 project anonymizes PII within the databases we manage, promoting a secure environment for handling sensitive data.

With the Data Scrubber tooling, we can scrub any database at scale, reinforcing our commitment to protecting our clients’ information and to compliance with the EU’s General Data Protection Regulation (GDPR).


Navigating the EU’s GDPR: Pseudonymised data

Pseudonymised data, as defined under EU law, involves processing personal data in a manner that prevents attribution to a specific data subject without the use of additional information. This additional information must be kept separately and be subject to technical and organizational measures ensuring that the personal data cannot be linked to an identified or identifiable natural person.
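One common way to illustrate this definition is keyed hashing: the value is replaced by a pseudonym that cannot be recomputed without a secret key, and that key is the "additional information" to be stored separately. The snippet below is a minimal sketch under that assumption, not a description of how the Data Scrubber itself works; the function name, the sample record, and the choice of HMAC-SHA256 are all illustrative.

```python
import hashlib
import hmac
import secrets


def pseudonymise(value: str, key: bytes) -> str:
    """Return a stable keyed pseudonym for `value` (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# The key plays the role of the "additional information". In practice it would
# live in a separate, access-controlled store, never alongside the data.
secret_key = secrets.token_bytes(32)

# Hypothetical record, used only for illustration.
record = {"email": "jane.doe@example.com", "policy_type": "home"}

# Copy of the record that could be handed to a processor without the key.
shared = {**record, "email": pseudonymise(record["email"], secret_key)}
```

Because the same input and key always yield the same pseudonym, records can still be joined on the pseudonymised field, while anyone without the key cannot recompute the mapping from candidate values.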

Pseudonymised data is still personal data under the GDPR, but pseudonymisation is recognised as a safeguard that reduces risk when data is shared with third-party data processors, provided these entities do not have access to the additional information that could lead to the identification of individuals. Moreover, pseudonymised data may be retained only for as long as it is lawfully necessary.


The Buckhill Strategy

In our data management strategy, we integrate state-of-the-art natural language processing (NLP) tooling, such as spaCy and Hugging Face models, to identify PII accurately and efficiently within our database tables. This is critical because we use Entity-Attribute-Value (EAV) structures, which accommodate the dynamic nature of our customers' and the insurance industry's evolving technical and product landscape but also mean that PII is not confined to a fixed set of columns.

While conventional data scrubbing relies on knowing where the data is located, our scrubber parses EAV field names and evaluates whether the values within those fields might constitute PII.
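To make the idea of judging a field by its name concrete, here is a minimal sketch that splits an EAV attribute name into tokens and flags it when a token suggests PII. It deliberately swaps the NLP models for a plain keyword lookup so that it runs with the standard library alone; the hint list and function names are hypothetical, and a production scrubber would presumably replace `looks_like_pii` with a model-based classifier.

```python
import re
from typing import List

# Hypothetical hint list; a real scrubber would rely on a trained model instead.
PII_HINTS = {
    "name", "email", "phone", "address", "dob",
    "birth", "passport", "iban", "postcode",
}


def tokenize_field_name(field_name: str) -> List[str]:
    """Split snake_case, kebab-case, and camelCase names into lowercase tokens."""
    # Insert a space at lower-to-upper boundaries so camelCase splits cleanly.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", field_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def looks_like_pii(field_name: str) -> bool:
    """Flag a field when any of its name tokens matches a PII hint."""
    return any(token in PII_HINTS for token in tokenize_field_name(field_name))
```

For example, `looks_like_pii("policyHolderEmail")` and `looks_like_pii("date_of_birth")` are both flagged, while `looks_like_pii("vehicle_make")` is not. A keyword lookup like this misses anything outside its list, which is one reason a learned model is attractive when field names are open-ended.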

Subsequently, the scrubber either anonymizes or expunges PII-containing values from the database.
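The anonymize-or-expunge step can be sketched against a toy EAV table in SQLite. Everything here is an assumption made for illustration: the `entity_values` table, its columns, the sample attributes, and the fixed `REDACTED` placeholder. The sketch also takes the lists of flagged attributes as given rather than deriving them from a model.

```python
import sqlite3
from typing import Iterable

PLACEHOLDER = "REDACTED"  # hypothetical replacement value


def scrub_eav(conn: sqlite3.Connection,
              anonymise: Iterable[str],
              delete: Iterable[str]) -> None:
    """Overwrite values of `anonymise` attributes; drop rows of `delete` attributes."""
    for attribute in delete:
        conn.execute("DELETE FROM entity_values WHERE attribute = ?", (attribute,))
    for attribute in anonymise:
        conn.execute(
            "UPDATE entity_values SET value = ? WHERE attribute = ?",
            (PLACEHOLDER, attribute),
        )
    conn.commit()


# Toy EAV table with one entity and three attributes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_values (entity_id INTEGER, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO entity_values VALUES (?, ?, ?)",
    [
        (1, "holder_email", "jane.doe@example.com"),
        (1, "free_text_notes", "Call Jane on 555-0100"),
        (1, "vehicle_make", "Volvo"),
    ],
)

scrub_eav(conn, anonymise=["holder_email"], delete=["free_text_notes"])
rows = conn.execute(
    "SELECT attribute, value FROM entity_values ORDER BY attribute"
).fetchall()
```

After the call, the email value is replaced by the placeholder, the free-text row is gone, and the non-PII attribute is untouched. Overwriting suits fields whose presence still matters downstream, while deletion suits free text where PII cannot be isolated reliably.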

Using NLP models to pseudonymise data reinforces our commitment to robust and secure data management practices in an ever-evolving regulatory landscape, while supporting alignment with the EU GDPR.


Our Commitment

In an era where data privacy is paramount, Buckhill’s Data Scrubber and related tools act as a reliable guardian of sensitive information.

By combining advanced scrubbing techniques with a user-friendly interface, we deliver an efficient way to cleanse databases of PII, contributing to a secure and compliant data ecosystem that helps our clients safeguard their digital footprints.

