Efficient protocol for securing a user’s private information when algorithms use it to recommend content

Researchers are designing an efficient protocol to secure a user’s private information when algorithms use it to recommend products, songs, or shows. Credit: Christine Daniloff, MIT

Algorithms recommend products when we shop online or suggest songs we might like when we listen to music on streaming apps.

These algorithms work by using personal information such as our past purchases and browsing history to generate personalized recommendations. The sensitive nature of this data makes preserving privacy extremely important, but existing methods to solve this problem rely on heavy cryptographic tools requiring huge amounts of computation and bandwidth.

MIT researchers may have a better solution. They have developed a privacy-preserving protocol so efficient that it can run on a smartphone over a very slow network. Their technique protects personal data while still ensuring that recommendation results are accurate.

In addition to protecting user privacy, their protocol minimizes the unauthorized transfer of information from the database, known as leakage, even if a malicious agent attempts to trick the database into revealing secret information.

The new protocol could be particularly useful in situations where data leaks could violate user privacy laws, such as when a health care provider uses a patient’s medical history to search a database of medical records for other patients with similar symptoms, or when a company serves targeted advertisements to users under EU privacy regulations.

“It’s a really difficult problem. We relied on a whole series of cryptographic and algorithmic tricks to arrive at our protocol,” says Sacha Servan-Schreiber, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper presenting this new protocol.

Servan-Schreiber authored the paper with fellow CSAIL graduate student Simon Langowski and their advisor and senior author Srinivas Devadas, Edwin Sibley Webster Professor of Electrical Engineering. The research will be presented at the IEEE Symposium on Security and Privacy.

The data next door

The technique at the heart of algorithmic recommendation engines is known as nearest neighbor search, which involves finding the data point in a database closest to a query point. Data points mapped nearby share similar attributes and are called neighbors.
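
For readers unfamiliar with the idea, a brute-force nearest neighbor search is simply a distance comparison over the database. The short Python sketch below, with made-up vectors, illustrates the concept (it is a generic example, not the researchers’ code).

```python
# A generic, non-private illustration of nearest neighbor search (not the
# researchers' code): find the database vector at minimum distance from a query.
import numpy as np

database = np.array([
    [0.90, 0.10, 0.30],  # feature vector for item 0
    [0.20, 0.80, 0.50],  # feature vector for item 1
    [0.85, 0.15, 0.40],  # feature vector for item 2
])

query = np.array([0.86, 0.14, 0.38])

distances = np.linalg.norm(database - query, axis=1)  # Euclidean distance to each item
nearest_id = int(np.argmin(distances))
print(nearest_id)  # -> 2, the "neighbor" whose attributes most resemble the query
```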

These searches involve a server that is linked to an online database that contains concise representations of data point attributes. In the case of a music streaming service, these attributes, known as feature vectors, could be the genre or popularity of different songs.

To find a song recommendation, the client (user) sends a query to the server that contains a certain feature vector, such as a genre of music the user likes or a compressed history of their listening habits. The server then provides the ID of a feature vector in the database that is closest to the client’s query, without revealing the actual vector. In the case of music streaming, this identifier would likely be a song title. The client learns the title of the recommended song without learning the feature vector associated with it.
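
A minimal, non-private sketch of that flow might look like the following; the song titles, vectors, and the `recommend` function are invented for illustration.

```python
# A minimal, non-private sketch of the flow described above: the client sends a
# feature vector, and the server replies with only the title of the closest song,
# never the song's feature vector. Titles and vectors are made up for illustration.
import numpy as np

SONGS = {
    "Song A": np.array([0.9, 0.1]),  # e.g., heavily genre 1
    "Song B": np.array([0.2, 0.8]),  # e.g., heavily genre 2
    "Song C": np.array([0.6, 0.5]),
}

def recommend(client_query: np.ndarray) -> str:
    """Return the title of the nearest song; its feature vector stays on the server."""
    return min(SONGS, key=lambda title: np.linalg.norm(SONGS[title] - client_query))

print(recommend(np.array([0.85, 0.2])))  # -> "Song A"
```

Note that in this naive version the server still sees the client’s query in the clear, which is precisely the leak the new protocol is designed to prevent.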

“The server needs to be able to do this calculation without seeing the numbers it’s doing the calculation on. It can’t actually see the features, but it still needs to give you the closest thing in the database,” says Langowski.

To achieve this, the researchers created a protocol that relies on two separate servers that access the same database. Using two servers makes the process more efficient and enables a cryptographic technique called private information retrieval. This technique allows a client to query a database without revealing what it is looking for, says Servan-Schreiber.
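
For intuition, the sketch below shows the classic two-server, XOR-based flavor of private information retrieval: the client sends each server a random-looking selection vector, and combining the two answers reveals only the wanted record. This is a textbook illustration of the primitive, not the specific construction used in the paper.

```python
# For intuition only: the classic two-server, XOR-based form of private
# information retrieval. Each server sees a random-looking selection vector, so
# neither learns which record the client wants; combining the two answers
# reveals exactly one record. (The paper uses a more sophisticated scheme.)
import secrets
from functools import reduce

DB = [b"id-00", b"id-01", b"id-02", b"id-03"]  # both servers hold identical copies

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, selection):
    """XOR together every record whose position is marked in the selection vector."""
    picked = [rec for idx, rec in enumerate(db) if selection[idx]]
    return reduce(xor_bytes, picked, bytes(len(db[0])))

def private_lookup(index: int) -> bytes:
    sel1 = [secrets.randbits(1) for _ in DB]  # uniformly random vector for server 1
    sel2 = list(sel1)
    sel2[index] ^= 1                          # same vector with the wanted index flipped, for server 2
    ans1 = server_answer(DB, sel1)
    ans2 = server_answer(DB, sel2)
    return xor_bytes(ans1, ans2)              # everything cancels except record `index`

print(private_lookup(2))  # -> b"id-02", yet neither server alone learns the index
```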

Overcoming security challenges

But while private information retrieval keeps the client’s query secure, it alone does not provide database privacy. The protocol produces a set of candidate vectors – the possible nearest neighbors – for the client, which the client then winnows down by brute force. However, those candidates can reveal a lot about the database to the client. The additional privacy challenge is to prevent the client from learning these extra vectors.

The researchers used a tuning technique that eliminates most of the extra vectors in the first place, then used a different trick, which they call oblivious masking, to hide all of the remaining data points except the actual nearest neighbor. This effectively preserves the privacy of the database, so the client won’t learn anything about the feature vectors in the database.
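
The snippet below illustrates only the effect described here: among a short list of candidates, the client can recover nothing but the true nearest neighbor’s ID, while every other slot looks random. The paper’s actual oblivious-masking construction achieves this cryptographically, without any party seeing both the query and the data in the clear; this is purely a conceptual sketch with invented names.

```python
# Conceptual sketch of the *effect* of oblivious masking described above: of all
# the candidate answers, only the true nearest neighbor's ID reaches the client
# in usable form; every other slot looks like random noise. The real construction
# achieves this cryptographically; this toy version is only for intuition.
import secrets
import numpy as np

def masked_candidates(candidates, query):
    """candidates: list of (item_id, feature_vector) pairs short-listed for the query."""
    dists = [np.linalg.norm(vec - query) for _, vec in candidates]
    nearest = int(np.argmin(dists))
    return [
        item_id if i == nearest else secrets.token_hex(4)  # mask all but the winner
        for i, (item_id, _) in enumerate(candidates)
    ]

cands = [("song-17", np.array([0.9, 0.1])), ("song-42", np.array([0.2, 0.8]))]
print(masked_candidates(cands, np.array([0.85, 0.2])))  # -> ['song-17', '<random hex>']
```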

Once they designed this protocol, they tested it with a non-private implementation on four real-world data sets to determine how to tune the algorithm to maximize accuracy. Then they used their protocol to perform private nearest neighbor search queries on those datasets.

Their technique requires only seconds of server processing time per query and less than 10 megabytes of communication between the client and servers, even with databases containing more than 10 million items. In contrast, other secure methods can require gigabytes of communication or hours of computation time. With each query, their method achieved greater than 95 percent accuracy (meaning that almost every time it found the true approximate nearest neighbor to the query point).

The techniques they used to enable database privacy will thwart a malicious client even if it sends bogus requests to try to trick the server into divulging information.

“A malicious client won’t learn much more information than an honest client following the protocol. And it also protects against malicious servers. If one deviates from the protocol, you might not get the right result, but they’ll never learn what the client’s request was,” says Langowski.

In the future, the researchers plan to adapt the protocol so that it can preserve privacy using only a single server. This could allow it to be applied in more real-world situations, since it would not require the use of two non-colluding entities (which do not share information with each other) to manage the database.

“Nearest neighbor search underpins many critical machine learning-based applications, whether providing users with content recommendations or ranking medical conditions. However, this usually requires sharing a lot of data with a central system to aggregate and enable search,” says Bayan Bruss, head of applied machine learning research at Capital One, who was not involved in this work. “This research is a key step toward ensuring that the user receives the benefits of nearest neighbor search while being confident that the central system will not use their data for other purposes.”




More information:
Private Approximate Nearest Neighbor Search with Sublinear Communication. eprint.iacr.org/2021/1157.pdf

Provided by Massachusetts Institute of Technology


This story is republished with the kind permission of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about research, innovation, and education at MIT.

Citation: Efficient protocol for securing a user’s private information when algorithms use it to recommend content (May 13, 2022) Retrieved May 15, 2022 from https://techxplore.com/news/2022-05-efficient-protocol-user-private-algorithms.html

This document is subject to copyright. Except for fair use for purposes of private study or research, no part may be reproduced without written permission. The content is provided for information only.
