Data Anonymization in Virtual Reality

Translation: Nils Klowait

Before Nils and I could set foot in the VR research space, we faced the problem of data anonymization.


Sociologists who work with video data have developed many techniques for this: blurring or pixelating faces, modifying voices, and transforming video into graphics. However, in virtual reality, the issue of data anonymization starts to get weird. A fair number of VR applications already anonymize users to a certain extent: firstly, users do not have faces or bodies – they are replaced by virtual avatars; secondly, the person’s real name is typically not displayed – it either remains unknown or is replaced by a virtual alias. Finally, the user may not even have a voice (as was described in Nils’ previous post). The existing identifying features in virtual reality are the voice (if present) and user actions.

Figure 1. Using graphics to present video data. Source: (Goodwin, 2017, p.229).

This basic level of anonymization in VR raises the question of whether further anonymization and informed consent are needed for the “videotaping” of the interaction (i.e. the recording of the researcher’s screen, typically from a first-person perspective). In video analysis, when anonymization is impossible, the problem is solved by obtaining consent from the participants. For example, when Christian Heath researched auction-house interactions (Heath, 2012), he was not in a position to anonymize the data: paintings worth more than two million dollars can only be sold by a very small number of companies, so the setting remains identifiable.
Anonymization can also be impossible for purely methodological reasons: if you study the direction of gaze, blurring faces undermines transcription and analysis. Here, too, the most straightforward solution is to obtain consent from the participants. Where consent is not given or cannot be obtained, technical workarounds may be used; for example, the interaction may be sketched by an artist.

In VR, obtaining consent is more problematic. Take cooperative games, in which thirty players can participate at once: would it be necessary to obtain informed consent from everyone involved? This is very difficult for both technical and methodological reasons. Firstly, such games do not necessarily feature a list of players (and rarely feature contact details beyond a nickname) that could be used to obtain per-player consent. Secondly, the greater the number of players, the more likely it is that at least one participant will refuse to let their data be used. If one person in thirty disagrees, how is the issue to be resolved? By a simple majority rule? Thirdly, since the VR segment is rather small, players will likely remember the researcher’s virtual avatar, which may affect the course of the game during the next round of data collection.

Figure 2. RecRoom users.

Similar problems exist in VR chat rooms, where they are further exacerbated by the fact that users constantly drop in and out of the space without prior notice (this issue also exists in games, but is normatively sanctioned in most cases). Even in the simpler case of dyadic interaction, as in Nils’ example of an Arizona Sunshine interaction, the problems do not disappear. Asking for written permission before the game starts is inadvisable, as it may influence the course of the game. Obtaining permission after the game is equally difficult, as the player may leave abruptly at any time – without leaving behind contact details.

In other words, obtaining consent for the use of data from virtual reality raises a huge number of questions. To simplify your life, you can anonymize the data to such an extent that the problem of consent arguably does not arise. However, this is not so simple either. For example, is it necessary to modify the user’s voice? If we answer yes, then this must be done in all cases. After all, this amounts to the statement that voice = identity. If we think beyond the (already complex) case of voice identity, we may ask ourselves: are actions, too, an identifying attribute? Is a particular set of actions (such as the solution to an in-game puzzle) part of what makes a user recognizable? If yes, then data from virtual reality cannot be used in principle. We return to the question I raised at the beginning of this post: where are the limits of data anonymization in an anonymous, contingent space?

Figure 3. An Arizona Sunshine player takes off their hat.

I think research of computer games may be instructive in addressing these issues, as they share a basic level of anonymization with VR.

Sources:
Heath, C. (2012). The Dynamics of Auction: Learning in Doing: Social, Cognitive and Computational Perspectives. Cambridge: Cambridge University Press. doi: 10.1017/CBO9781139024020
