Over the course of the project, a number of dedicated topics worthy of separate research have emerged. Given enough data analysis, literature review and, most importantly, conceptual work, each of these topics would be a candidate for publication. I would like to use this space to sketch some of these ideas in outline, as they are developing, and to add relevant literature for each topic. Due to the relative length of the first topic, I have decided to split these reports into separate posts.

This is the oldest and arguably most developed line of inquiry (for the introductory blog post about it, click here). Virtual reality allows many different constellations of differently-abled interactants to meet in the same space. In Arizona Sunshine, you often meet ‘mute’ players without microphones, whose only communicative tools are gestural input and body tracking.

In the original blog post, I talked about how difficult it was to communicate, without any words, “Hey, you are wearing a mask. In order to put on a new mask, you need to remove the mask you are already wearing.” There seemed to be limitations peculiar to gestural modalities, since the sequence ‘put on mask’ (grab mask, hand towards face, grab button release, hand away from face) was difficult to disambiguate from ‘remove mask’ (hand towards face, grab button press, hand away from face, grab button release):

  1. ‘grab button press’ and ‘grab button release’ differ only in their consequences; they do not register as visible events for other players.
  2. The initiation of ‘remove mask’ from the ‘home position’ of the player’s hand adds further ambiguity, as there is no apparent method of separating ‘sequence preparation’ from ‘sequence start’. Thus, the hand moving towards the face may either be interpreted as the first step of ‘how to put on a mask’, or the preliminary step towards the initiation of ‘how to remove a mask’.
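The two sources of ambiguity above can be made concrete with a small sketch. The event labels here are my own hypothetical shorthand, not anything from Arizona Sunshine itself; the point is simply that once the invisible button events are filtered out, the observable remainder of the two sequences coincides:

```python
# Hypothetical event labels for the two mask-related gesture sequences.
PUT_ON = ["grab_mask", "hand_to_face", "grab_release", "hand_from_face"]
REMOVE = ["hand_to_face", "grab_press", "hand_from_face", "grab_release"]

# Button presses/releases differ only in their consequences and are not
# rendered for other players (ambiguity 1).
INVISIBLE = {"grab_press", "grab_release"}

def observable(seq):
    """Project a gesture sequence onto what co-present players can perceive."""
    return [event for event in seq if event not in INVISIBLE]

# Once the hand starts moving towards the face, an observer cannot tell
# preparation for 'put on' apart from initiation of 'remove' (ambiguity 2):
assert observable(PUT_ON)[1:] == observable(REMOVE)  # both: hand_to_face, hand_from_face
```

This is only a toy model, of course; it leaves out timing, hand shape and everything else a real observer could draw on, but it isolates why the demonstrating player cannot rely on the button events to carry the distinction.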

More fundamentally, however, there seems to be a difficulty in using gesture sequences to point at ongoing gesture sequences. That is, it seems nontrivial to do ‘the thing you are doing right now is incorrect, do this instead’ with gestures only. This property originally led me to explore the specific indexical qualities of speech as opposed to non-speech interaction. With words, it is relatively easy to point at ongoing action. Gestural sequences, on the other hand, seem to be employed in a sequential – rather than simultaneous and mutually-pointing – manner, which has a notable transformative effect on how interaction unfolds.

Over time, however, I have come to reconsider this issue in more general terms. The problem of ‘pointing at ongoing action’ (“stop doing this, do this instead”) is not a peculiar characteristic of speech, but of the specific multimodal ecology of an encounter. Goodwin’s analysis of Chil’s rich interactional toolset (in the near-total absence of complex self-produced language) illustrates non-verbal indication: we can use prosody, gesture, gaze, body position, etc. to actively disengage from, engage with, or point at ongoing sequences of action. Chil could, by literally pointing at speech produced by other people, say ‘this’ and ‘not this’. Using rich prosody, he could take an active part in the co-operative production of ‘I don’t want toast, but I want something similar to it’.

In other words, the capacity to point at ongoing sequences of action, and the complex mutual accomplishment of action it enables, is not peculiar to spoken human language. Put differently, the difficulty of pointing-at-sequentially-unfolding-action seems to be rooted in something not immediately tied to the specific modality. This brings me to my current preliminary hypothesis: the issue is not so much the restriction imposed by a specific semiotic register, but rather the presence of multiple registers, or the absence thereof. In a world of only sequential gestures, it becomes much more difficult to say ‘stop doing that’, as the doing-‘stop-doing-that’-ness has to come after the completion of the other interactant’s sequence, which makes it harder to treat the ongoing sequence as a public substrate available for modification and reuse.

Even more generally, this leads to the question of monomodal/bimodal types of interaction, i.e. spaces where our pointings – and thereby the capacity for substration – are limited: are the definitions of modality, semiotic register, public substrate, interactional ecology, etc. rigid enough to afford an analysis of spaces with reduced modalities? More than that: is it even possible to convincingly postulate that monomodal spaces exist, for all practical purposes? After all, even in the absence of voice input, Arizona Sunshine doesn’t force users into a single semiotic register: bodies can move, we can point at objects in the world, we can develop complex ad-hoc symbolic conventions, and we can probably do a range of unexpected things. Similarly, if the issue is ‘sparse modalities limiting the synchronous manipulation of the public substrate’, doesn’t this mean that telephone conversations face similar issues? Would this consideration not invite a more granular approach that analyzes things like telephone-conversation overlap as a case of limited-modality synchronicity?

These are the questions I am currently considering regarding this topic. It will take a serious review of the existing literature, a detailed analysis of our data and more collective discussion to arrive at a semi-convincing conclusion. In the meantime, it seems prudent to read more contemporary literature on semiotics, particularly concerning indexicality, and to consult classical conversation-analytic studies of spaces with seemingly reduced modal repertoires. Goodwin’s analysis of Peircean semiosis, along with modern semiotic approaches (including Goodwin’s intertwined semiosis), is instructive.


Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal of Pragmatics, 32(10), 1489–1522. doi:10.1016/s0378-2166(99)00096-x

Mondada, L. (2016). Challenges of multimodality: Language and the body in social interaction. Journal of Sociolinguistics, 20(3), 336–366. doi:10.1111/josl.1_12177

Mondada, L. (2018). Multiple Temporalities of Language and Body in Interaction: Challenges for Transcribing Multimodality. Research on Language and Social Interaction, 51(1), 85–106. doi:10.1080/08351813.2018.1413878