Journal of Illustration Studies

December 2013 || Reports

Debate: Intersemiotic Complementarity
Can modern multimodality theories help us understand the semantic relationship between word and image in texts from the seventeenth century?

by Jahn Holljen Thon

In New Directions in the Analysis of Multimodal Discourse (2007), essays by Christian M. I. M. Matthiessen and Terry D. Royce attempt to create models for analysing multimodal texts where the printed page is the primary focus.1  Matthiessen’s ‘systemic functional exploration’ corresponds well with my own research in acknowledging that the written text and image have been closely intertwined since the early stages of writing. Letters can be made into words, and words can become a long text; in the same way, you can draw lines and connect them up to create a drawing or a picture, and this picture can stand alone or in relation to others. This was possible even in the first handwritten book, before Gutenberg. However, the pairing of letters and lines entered a completely new, and strict, regime after the invention of the printing press.

Although the boundaries between different modalities can be flexible, the semiotic definition is clear: writing can be read aloud, drawing must be ‘translated’ into language. In order to discuss the relationship between written text and image, Royce coins the term ‘intersemiotic complementarity’, which regulates the bond between the visual and verbal modes. Here I shall consider whether these approaches can help me to understand a rebus book from the 1660s, in which words and images are equal partners in the production of meaning. Do the theories of Matthiessen and Royce depend on one mode being dominant? How are we to explain the interaction between word and image in cases where it is impossible to determine which is the primary mode?

In books from the seventeenth century images can have at least four main purposes: mimetic (illustrations tentatively copy and mimic a culturally experienced reality), mathematical (illustrations approach what would otherwise have been presented by numbers and tables), metaphorical (images are made up of symbols and metaphors with a multitude of meanings), or mythological (images refer to a comprehensive mythological universe). Furthermore, many emblematic and descriptive (i.e. illustrative) images had a certain master-text as a prerequisite, predominantly the Bible.

It seems to me that, regardless of how much Matthiessen and Royce emphasize the fluidity of the boundaries between word and image, they both require that textual and visual elements within a medium (the book, for example) represent two sets of separate semiotic resources. Opposing this view is W. J. T. Mitchell, who argues for the relation between text and image, noting that there are examples where images can be read verbally, such as pictograms and emblems. Does this mean that multimodal discourse analysis underestimates the historical aspect? As Mitchell observes, ‘Any interesting theoretical reflection on visual culture will have to work out an account of its historicality, and that will necessarily involve some form of abstraction and generality about spectators and visual regimes’.2

I, personally, go further. ‘The possible third’ – something that is neither verbal nor visual – is an important phenomenon which I feel is missing from the theories of both Matthiessen and Royce; nor do I find it expressed in Mitchell’s work, although it is a reasonable extension of his research. In other words, I am attempting to go a step beyond Mitchell, for he and other visual theorists have very little to say about what occurs when writing and the visual are so intertwined as to be inseparable.

A possible option for looking at this theme more closely is to study pages from books in which word and image merge into one unit, where the idea of ‘illustration’ is reciprocal or revoked. We find such creations in contexts both historical and contemporary, but my example here is Nils Thomassøn’s rebus book Cestus sapphicus (an advice poem on marriage and household management). It was published in 1661 and is most probably the first illustrated book to be printed in Norway (Figure 1). The text in Figure 1 reads:

C[olla], ceu, Ca[ligula] subditorum,
Semper as, d[ilex]it: Alastor ater
Apta tædæ tempora & ut b[aratrum]
flammi [coma]ntis.

[Enemies of the marriage: ‘Oh, Caligula always loved the necks of his subjects | as the black devil too | loves hell and marriage/wedding suitable for | flaming torches.’]3

Thomassøn has created pages of books and artistic expressions which make it impossible to decide which is more important, the text or the image, even if the majority would probably say that the words are slightly more prominent. However, this is modified by the fact that the letters of the baroque artist also are created artistically, in the way they wind themselves around visual elements like illustrated acanthus leaves. The result is that text and visual elements unite into a collected type of graphical expression, where all parts create meaning in a sort of dual sequence (sequence of double meaning): ‘the possible third’ is a tangible reality.

The relationship between the different semiotic systems can obviously be either harmonious or full of contradictions and tensions. There are many ways to systematize this relationship. Terry D. Royce’s expression ‘intersemiotic complementarity’ suggests that the text and the image can relate to each other in six ways: through repetition (identical meaning), synonymy (the same or similar meaning), antonymy (the opposite meaning), metonymy (the relationship between a part and the whole), hyponymy (the relationship between the general and a single specimen), and collocation (that which is expected). In principle, this list of connections can be continued infinitely.

To a great extent, Royce and Matthiessen’s thought dates back to Roland Barthes’s classic text from 1964, ‘Rhetoric of the Image’ (‘Rhétorique de l’image’).4 Barthes concentrates on the image more than the text, and analyses the polysemic aspect of all images. For him, the linguistic message has two main functions regarding the iconic message. The first he calls ‘anchoring’ (l’ancrage), and the second ‘exchange’ (relais). ‘Anchoring’ signifies when the linguistic message constitutes a mechanism which counters the fluidity of the visual significant, to prevent dissolution. In contrast, the term ‘exchange’ relates to that which supports or strengthens a reading (or interpretation), where image and text complement one another and alternate in bearing the message.

Matthiessen furnishes us with introductory tools. It does make sense to distinguish between the written text, the graphical expression, and the visual system, and his thinking may help in analysing the difference between pictures used in emblems and those used as pictura (as in a painting, or in the art of painting). However, Matthiessen draws this distinction too firmly.

My interpretation presents three possible, entirely different, readings of Thomassøn’s rebus book. The first is based on logos, where I read the definitive text as a purely linguistic poem, while the second focuses on the interpretation of the rebus, where the rebuses have to be solved. Finally, the third reading is multimodal, iconologically unitary and unified; here I view the page of a book as a single image. In this last means of interpretation, I assume that all figures are the bearers of symbolic and cultural meaning: a monkey, for example, is not simply an animal with a neutral significance, it represents fertility, greed, vanity, etc.

Is there room for such lines of reasoning in Royce? In Matthiessen? This would entail the ‘logos image’ creating associations and evoking cultural meanings which can suggest life experiences, emotions and fragments of a story. Furthermore, each image must be seen in context with the whole book, including the outline form, explanations and comments, of which there are many in Thomassøn’s text.

Certainly, in considering the rebus book I have made much use of Royce’s six forms of relationship between word and image. To give a small indication of my interpretation, I shall provide an example.

Nowhere in the Cestus sapphicus is the contrast between word and image greater than in the eighth stanza (Figure 1, above). The objects are taken from the kitchen and the field (clay pot, spoon, plough); the oak tree is reaching for eternity and immortality; the plough is reaching down, opening the earth, making it fertile. It seems as if the picture is consciously constructed as a wholesome and holistic unit, with a clear correspondence between the making of food, the plough, and vegetation. The emblematic aspect of these signs is evident here only because it accentuates the already obvious connotations of nutrition and plants. Of course, the oak tree is what gives the iconotext this component of meaning. In Henkel and Schöne, the most well-known emblem lexicon, the oak is represented by numerous examples.5 The mature tree represents strength and eternity, while the acorn, because of its ability to grow, is important as a symbol of life’s eternal regenerative ability. Emblematically, the unique meaning of crops, of joy of life, and of creation is unambiguous. Meanwhile, the text describes a story of murder, the devil, hell and flaming torches – destruction and annihilation. With all its brutality, it is hard to interpret this as anything other than an ironic contrast with the ideal of marital fertility conveyed by the images. Regardless of whether the author insists on the neutrality of the figures, it must provoke and surprise the reader, in the seventeenth century as much as today, that such an emphatic depiction of forces bent on destroying the institution of marriage is conveyed by signs that so clearly signify eternity, fertility, and growth. Although there can be small differences in interpretation, it is evident to the reader that the gap between image and text could scarcely be any greater. Thus, Royce’s analysis functions well on a seventeenth-century text.

Nevertheless, I find that there are some elements that this model of analysis is unable to capture. Here I shall outline five of them. First, I think Royce focuses too much on ‘coherence’: he perceives the relation between text and image as a question of harmony. Secondly, in my opinion there ought to be more room in the analysis for an understanding of intervisuality, such as the emblematic tradition with its cultural reservoir of meanings and perceptions, which some of my colleagues call ‘visual intertextuality’.

A third objection is that the cultural context (the Netherlands in the 1620s) is too narrowly defined when conceived only as a ‘set of semiotic systems’ or ‘a system of meaning which is related’. Much of the discussion about the significance of context in the history of art and culture revolves around concepts that are not necessarily related, such as the art of painting, or the emblematic. One must also have some knowledge about the change of the medium, and the historical interaction between speech, handwriting, and the printing press; handwritten texts excluded many receptions, while printed texts included them.

Fourth is the issue of textual hierarchies: the boundary between side texts and main texts is a problem within literary science to which social semiotics offers few answers. And finally, there is a linguistic problem. Royce specifies that a picture in itself can have a number of functions, while Matthiessen talks about different visual systems. However, the scholar must understand that the visual concept/term ‘image’ was very complex in the seventeenth century, as a few key words will illustrate: there was then a big difference between emblem (plural emblemata, often used about mosaics), icon (from the Greek eikon, meaning image or figure), imago (a depiction/likeness), picture (a painting, picta = a portrait), and simulacrum (that which resembles something, an image, an imitation, a fantasy image, a shadow image).


Matthiessen and Royce provide some useful tools, but these are inadequate when it comes to some historical texts. I can conclude that in the basic ‘rebus emblematicus’ of Thomassøn’s Cestus sapphicus, word and image become representations of each other. In this book the most distinctive feature of the image is the word (it is a depiction as part of a rebus), while the word’s innermost characteristic is the image, or figure (the outer world’s imprinting, such as the way marriage is portrayed). Seen in this way, there is no fundamental boundary between image and word in Cestus sapphicus, where illustration becomes a phenomenon in which two graphical signs of different modes illuminate each other.


1 Christian M. I. M. Matthiessen, ‘The Multimodal Page: A Systemic Functional Exploration’, and Terry D. Royce, ‘Intersemiotic Complementarity: A Framework for Multimodal Discourse Analysis’, both in New Directions in the Analysis of Multimodal Discourse, ed. by Terry D. Royce and Wendy L. Bowcher (Mahwah, NJ: Lawrence Erlbaum, 2007), pp. 1–62, and 63–111, respectively.

2 W. J. T. Mitchell, Iconology: Image, Text, Ideology (Chicago: University of Chicago Press, 1986).

3 Note: olla = pot; ligula = spoon. The translation is from Vibeke Roggen, Intellectual Play – Word and Picture: A Study of Nils Thomassøn’s Latin Rebus Book ‘Cestus sapphicus’, with Edition, Translation and Corpus of Sources, 2 vols (Oslo: University of Oslo Press, 2002), II.

4 Roland Barthes, ‘Rhétorique de l’image’, Communications, 4 (1964), 40–51. For an English translation, see Roland Barthes, ‘Rhetoric of the Image’, in Image, Music, Text, trans. by Stephen Heath (London: Fontana Press, 1977), pp. 32–51.

5 Arthur Henkel and Albrecht Schöne, Emblemata: Handbuch zur Sinnbildkunst des XVI. und XVII. Jahrhunderts (Stuttgart: Verlag J. B. Metzler, 1967; repr. 1996).



Citing this article:
Thon, Jahn Holljen. “Debate: Intersemiotic Complementarity: Can modern multimodality theories help us understand the semantic relationship between word and image in texts from the seventeenth century?” Journal of Illustration Studies (December 2013). 24 Apr 2017.