Musical Gestures

Music is made with our hands. Music is made with our feet. Music is made with our bodies, our voices, our rhythms—our movements make music. We see this in the way we make instruments, the way we play them, and the way we move to the sounds they make.

How can we take such an inherently physical art and create digital tools and artifacts for it? How can we understand the designs of our software as being participant in mind-body-artifact relationships? How can we design software tools that amplify our natural human rhythms?

The purpose of this piece is to consider the role that gestures play in the music-making process, to examine the ways that our tools interact with gestures, and to explore how different digital devices provide sensing abilities to capture and translate physical gestures to digital data.

Defining Gesture

Gesture is the product of motion, a “configuration of curves in space and time,” per Wikipedia. In his introduction to Interpreting Musical Gestures, Topics, and Tropes, Robert Hatten describes how “musical gestures are often made distinctive through specific articulations, dynamics, and pacing or timing—and given unique shape by the systematic potential of rhythm and meter, texture, and timbre.”

Gesture is the lens through which we view the shapes and interactions in movements, and how we understand the dynamics in motion. Per Hatten, “human gesture may be understood as a fundamental and inescapable mode of understanding that links us directly to music’s potential expressive meaning”.

Per Tversky/Jamalian, “gestures transform actions on perceptible objects to actions on imagined thoughts, carrying meaning with them rapidly, precisely, and directly”. Tversky/Jamalian focus on how gesture, as well as language and graphics, makes connections between physical tools, like our bodies and created artifacts, and cognitive tools like thoughts.

Artifacts like hammers and chalk allow us to make use of our minds and bodies in ways that augment and extend our ability to act on the world, and they have a tangible effect on the outcomes we can achieve in it. Using these artifacts allows us to flow thoughts through our bodies and into the world in amplified and extended ways, which is what makes them useful and meaningful.

Gestures in the Age of Computers

In designing and creating musical tools, it is paramount that we preserve the ability to make music with our bodily movements and pay attention to the ways we are extending or constraining the set of possible expressible gestures. Musical tools and artifacts—means of amplifying our minds and bodies—are not neutral. They have attitudes and natures that promote certain outcomes.

If we consider the computer, the instrument du jour for making modern music, our means of interaction is limited to our eyes and hands. While we can still play instruments, record sounds, record automations into our computers—each having the potential to engage different arrangements of our bodies—when we execute actions within the computer, our hands must be close together and work along fixed planes to press buttons or move a mouse/trackpad.

Bret Victor levels a critique at this default posture toward our modern technology (paper, books, computer) in The Humane Representation of Thought, saying that we’ve invented a style of knowledge work that involves “sitting at a desk, staring at a little tiny rectangle and making little motions with our hands.” His critique extends back several hundred years, to the time before computers, but it has become especially relevant to music work in recent decades, as we have moved our work into smaller rectangles and carried it out through smaller motions of our hands.

While theorists like Hatten are excited by the “synthetic and emergent aspects” of gesture leading to potential for human expression, Victor cautions that in designing software artifacts, we are directly involved with an environment that limits human gesture, to an extent he views as “inhumane.” Victor’s way out is to think outside the box, to view the computer as a set of tangible, interactable objects within a physical space. This allows a broader set of gestures to be executed within a computing environment. So what is this computing environment, in the musical sense?

Musical Gestures

When we consider musical devices like the flute, piano, and harp, we find that the way we arrange our body to facilitate gestures is not so different from digital musical devices like computers and phones. In the upright, seated positions, we find similarities to how we sit when using a computer. In the ways we arrange and move our arms, we find similarities to how we use a mouse, keyboard and touchscreen.

However, though our bodies can occupy similar positions, we have fewer degrees of freedom to express gestures with our hands on computers and phones than on musical instruments.

When we play the piano, we transmit some piece of intentional data through a gesture of the hands onto the keys. The keys we strike, the pressure with which we strike them, and the length of time we hold them down communicate our intention for what sounds we wish the instrument to make and what properties they should have. Our gestures model this sound, and the piano reacts mechanically to produce it.

The keys on a computer keyboard are not so sensitive. While different gestures on a piano key can create different “values,” a gesture on a computer key transmits only one value: that the key has been pressed. In contrast to the piano, the computer key lets us communicate what note to play, when to play it, and for how long, but not how strongly to play it, nor subtler notions of how the note should sound.

This inability to express certain properties within music computing environments can be thought of as a loss of gestural possibility. Physical objects have high gestural possibilities, namely all the ways we can think to interact with them. Digital objects have gestural possibilities defined by their APIs or programmed abilities, so it is up to the programmers to design ways that all can interact with these objects. Often, these designs under-appreciate the full range of gestures possible with these objects, and in doing so constrict the interaction possibilities.

Digital objects require an extra level of indirection, in which a model for the acceptable gestures is produced, specifying the format for the data that is to represent the gesture. This allows sensors to record and format signals in accordance with this model, such that the signals can be understood and acted upon.
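As a rough illustration of what such a model might look like, here is a minimal Python sketch. The class names, field names, and the 0.0 to 1.0 strength scale are assumptions made for this example, not any real device's format. It contrasts a computer key press, which carries only the fact of being pressed, with a pressure-sensitive strike that also carries strength and duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPress:
    """A computer-keyboard gesture: only which key went down."""
    key: str


@dataclass(frozen=True)
class PadStrike:
    """A pressure-sensitive gesture (hypothetical format)."""
    note: int          # which pad or key was struck
    strength: float    # 0.0 (lightest) to 1.0 (hardest), an assumed scale
    duration_s: float  # how long the pad or key was held, in seconds

    def __post_init__(self):
        # The model defines which gestures are "acceptable" data.
        if not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must be between 0.0 and 1.0")
        if self.duration_s < 0:
            raise ValueError("duration_s must be non-negative")


soft = PadStrike(note=60, strength=0.2, duration_s=0.5)
hard = PadStrike(note=60, strength=0.9, duration_s=0.5)
typed = KeyPress(key="a")
```

The two strikes hit the same note but remain distinguishable as data, while two presses of the same computer key are indistinguishable. Anything the model leaves out, such as where on the pad the finger landed, is lost to the system.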

A tangible example of this can be found in MPC drum machines, which expose a grid of pads to the player, to be struck by the fingers to trigger sounds. These pads are like piano keys, having been equipped with pressure-sensing facilities that allow for the strength of the hit to be captured and acted upon. This welcomes the gestures of striking a pad at different strengths into the music environment, allowing them to have enhanced meaning, transmitting both a note and an articulation.

Sensing Gestures

Keyboards, mousepads and touchscreens also have their own notions of gestures, and can sense different data according to their abilities. These can be thought of as part of the broader class of sensors.

In order to understand the role of sensors in determining gestural possibility, we can consider the dimensionality of the values they produce. Sensors can be understood as producing streams of values, each value with its own dimensionality.

Sensors like computer keyboards and mice, though standard on all computers, allow only for 0D and 2D signals. Each key allows a trigger signal to be sent: a signal of zero dimensions, or a 0D signal, capable of being in only an on or off state. The mouse sends 2D signals, namely the 2D vector of movement that updates the position of the pointer, as well as the 0D gesture of clicking.
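One way to picture these streams is as a sequence of typed events. The sketch below is only an illustration of the essay's 0D and 2D vocabulary. The class names and the `track_pointer` helper are hypothetical and do not correspond to any operating system's input API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class Trigger:
    """0D signal: no magnitude, only an on or off state."""
    source: str
    on: bool


@dataclass(frozen=True)
class Move2D:
    """2D signal: a relative movement vector, as a mouse might report."""
    dx: float
    dy: float


Event = Union[Trigger, Move2D]


def dimensionality(event: Event) -> int:
    """Return the dimensionality of an event in the essay's terms."""
    return 2 if isinstance(event, Move2D) else 0


def track_pointer(events: Iterable[Event],
                  start: Tuple[float, float] = (0.0, 0.0)) -> Tuple[float, float]:
    """Accumulate 2D movement vectors into a pointer position."""
    x, y = start
    for event in events:
        if isinstance(event, Move2D):
            x += event.dx
            y += event.dy
    return (x, y)


stream = [Move2D(3, 4), Trigger("left_button", True), Move2D(-1, 2)]
position = track_pointer(stream)  # (2.0, 6.0)
```

Note that the click contributes nothing to the position: it is a separate 0D stream that happens to travel alongside the 2D one.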

MIDI-equipped instruments like drum machines and keyboards contain controls that handle 1D gestures: a pad or key that can take on a range of velocity values corresponding to the pressure or strength of the hit, a knob that can be turned, or a slider that can be moved along a range.
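To make the 1D case concrete, here is a small sketch of how such gestures are commonly encoded as MIDI 1.0 channel messages: a Note On carries a note number and a velocity, and a Control Change carries a controller number and a value, each in the range 0 to 127. The function names are my own, and the sketch builds raw bytes without any MIDI library, so sending them to a device is left out.

```python
def _check_7bit(name: str, value: int) -> None:
    """MIDI data bytes carry 7 bits, so values run from 0 to 127."""
    if not 0 <= value <= 127:
        raise ValueError(f"{name} must be between 0 and 127")


def _check_channel(channel: int) -> None:
    if not 0 <= channel <= 15:
        raise ValueError("channel must be between 0 and 15")


def note_on(note: int, velocity: int, channel: int = 0) -> bytes:
    """Encode a pad or key strike: which note, and how hard (velocity)."""
    _check_channel(channel)
    _check_7bit("note", note)
    _check_7bit("velocity", velocity)
    return bytes([0x90 | channel, note, velocity])


def control_change(controller: int, value: int, channel: int = 0) -> bytes:
    """Encode a knob or slider position as a single 1D value."""
    _check_channel(channel)
    _check_7bit("controller", controller)
    _check_7bit("value", value)
    return bytes([0xB0 | channel, controller, value])


soft_hit = note_on(note=36, velocity=30)
hard_hit = note_on(note=36, velocity=120)
knob_turn = control_change(controller=7, value=64, channel=1)
```

The soft and hard hits differ only in their last byte, which is where the strength of the gesture lives.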

Beyond using keyboards and pads for sensing touch, we can also think of other senses to record. Audio-recording sensors allow the sensing of sonic phenomena, while picture-recording sensors allow the sensing of visual phenomena. Each of these produces a more complex set of signals, which may require more intensive algorithmic processing, like beat tracking or pose tracking, to extract useful streams of values.

Purposing Gestures

Like the variables in an equation, digital musical objects expose parameters that we can change so we get different sound values out of them. These parameters are often represented physically as buttons, knobs and sliders. Each parameter has some data configuration that can be changed by a gesture configuration acting on a control configuration.

In music computing environments, our sensors are responsible for taking a gesture configuration and transforming it into data to be passed as an update to some parameter. This requires the creation of an interface, or set of controls, to allow us to pass gestures from a certain sensor to a certain parameter. In this way, we can take the output of a physical sensor like a knob, pass it into our digital system (through a wired or wireless connection), and connect it to a suitable parameter of a musical object: a mixer, reverb unit, etc.
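The routing described here can be sketched in a few lines of Python. Everything named below (the `Controller` class, the sensor and parameter names, and the assumption that raw sensor values arrive in the range 0 to 127) is hypothetical and chosen only to illustrate the idea of connecting a sensor to a parameter and rescaling its values.

```python
def scale(value, in_min, in_max, out_min, out_max):
    """Linearly map value from one range to another, clamping at the ends."""
    span = in_max - in_min
    if span == 0:
        raise ValueError("input range must not be empty")
    t = (value - in_min) / span
    t = min(1.0, max(0.0, t))
    return out_min + t * (out_max - out_min)


class Controller:
    """Routes raw sensor values (assumed 0 to 127) to named parameters."""

    def __init__(self):
        self._routes = {}     # sensor id -> (parameter name, low, high)
        self.parameters = {}  # parameter name -> current value

    def connect(self, sensor_id, parameter, out_min, out_max):
        self._routes[sensor_id] = (parameter, out_min, out_max)

    def receive(self, sensor_id, raw):
        parameter, low, high = self._routes[sensor_id]
        self.parameters[parameter] = scale(raw, 0, 127, low, high)


controller = Controller()
controller.connect("knob_1", "reverb_mix", 0.0, 1.0)
controller.connect("slider_1", "filter_cutoff_hz", 20.0, 20000.0)

controller.receive("knob_1", 127)   # knob turned all the way up
controller.receive("slider_1", 0)   # slider at the bottom
```

Reconnecting `knob_1` to a different parameter changes what the same physical gesture means without touching the sensor.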

Extending Gestures

The ability to sense gestures is the product of the type and number of sensors. Each sensor gives us some dimensionality for expression, and the set of sensors gives us a meta-dimensionality, made dynamic with different mappings to different parameters.

We can think of design in this space as occurring in two places. One is at the sensors themselves, in terms of the way we interact with them, and the signals they can record. The other is the controller that takes these signals and routes them to parameters within our musical system.

We’ll focus here on the first of these: the sensors and the devices that contain them. There is a wide range of sensor types. For musical purposes, there are keyboards and pads. For recording different “senses,” there is the microphone for hearing, the camera for seeing, and the touchscreen for touching. For recording within a controlled dimensionality, there are sliders, knobs, joysticks, buttons, and switches. For understanding larger dimensionalities, there are wearables, MoCap suits, and VR controllers.

We see that we have devices that capture raw data: cameras, LIDAR, mics and MoCaps, and devices that serve as controller interfaces: game/VR controllers and MIDI instruments. We can also think of the algorithms we can apply to this data, to transform it from raw to processed data, like Computer Vision over cameras and LIDAR, Gesture Recognition over wearables and smartphones, and Sound Recognition and MIR over mics.

This ability to process data gives us more control over the format in which we receive data from sensors, which aids in mapping this data to the parameters of processing units. The advantage of controller interfaces like sliders and knobs is that they have confined ranges and defined positions. As the types of controllers shift, we’ll need to create ways to process this data so that gestures more complex than turning a knob or moving a slider (things like moving your hand like a conductor, or dancing with your body) are just as controllable.
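As one possible sketch of such processing, the Python below confines a raw, unbounded reading to the 0.0 to 1.0 range a knob would give us, then smooths the stream with an exponential moving average. The raw readings and the calibration bounds are invented for illustration. In practice they might come from something like a pose tracker reporting the height of a hand, which is an assumption here rather than anything a specific tool provides.

```python
def normalize(raw, low, high):
    """Confine a raw reading to 0.0..1.0 given calibrated bounds."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (raw - low) / (high - low)))


def smooth(values, alpha=0.5):
    """Exponential moving average; a smaller alpha smooths more heavily."""
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value
        else:
            previous = alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


# Hypothetical raw hand heights, calibrated so 100 is lowest and 200 highest.
raw_heights = [100, 200, 200, 250]
knob_like = [normalize(h, 100, 200) for h in raw_heights]
steady = smooth(knob_like, alpha=0.5)
```

After this step, a sweeping arm movement produces the same kind of bounded value a slider would, so it can be mapped to a parameter in the same way.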


Music is a matter of motion. As we further integrate music with digital machines, we lose out on some of the detailed beauty of music made outside the computer. It is my hope that new capabilities in sensor technology will bring the detail and intricacy of gestures deeper into the digital context, marrying the machine to the human in deeper ways.

Music has a broad set of commonly used bodily metaphors. The most basic bodily experiences we relate to music are rhythm and repetition. We experience these phenomena throughout our inner experience, from our heartbeats and respiration to our gait. We also use a variety of spatial metaphors for music, referred to by Wilkie et al. as image schemas. Such musical image schemas include containers, cycles, verticality, balance, the notion of center-periphery, and (in the case of Western melodies) a narrative of source-path-goal.
Ethan Hein