MidiTok, a convenient tool to tokenize symbolic music for deep neural networks

Nathan Fradet, Jean-Pierre Briot, Fabien Chhel, Amal El Fallah Seghrouchni, Nicolas Gutowski

MidiTok is a Python package designed to transform MIDI files into sequences of tokens that can be used with deep neural networks such as Transformers or RNNs. As with text in the NLP field, symbolic music has to be encoded and transformed into sequences of tokens before these networks can process it. However, whereas text is simply words put in a specific order, music is made up of notes with several characteristics (such as pitch, velocity, and duration) that can be represented in several ways and at several levels of precision. MidiTok was designed to take care of this, offering a convenient interface and multiple strategies for encoding symbolic music, with flexible parameters.
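To make the idea concrete, here is a minimal sketch of one way notes could be turned into tokens. It is a toy illustration only and does not use MidiTok's API: the `Note` type, the `tokenize` function, the token names, and the `ticks_per_step` and `velocity_bins` parameters are all hypothetical, chosen to show how the same notes can be encoded at different precisions.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note container (not a MidiTok type)."""
    pitch: int     # MIDI pitch, 0-127
    velocity: int  # MIDI velocity, 0-127
    start: int     # onset time, in ticks
    duration: int  # length, in ticks


def tokenize(notes: List[Note], ticks_per_step: int = 120,
             velocity_bins: int = 8) -> List[str]:
    """Encode notes as a flat list of string tokens.

    Time is quantized to steps of `ticks_per_step` ticks and velocity
    to `velocity_bins` bins; both control the precision of the encoding.
    """
    tokens: List[str] = []
    current_step = 0
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        step = round(note.start / ticks_per_step)
        if step > current_step:
            # Advance time before emitting the next note.
            tokens.append(f"TimeShift_{step - current_step}")
            current_step = step
        vel_bin = min(note.velocity * velocity_bins // 128, velocity_bins - 1)
        dur_steps = max(1, round(note.duration / ticks_per_step))
        tokens += [f"Pitch_{note.pitch}", f"Velocity_{vel_bin}",
                   f"Duration_{dur_steps}"]
    return tokens


notes = [Note(60, 100, 0, 480), Note(64, 64, 480, 240)]
fine = tokenize(notes, ticks_per_step=120)
coarse = tokenize(notes, ticks_per_step=240)
print(fine)
print(coarse)
```

Changing `ticks_per_step` or `velocity_bins` yields a different token sequence for the same music, which is the kind of trade-off between precision and vocabulary size that a tokenizer's parameters expose.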

Nathan Fradet has been a PhD candidate at LIP6, Sorbonne University-CNRS and Aubay since April 2021. He works on music generation with deep learning networks. More specifically, he works on ways to enhance the capabilities of Transformer networks to generate controlled, expressive and pleasant symbolic music for human-machine co-creativity and artificial creativity. One of his key topics is the efficiency of Transformers and attention mechanisms, as these resource-intensive architectures often require considerable time and memory, whereas one would want them to be fast and efficient in a musical creative context.