Richard Savery

Shimon Raps

Real-time Robotic Hip-hop


Shimon Raps emerged out of an interest in applying our automatic lyric creation system to a real-time model. The system aims to capture many of the unique linguistic characteristics of hip hop, as well as lyrical flow through rhythm and phrasing. The final system is interactive, allowing a rapper to respond in dialogue with Shimon.

Selected Press and Videos

It’s Robot Versus Human as Shimon Performs Real-Time Rap Battles (IEEE Spectrum, 23rd April 2020)

Can a Robot Really Freestyle? (Freethink Media, 23rd April 2020)

Rapping is featured on the tracks Biological inclusion, Children of Two, and Do You Hear.


Shimon the Rapper: A Real-Time System for Human-Robot Interactive Rap Battles

International Conference on Computational Creativity, ICCC’2020 (September)

Richard Savery, Lisa Zahray, Gil Weinberg

Abstract: We present a system for real-time lyrical improvisation between a human and a robot in the style of hip hop. Our system takes vocal input from a human rapper, analyzes the semantic meaning, and generates a response that is rapped back by a robot over a musical groove. Previous work with real-time interactive music systems has largely focused on instrumental output, and vocal interactions with robots have been explored, but not in a musical context. Our generative system includes custom methods for censorship, voice, rhythm, and a novel deep learning pipeline based on phoneme embeddings. The rap performances are accompanied by synchronized robotic gestures and mouth movements. Key technical challenges that were overcome in the system are performing with low latency, dataset censorship, and rhyming. We evaluated several aspects of the system through a survey of videos and sample text output. Analysis of comments showed the overall perception of the system was positive. The model trained on our hip hop dataset was rated significantly higher than our metal dataset in coherence, rhyme quality, and enjoyment. Participants preferred outputs generated by a given input phrase over outputs generated from unknown keywords, indicating that the system successfully relates its output to its input.

All videos used in the study can be viewed here.