A new AI system from Google called “MusicLM” can produce high-fidelity music in any genre from a text description.
In a research paper, Google said: “MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modelling task, and it generates music at 24 kHz that remains consistent over several minutes.”
The company has posted a series of samples that it produced utilising the model. Alongside them, it has publicly released MusicCaps, a dataset of 5.5k music-text pairs with rich text descriptions contributed by human experts.
MusicLM can generate a musical narrative from a sequence of descriptions, such as “time to meditate,” “time to wake up,” “time to run,” and “time to give 100%.” It can also produce audio based on an image and its caption, or imitate a specific instrument in a particular genre. The output can last up to several minutes.
The research paper explained: “Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text descriptions. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption.”
Other projects have tried their hand at audio generation, including Google’s own AudioLM and OpenAI’s Jukebox. However, owing to limited training data and technical constraints, those systems have struggled to produce music of comparable complexity and fidelity.