Up until now, most generative music models have produced mono sound. This means MusicGen doesn't place any sounds or instruments on the left or right side, resulting in a less lively and exciting mix. The reason stereo sound has been largely ignored so far is that generating stereo is not a trivial task.
As musicians, when we produce stereo signals, we have access to the individual instrument tracks in our mix and can place them wherever we want. MusicGen does not generate all instruments separately but instead produces one combined audio signal. Without access to these instrument sources, creating stereo sound is hard. Unfortunately, splitting an audio signal into its individual sources is a tough problem (I've published a blog post about that) and the technology is still not 100% ready.
Therefore, Meta decided to incorporate stereo generation directly into the MusicGen model. Using a new dataset consisting of stereo music, they trained MusicGen to produce stereo outputs. The researchers claim that generating stereo adds no extra computing cost compared to mono.
Although I feel that the stereo procedure is not described very clearly in the paper, my understanding is that it works like this (Figure 3): MusicGen has learned to generate two compressed audio signals (left and right channel) instead of one mono signal. These compressed signals must then be decoded separately before they are combined to build the final stereo output. The reason this process does not take twice as long is that MusicGen can now produce the two compressed signals in roughly the same time it previously needed for one.
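To make the final combination step concrete, here is a minimal sketch in NumPy. It assumes the two channels have already been decoded into mono waveforms (the two sine waves below are hypothetical stand-ins for the decoded left and right signals, not actual model output) and shows how they are stacked into a single stereo signal:

```python
import numpy as np

# Hypothetical stand-ins for the two decoded channels. In the real model,
# each channel would come from decoding its own stream of compressed
# tokens; here we just synthesize two different mono sine waves.
sample_rate = 32000  # MusicGen outputs audio at 32 kHz
t = np.linspace(0, 1.0, sample_rate, endpoint=False)
left = 0.5 * np.sin(2 * np.pi * 220 * t)   # "decoded" left channel (mono)
right = 0.5 * np.sin(2 * np.pi * 330 * t)  # "decoded" right channel (mono)

# Final step: stack the two mono signals into one stereo signal with
# shape (num_samples, 2), ready to be written to a stereo WAV file.
stereo = np.stack([left, right], axis=-1)
print(stereo.shape)  # (32000, 2)
```

The expensive part is generating the compressed representations; interleaving or stacking the two decoded channels, as above, is essentially free, which is consistent with the claim that stereo adds no extra compute.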
Being able to produce convincing stereo sound really sets MusicGen apart from other state-of-the-art models like MusicLM or Stable Audio. From my perspective, this "little" addition makes a huge difference in the liveliness of the generated music. Listen for yourselves (it might be hard to hear on smartphone speakers):
MusicGen was impressive from the day it was released. However, since then, Meta's FAIR team has been continually improving their product, enabling higher-quality results that sound more authentic. When it comes to text-to-music models that generate audio signals (not MIDI, etc.), MusicGen is ahead of its competitors from my perspective (as of November 2023).
Further, since MusicGen and all its related products (EnCodec, AudioGen) are open-source, they constitute an incredible source of inspiration and a go-to framework for aspiring AI audio engineers. Looking at the improvements MusicGen has made in only six months, I can only imagine that 2024 will be an exciting year.
Another important point is that with their transparent approach, Meta is also doing foundational work for developers who want to integrate this technology into software for musicians. Generating samples, brainstorming musical ideas, or changing the genre of your existing work: these are some of the exciting applications we are already starting to see. With a sufficient level of transparency, we can make sure that we are building a future where AI makes creating music more exciting instead of being solely a threat to human musicianship.