Alexandre Défossez AI homepage

About me

Alexandre Défossez
Chief Exploration Officer at Kyutai, the leading lab in AI in Paris, with a strong focus on doing open source and open science in AI. My research focus is on multi-modal LLMs.
Before that, research scientist at FAIR Paris for 3 years, leading the effort for music generation (MusicGen) and co-leading the development of the AudioCraft framework.
Formerly CIFRE PhD student at FAIR Paris and Sierra at INRIA Paris, under the supervision of Léon Bottou (FAIR), Nicolas Usunier (FAIR) and Francis Bach (INRIA). Studied maths and physics at ENS Paris, and completed a master's degree in applied maths (MVA) at ENS Saclay.
[scholar] [github] [twitter] [linkedin]

Multimodal LLMs, audio generation, source separation, stochastic optimization, and AI for science.
Also, amateur DJ and composer [artist website].

audio nlp Moshi: a speech-text foundation model for real-time dialogue. Preprint 2024. [paper] [code] [demo]
A. Défossez, L. Mazaré, M. Orsini, A. Royer, P. Pérez, H. Jégou, E. Grave, N. Zeghidour
audio Audio Conditioning for Music Generation via Discrete Bottleneck Features. ISMIR 2024. [paper] [code] [samples]
S. Rouard, Y. Adi, J. Copet, A. Roebel, A. Défossez
audio An Independence-promoting Loss for Music Generation with Language Models. ICML 2024. [paper] [code] [samples]
J.M. Lemercier, S. Rouard, J. Copet, Y. Adi, A. Défossez
audio Proactive detection of voice cloning with localized watermarking. ICML 2024. [paper] [code]
R. San Roman, P. Fernandez, A. Défossez, T. Furon, T. Tran, H. Elsahar

audio Simple and Controllable Music Generation. NeurIPS 2023. [paper] [code] [demo] [samples]
J. Copet, F. Kreuk, I. Gat, T. Remez, D. Kant, G. Synnaeve, Y. Adi, A. Défossez.
neuro Decoding perceived speech from non-invasive brain recordings. Nature Machine Intelligence 2023. [paper] [code]
A. Défossez, C. Caucheteux, J. Rapin, O. Kabeli, J.R. King.
audio From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion. NeurIPS 2023. [paper] [code] [samples]
R. S. Roman, Y. Adi, A. Deleforge, R. Serizel, G. Synnaeve, A. Défossez.
nlp Code Llama: Open Foundation Models for Code. Preprint 2023. [paper] [code]
B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi, J. Liu, T. Remez, J. Rapin, A. Kozhevnikov, I. Evtimov, J. Bitton, M. Bhatt, C. C. Ferrer, A. Grattafiori, W. Xiong, A. Défossez, J. Copet, F. Azhar, H. Touvron, L. Martin, N. Usunier, T. Scialom, G. Synnaeve.
audio nlp Textually Pretrained Speech Language Models. NeurIPS 2023. [paper]
M. Hassid, T. Remez, T. A. Nguyen, I. Gat, A. Conneau, F. Kreuk, J. Copet, A. Défossez, G. Synnaeve, E. Dupoux, R. Schwartz, Y. Adi.
audio Hybrid Transformers for Music Source Separation. ICASSP 2023. [paper] [code]
S. Rouard, F. Massa, A. Défossez.

audio High Fidelity Neural Audio Compression. TMLR 2022. [paper] [code] [samples]
A. Défossez*, J. Copet*, G. Synnaeve**, Y. Adi**.
audio AudioGen: Textually Guided Audio Generation. ICLR 2023. [paper]
F. Kreuk, G. Synnaeve, A. Polyak, U. Singer, A. Défossez, J. Copet, D. Parikh, Y. Taigman, Y. Adi.
theory Differentiable Model Compression via Pseudo Quantization Noise. TMLR 2022. [paper] [code]
A. Défossez*, Y. Adi*, G. Synnaeve.
theory A Simple Convergence Proof of Adam and Adagrad. TMLR 2022. [paper]
A. Défossez, L. Bottou, F. Bach, N. Usunier.
neuro Deep Recurrent Encoder: an end-to-end network to model magnetoencephalography at scale. NBDT 2022. [paper] [code]
O. Chehab, A. Défossez, J.C. Loiseau, A. Gramfort, J.R. King.
audio Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain. Interspeech 2022. [paper]
D. Markovic, A. Défossez, A. Richard.

audio Hybrid Spectrogram and Waveform Source Separation. MDX Workshop, ISMIR 2021. [paper] [code] [samples]
A. Défossez.
audio Real Time Speech Enhancement in the Waveform Domain. Interspeech 2020. [paper] [audio samples] [code]
A. Défossez, G. Synnaeve, Y. Adi.
audio Music Source Separation in the Waveform Domain. Preprint 2019. [paper] [github] [audio samples]
A. Défossez, N. Usunier, L. Bottou, F. Bach.
audio Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed. Preprint 2019. [paper]
A. Défossez, N. Usunier, L. Bottou, F. Bach.
audio Regression versus classification for neural network based audio source localization. WASPAA 2019. [paper]
L. Perotin, A. Défossez, E. Vincent, R. Serizel, A. Guérin
audio SING: Symbol-to-Instrument Neural Generator. NIPS 2018. [paper] [github] [poster] [audio samples] [slides].
A. Défossez, N. Zeghidour, N. Usunier, L. Bottou, F. Bach.
theory AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods. Preprint 2017. [paper]
A. Défossez, F. Bach.
theory Constant step size least-mean-square: Bias-variance trade-offs and optimal sampling distributions. AISTATS 2015. [AISTATS version] [arXiv version]
A. Défossez, F. Bach.

AudioCraft: Comprehensive framework for inference and training of state-of-the-art audio generative models.
BrainMagick: Framework for training decoding models on EEG and MEG data.
EnCodec: state-of-the-art neural audio codec. The best codec around, especially for music at 48 kHz :)
Demucs: Music source separation, winning model from the Sony 2021 MDX challenge. Can separate drums, bass, and vocals from the rest of the accompaniment. Jaime Altozano loves it!
Julius: Efficient implementations of classical Digital Signal Processing algorithms in PyTorch, fully differentiable and with CUDA support. Resampling, FFT based convolutions, FIR low pass filters and decomposition of a signal over multiple frequency bands in the waveform domain are implemented.
Denoiser: Real time speech denoising in the waveform domain. Can be used with Zoom or other VC software with a virtual soundcard (e.g. Soundflower on a Mac). Live demo :)

Gave one lecture on Deep Learning at Scale at Mines ParisTech for the PSL week on Large-Scale Machine Learning. The slides and code are available on the lesson github.

Teaching assistant for the Deep Learning: Do-It-Yourself! class at École Normale Supérieure.

I wrote my PhD manuscript on the Optimization of Fast Deep Learning Network for Audio Analysis and Synthesis. Half of it is on audio synthesis and source separation, and the other half is on adaptive and stochastic optimization.