About me
- Alexandre Défossez
- Chief Exploration Officer at Kyutai,
the leading lab in AI in Paris, with a strong focus on doing open source and open science in AI. My research focus is on multi-modal LLMs.
- Before that, research scientist at FAIR Paris for 3 years, where I led the music generation effort (MusicGen) and co-led the development of the AudioCraft framework.
-
Formerly
CIFRE PhD student at FAIR Paris and
Sierra at INRIA Paris,
under the supervision of Léon Bottou (FAIR), Nicolas Usunier (FAIR) and Francis Bach (INRIA).
Studied maths and physics at ENS Paris, then completed a master's degree in applied maths (MVA) at ENS Saclay.
-
[scholar]
[github]
[twitter]
[linked in]
Interests
Multimodal LLMs, audio generation, source separation, stochastic optimization, and AI for science.
Also, amateur DJ and composer [artist website].
Publications
-
audio
nlp
Moshi: a speech-text foundation model for real-time dialogue.
Preprint 2024.
[paper]
[code]
[demo]
A. Défossez, L. Mazaré, M. Orsini, A. Royer, P. Pérez, H. Jégou, E. Grave, N. Zeghidour
-
audio
Audio Conditioning for Music Generation via Discrete Bottleneck Features.
ISMIR 2024.
[paper]
[code]
[samples]
S. Rouard, Y. Adi, J. Copet, A. Roebel, A. Défossez
-
audio
An Independence-promoting Loss for Music Generation with Language Models.
ICML 2024.
[paper]
[code]
[samples]
J.M. Lemercier, S. Rouard, J. Copet, Y. Adi, A. Défossez
-
audio
Proactive detection of voice cloning with localized watermarking.
ICML 2024.
[paper]
[code]
R. San Roman, P. Fernandez, A. Défossez, T. Furon, T. Tran, H. Elsahar
-
audio
Simple and Controllable Music Generation.
NeurIPS 2023.
[paper]
[code]
[demo]
[samples]
J. Copet, F. Kreuk, I. Gat, T. Remez, D. Kant, G. Synnaeve, Y. Adi, A. Défossez.
-
neuro
Decoding perceived speech from non-invasive brain recordings.
Nature Machine Intelligence 2023.
[paper]
[code]
A. Défossez, C. Caucheteux, J. Rapin, O. Kabeli, J.R. King.
-
audio
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion.
NeurIPS 2023.
[paper]
[code]
[samples]
R. S. Roman, Y. Adi, A. Deleforge, R. Serizel, G. Synnaeve, A. Défossez.
-
nlp
Code Llama: Open Foundation Models for Code.
Preprint 2023.
[paper]
[code]
B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi, J. Liu, T. Remez, J. Rapin, A. Kozhevnikov, I. Evtimov, J. Bitton, M. Bhatt, C. C. Ferrer, A. Grattafiori, W. Xiong, A. Défossez, J. Copet, F. Azhar, H. Touvron, L. Martin, N. Usunier, T. Scialom, G. Synnaeve.
-
audio
nlp
Textually Pretrained Speech Language Models.
NeurIPS 2023.
[paper]
M. Hassid, T. Remez, T. A. Nguyen, I. Gat, A. Conneau, F. Kreuk, J. Copet, A. Défossez, G. Synnaeve, E. Dupoux, R. Schwartz, Y. Adi.
-
audio
Hybrid Transformers for Music Source Separation.
ICASSP 2023.
[paper]
[code]
S. Rouard, F. Massa, A. Défossez.
-
audio
High Fidelity Neural Audio Compression.
TMLR 2022.
[paper]
[code]
[samples]
A. Défossez*, J. Copet*, G. Synnaeve**, Y. Adi**.
-
audio
AudioGen: Textually Guided Audio Generation.
ICLR 2023.
[paper]
F. Kreuk, G. Synnaeve, A. Polyak, U. Singer, A. Défossez, J. Copet, D. Parikh, Y. Taigman, Y. Adi.
-
theory
Differentiable Model Compression via Pseudo Quantization Noise.
TMLR 2022.
[paper]
[code]
A. Défossez*, Y. Adi*, G. Synnaeve.
-
theory
A Simple Convergence Proof of Adam and Adagrad. TMLR 2022.
[paper]
A. Défossez, L. Bottou, F. Bach, N. Usunier.
-
neuro
Deep Recurrent Encoder: an end-to-end network to model magnetoencephalography at scale.
NBDT 2022.
[paper]
[code]
O. Chehab, A. Défossez, J.C. Loiseau, A. Gramfort, J.R. King.
-
audio
Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain.
Interspeech 2022.
[paper]
D. Markovic, A. Défossez, A. Richard.
-
audio
Hybrid Spectrogram and Waveform Source Separation.
MDX Workshop, ISMIR 2021.
[paper]
[code]
[samples]
A. Défossez.
-
audio
Real Time Speech Enhancement in the Waveform Domain.
Interspeech 2020.
[paper]
[audio samples]
[code]
A. Défossez, G. Synnaeve, Y. Adi.
-
audio
Music Source Separation in the Waveform Domain. Preprint 2019.
[paper]
[github]
[audio samples]
A. Défossez, N. Usunier, L. Bottou, F. Bach.
-
audio
Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed. Preprint 2019.
[paper]
A. Défossez, N. Usunier, L. Bottou, F. Bach.
-
audio
Regression versus classification for neural network based audio source localization.
WASPAA 2019.
[paper]
L. Perotin, A. Défossez, E. Vincent, R. Serizel, A. Guérin
-
audio
SING:
Symbol-to-Instrument Neural Generator. NIPS 2018.
[paper]
[github]
[poster]
[audio samples]
[slides]
A. Défossez, N. Zeghidour, N. Usunier, L. Bottou, F. Bach.
-
theory
AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel
Stochastic Gradient Methods. Preprint 2017.
[paper].
A. Défossez, F. Bach.
-
theory
Constant step size least-mean-square: Bias-variance trade-offs and optimal
sampling distributions.
AISTATS 2015.
[AISTATS version],
[arXiv version].
A. Défossez, F. Bach.
Software
- AudioCraft:
Comprehensive framework for inference and training of state-of-the-art audio generative models.
- BrainMagick:
Framework for training decoding models on EEG and MEG data.
- EnCodec:
state-of-the-art neural audio codec. The best codec around, especially
for music at 48 kHz :)
- Demucs:
Music source separation, winning model from the Sony 2021 MDX challenge. Can separate drums, bass, and vocals
from the rest of the accompaniment. Jaime Altozano loves it!
- Julius:
Efficient implementations of classical Digital Signal Processing algorithms in PyTorch,
fully differentiable and with CUDA support. Resampling, FFT based convolutions,
FIR low pass filters and decomposition of a signal over multiple frequency bands in the
waveform domain are implemented.
- Denoiser:
Real time speech denoising in the waveform domain. Can be used with Zoom or other
VC software with a virtual soundcard (e.g. Soundflower on a Mac). Live demo :)
Teaching
2022
Gave one lecture on Deep Learning at Scale at Mines ParisTech for the PSL week on Large-Scale Machine Learning.
The slides and code are available on the lesson github.
2018
Teaching assistant for the
Deep Learning: Do-It-Yourself!
class at École Normale Supérieure.
Misc.
I wrote my PhD manuscript on the
Optimization of Fast Deep Learning Network for Audio Analysis and Synthesis.
Half of it is on audio synthesis and source separation, and the other half is on adaptive and
stochastic optimization.