Publications

Journal Articles

Lamothe, C., Obliger-Debouche, M., Best, P., Trapeau, R., Ravel, S., Artières, T., Marxer, R., & Belin, P. (2025). A large annotated dataset of vocalizations by common marmosets. Scientific Data, 12. https://doi.org/10.1038/s41597-025-04951-8
Kalda, J., Baroudi, S., Lebourdais, M., Pagés, C., Marxer, R., Alumäe, T., & Bredin, H. (2025). Design Choices for PixIT-based Speaker-Attributed ASR: Team ToTaTo at the NOTSOFAR-1 Challenge. Computer Speech and Language, 95, 101824. https://doi.org/10.1016/j.csl.2025.101824
Kershenbaum, A., Akçay, Ç., Babu-Saheer, L., Barnhill, A., Best, P., Cauzinille, J., Clink, D., Dassow, A., Dufourq, E., Growcott, J., Markham, A., Marti-Domken, B., Marxer, R., Muir, J., Reynolds, S., Root-Gutteridge, H., Sadhukhan, S., Schindler, L., Smith, B., … Dunn, J. (2024). Automatic detection for bioacoustic research: a practical guide from and for biologists and computer scientists. Biological Reviews. https://doi.org/10.1111/brv.13155
Cauzinille, J., Favre, B., Marxer, R., & Rey, A. (2024). Applying machine learning to primate bioacoustics: review and perspectives. American Journal of Primatology. https://doi.org/10.1002/ajp.23666
Chetouani, M., Briefer, E., Dassow, A., Marxer, R., Moore, R., Obin, N., & Stowell, D. (2023). Vocal interactivity in-and-between humans, animals and robots. Interaction Studies, 24(1), 1–4. https://doi.org/10.1075/is.00016.che
Best, P., Paris, S., Glotin, H., & Marxer, R. (2023). Deep audio embeddings for vocalisation clustering. PLoS ONE, 18(7), e0283396. https://doi.org/10.1371/journal.pone.0283396
Boittiaux, C., Dune, C., Ferrera, M., Arnaubec, A., Marxer, R., Matabos, M., Van Audenhaege, L., & Hugel, V. (2023). Eiffel Tower: A Deep-Sea Underwater Dataset for Long-Term Visual Localization. The International Journal of Robotics Research. https://doi.org/10.1177/02783649231177322
Best, P., Marxer, R., Paris, S., & Glotin, H. (2022). Temporal evolution of the Mediterranean fin whale song. Scientific Reports, 12(1), 13565. https://doi.org/10.1038/s41598-022-15379-0
Boittiaux, C., Marxer, R., Dune, C., Arnaubec, A., & Hugel, V. (2022). Homography-Based Loss Function for Camera Pose Regression. IEEE Robotics and Automation Letters, 7(3), 6242–6249. https://doi.org/10.1109/LRA.2022.3168329
Cooke, M., García Lecumberri, M. L., Barker, J., & Marxer, R. (2019). Lexical frequency effects in English and Spanish word misperceptions. Journal of the Acoustical Society of America, 145(2), EL136–EL141. https://doi.org/10.1121/1.5090196
Marxer, R., Barker, J., Alghamdi, N., & Maddock, S. (2018). The impact of the Lombard effect on audio and visual speech recognition systems. Speech Communication, 100, 58–68. https://doi.org/10.1016/j.specom.2018.04.006
Alghamdi, N., Maddock, S., Marxer, R., Barker, J., & Brown, G. (2018). A corpus of audio-visual Lombard speech with frontal and profile views. Journal of the Acoustical Society of America, 143(6), EL523–EL529. https://doi.org/10.1121/1.5042758
Barker, J., Marxer, R., Vincent, E., & Watanabe, S. (2017). The third ‘CHiME’ speech separation and recognition challenge: Analysis and outcomes. Computer Speech and Language, 46, 605–626. https://doi.org/10.1016/j.csl.2016.10.005
Barker, J., Marxer, R., Vincent, E., & Watanabe, S. (2017). Multi-microphone speech recognition in everyday environments. Computer Speech and Language, 46, 386–387. https://doi.org/10.1016/j.csl.2017.02.007
Vincent, E., Watanabe, S., Nugraha, A. A., Barker, J., & Marxer, R. (2017). An analysis of environment, microphone and data simulation mismatches in robust speech recognition. Computer Speech and Language, 46, 535–557. https://doi.org/10.1016/j.csl.2016.11.005
Moore, R., Marxer, R., & Thill, S. (2016). Vocal Interactivity in-and-between Humans, Animals, and Robots. Frontiers in Robotics and AI, 3, 1–1. https://doi.org/10.3389/frobt.2016.00061
Marxer, R., & Purwins, H. (2016). Unsupervised Incremental Online Learning and Prediction of Musical Audio Signals. IEEE/ACM Transactions on Audio, Speech and Language Processing, 24(5), 863–874. https://doi.org/10.1109/TASLP.2016.2530409
Hazan, A., Marxer, R., Brossier, P., Purwins, H., Herrera, P., & Serra, X. (2010). What/when causal expectation modelling applied to audio signals. Connection Science, 21(2–3), 119–143. https://doi.org/10.1080/09540090902733764
Purwins, H., Herrera, P., Grachten, M., Hazan, A., Marxer, R., & Serra, X. (2008). Computational models of music perception and cognition I: The perceptual and cognitive processing chain. Physics of Life Reviews, 5(3), 151–168. https://doi.org/10.1016/j.plrev.2008.03.004
Purwins, H., Grachten, M., Herrera, P., Hazan, A., Marxer, R., & Serra, X. (2008). Computational models of music perception and cognition II: Domain-specific music processing. Physics of Life Reviews, 5(3), 169–182. https://doi.org/10.1016/j.plrev.2008.03.005

Book Chapters

Barker, J., Marxer, R., Vincent, E., & Watanabe, S. (2017). The CHiME challenges: Robust speech recognition in everyday environments. In New era for robust speech recognition: Exploiting deep learning (pp. 327–344). Springer. https://inria.hal.science/hal-01383263

Conference Articles

Deowan, M. E., Yousha, M. S. Y., Hossain, T. M., Hassan, S., & Marxer, R. (2025, May). Optimizing Underwater Robot Navigation: A Study of DRL Algorithms and Multi-Modal Sensor Fusion. IEEE International Conference on Robotics & Automation (ICRA). https://hal.science/hal-05004039
Cuervo, S., & Marxer, R. (2024). Scaling Properties of Speech Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 351–361. https://doi.org/10.18653/v1/2024.emnlp-main.21
Best, P., Cuervo, S., & Marxer, R. (2024). Transfer Learning from Whisper for Microscopic Intelligibility Prediction. Interspeech 2024, 3839–3843. https://doi.org/10.21437/Interspeech.2024-2258
Kalda, J., Alumäe, T., Lebourdais, M., Bredin, H., Baroudi, S., & Marxer, R. (2024). TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024. Interspeech 2024, 1635–1639. https://doi.org/10.21437/interspeech.2024-2462
Cauzinille, J., Favre, B., Marxer, R., Clink, D., Ahmad, A. H., & Rey, A. (2024). Investigating self-supervised speech models’ ability to classify animal vocalizations: The case of gibbon’s vocal signatures. Interspeech 2024, 132–136. https://doi.org/10.21437/Interspeech.2024-1096
Kalda, J., Alumäe, T., Baroudi, S., Lebourdais, M., Bredin, H., & Marxer, R. (2024). ToTaTo System Descriptions for the NOTSOFAR1 Challenge. 8th International Workshop on Speech Processing in Everyday Environments (CHiME 2024), 23–25. https://doi.org/10.21437/CHiME.2024-5
Kalda, J., Pagés, C., Marxer, R., Alumäe, T., & Bredin, H. (2024). PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings. The Speaker and Language Recognition Workshop (Odyssey 2024), 115–122. https://doi.org/10.21437/odyssey.2024-17
Cauzinille, J., Favre, B., Marxer, R., & Rey, A. (2024). From speech to primate vocalizations: self-supervised deep learning as a comparative approach. Proceedings of the 15th International Conference on the Evolution of Language (EVOLANG XV), 15, 64. https://doi.org/10.17617/2.3587960
Cuervo, S., & Marxer, R. (2024). Speech Foundation Models on Intelligibility Prediction for Hearing-Impaired Listeners. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1421–1425. https://doi.org/10.1109/ICASSP48485.2024.10447907
Boittiaux, C., Marxer, R., Dune, C., Arnaubec, A., Ferrera, M., & Hugel, V. (2024). SUCRe: Leveraging Scene Structure for Underwater Color Restoration. 2024 International Conference on 3D Vision (3DV), 1488–1497. https://doi.org/10.1109/3DV62453.2024.00148
Cuervo, S., & Marxer, R. (2023). On the Benefits of Self-supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions. INTERSPEECH 2023, 1788–1792. https://doi.org/10.21437/Interspeech.2023-1476
Moore, R. K., & Marxer, R. (2023). Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys. INTERSPEECH 2023, 401–405. https://doi.org/10.21437/Interspeech.2023-235
Sanz, P., Marín, R., López-Barajas, S., Solis, A., Marxer, R., & Hugel, V. (2023). 1st Year of running MIR at UJI. OCEANS 2023 - Limerick, 1–5. https://doi.org/10.1109/OCEANSLimerick52467.2023.10244270
Boittiaux, C., Dune, C., Arnaubec, A., Marxer, R., Ferrera, M., & Hugel, V. (2023, May). Long-term visual localization in deep-sea underwater environment. ORASIS. https://hal.science/hal-04108737
Cuervo, S., Łańcucki, A., Marxer, R., Rychlikowski, P., & Chorowski, J. (2022). Variable-rate hierarchical CPC leads to acoustic unit discovery in speech. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 35, 34995–35006. https://hal.science/hal-04093636
Hafsati, M., Bentounes, K., & Marxer, R. (2022). Blind Speech Separation Through Direction of Arrival Estimation Using Deep Neural Networks with a Flexibility on the Number of Speakers. 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), 1–5. https://doi.org/10.1109/MMSP55362.2022.9949050
Cuervo, S., Grabias, M., Chorowski, J., Ciesielski, G., Łańcucki, A., Rychlikowski, P., & Marxer, R. (2022). Contrastive Prediction Strategies for Unsupervised Segmentation and Categorization of Phonemes and Words. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3189–3193. https://doi.org/10.1109/ICASSP43922.2022.9746102
Marxer, R., Hugel, V., Prud’Homme, K. P., Batista, P., Aviles, J. V. M., Pascoal, A., Sanz, P., & Schjolberg, I. (2021, September). Marine and Maritime Intelligent Robotics (MIR). OCEANS 2021: San Diego – Porto. https://doi.org/10.23919/OCEANS44145.2021.9706122
Chorowski, J., Ciesielski, G., Dzikowski, J., Łańcucki, A., Marxer, R., Opala, M., Pusz, P., Rychlikowski, P., & Stypulkowski, M. (2021). Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw. Interspeech 2021, 971–975. https://doi.org/10.21437/Interspeech.2021-1465
Chorowski, J., Ciesielski, G., Dzikowski, J., Łańcucki, A., Marxer, R., Opala, M., Pusz, P., Rychlikowski, P., & Stypulkowski, M. (2021). Aligned Contrastive Predictive Coding. Interspeech 2021, 976–980. https://doi.org/10.21437/interspeech.2021-1544
Hernaez, I., González-López, J. A., Navas, E., Pérez Córdoba, J. L., Saratxaga, I., Olivares, G., Sánchez de La Fuente, J., Galdón, A., García Romillo, V., González-Atienza, M., Schultz, T., Green, P., Wand, M., Marxer, R., & Diener, L. (2021). Voice Restoration with Silent Speech Interfaces (ReSSInt). IberSPEECH 2021, 130–134. https://doi.org/10.21437/IberSPEECH.2021-28
Ferrari, M., Glotin, H., Oger, M., Marxer, R., Asch, M., Gies, V., & Sarano, F. (2020). 3D diarization of a sperm whale click cocktail party by an ultra high sampling rate portable hydrophone array for assessing individual cetacean growth curves. Forum Acusticum, 3239–3243. https://doi.org/10.48465/fa.2020.1097
Ferrari, M., Glotin, H., & Marxer, R. (2020). End to end raw audio deep learning of transients, application to bioacoustics. e-Forum Acusticum 2020, 3245–3247. https://doi.org/10.48465/fa.2020.1096
Best, P., Marzetti, S., Poupard, M., Ferrari, M., Paris, S., Marxer, R., Philippe, O., Gies, V., Barchasz, V., & Glotin, H. (2020). Stereo to five-channels bombyx sonobuoys: from four years cetacean monitoring to real-time whale-ship anti-collision system. e-Forum Acusticum 2020, 3229–3231. https://doi.org/10.48465/fa.2020.1089
Khurana, S., Laurent, A., Hsu, W.-N., Chorowski, J., Łańcucki, A., Marxer, R., & Glass, J. (2020, October). A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning. Interspeech 2020. https://hal.science/hal-02912029
Dolfing, H. J. G. A., Bellegarda, J., Chorowski, J., Marxer, R., & Laurent, A. (2020, September). The “ScribbleLens” Dutch historical handwriting corpus. International Conference on Frontiers of Handwriting Recognition (ICFHR). https://hal.science/hal-02877520
Sanchez, G., Guis, V., Marxer, R., & Bouchara, F. (2020, July). Deep learning classification with noisy labels. ICME Workshop. https://hal.science/hal-02552375
Łańcucki, A., Chorowski, J., Sanchez, G., Marxer, R., Chen, N., Dolfing, H. J. G. A., Khurana, S., Alumäe, T., & Laurent, A. (2020, July). Robust Training of Vector Quantized Bottleneck Models. IJCNN 2020. https://hal.science/hal-02912027
Ferrari, M., Glotin, H., Marxer, R., & Asch, M. (2020). DOCC10: Open access dataset of marine mammal transient studies and end-to-end CNN classification. IJCNN, 1–8. https://doi.org/10.1109/IJCNN48605.2020.9207085
Best, P., Ferrari, M., Poupard, M., Paris, S., Marxer, R., Symonds, H., Spong, P., & Glotin, H. (2020, July). Deep Learning and Domain Transfer for Orca Vocalization Detection. International Joint Conference on Neural Networks. https://hal.science/hal-02865300
Chorowski, J., Chen, N., Marxer, R., Dolfing, H. J. G. A., Łańcucki, A., Sanchez, G., Alumäe, T., & Laurent, A. (2019, December). Unsupervised Neural Segmentation and Clustering for Unit Discovery in Sequential Data. NeurIPS 2019 Workshop - Perception as Generative Reasoning - Structure, Causality, Probability. https://hal.science/hal-02399138
Ferrari, M., Marxer, R., Asch, M., & Glotin, H. (2019, August). Wave Propagation in the Biosonar Organ of sperm whales using a Finite Difference Time Domain method. VIHAR. https://hal.science/hal-02445408
Ferrari, M., Glotin, H., Marxer, R., Barchasz, V., Sarano, V., Gies, V., Asch, M., & Sarano, F. (2019, June). High-frequency Near-field Physeter macrocephalus Monitoring by Stereo-Autoencoder and 3D Model of Sonar Organ. OCEANS 2019. https://hal.science/hal-02313898
Ferrari, M., Poupard, M., Giraudet, P., Marxer, R., Prévot, J.-M., Soriano, T., & Glotin, H. (2019). Efficient artifacts filter by density-based clustering in long term 3D whale passive acoustic monitoring with five hydrophones fixed under an Autonomous Surface Vehicle. OCEANS 2019, 2, 39. https://doi.org/10.1109/OCEANSE.2019.8867416
Gogate, M., Adeel, A., Marxer, R., Barker, J., & Hussain, A. (2018). DNN Driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation. Interspeech 2018, 2723–2727. https://doi.org/10.21437/Interspeech.2018-2516
Marxer, R., & Barker, J. (2017). Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement. Interspeech 2017, 1988–1992. https://doi.org/10.21437/Interspeech.2017-1257
Moore, R., & Marxer, R. (2016). Progress and Prospects for Spoken Language Technology: Results from Four Sexennial Surveys. Interspeech 2016, 3012–3016. https://doi.org/10.21437/Interspeech.2016-948
Lecumberri, M. L. G., Barker, J., Marxer, R., & Cooke, M. (2016). Language Effects in Noise-Induced Word Misperceptions. Interspeech 2016, 640–644. https://doi.org/10.21437/Interspeech.2016-330
Green, P., Marxer, R., Cunningham, S., Christensen, H., Rudzicz, F., Yancheva, M., Coy, A., Malavasi, M., Desideri, L., & Tamburini, F. (2016). CloudCAST - Remote Speech Technology for Speech Professionals. Interspeech 2016, 1608–1612. https://doi.org/10.21437/Interspeech.2016-148
Ma, N., Marxer, R., Barker, J., & Brown, G. (2015). Exploiting synchrony spectra and deep neural networks for noise-robust automatic speech recognition. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 490–495. https://doi.org/10.1109/ASRU.2015.7404835
Barker, J., Marxer, R., Vincent, E., & Watanabe, S. (2015). The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 504–511. https://doi.org/10.1109/ASRU.2015.7404837
Casanueva, I., Hain, T., Christensen, H., Marxer, R., & Green, P. (2015). Knowledge transfer between speakers for personalised dialogue management. Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 12–21. https://doi.org/10.18653/v1/W15-4603
Marxer, R., Cooke, M., & Barker, J. (2015). A framework for the evaluation of microscopic intelligibility models. Interspeech 2015, 2558–2562. https://doi.org/10.21437/Interspeech.2015-551
Marxer, R., & Janer, J. (2012). A Tikhonov regularization method for spectrum decomposition in low latency audio source separation. ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 277–280. https://doi.org/10.1109/ICASSP.2012.6287871
Janer, J., Marxer, R., & Arimoto, K. (2012). Combining a harmonic-based NMF decomposition with transient analysis for instantaneous percussion separation. ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 281–284. https://doi.org/10.1109/ICASSP.2012.6287872

Miscellaneous

Chetouani, M., Mandel-Briefer, E., Dassow, A., Marxer, R., Moore, R. K., Obin, N., & Stowell, D. (2023). Vocal Interactivity in-and-between Humans, Animals and Robots (Vol. 24, Number 1). John Benjamins Publishing Co. https://doi.org/10.1075/is.24.1
Ferrari, M., Marxer, R., Roger, V., Gies, V., Sarano, F., Asch, M., Vitry, H., Homme, A. P., Heuzey, R., Sarano, V., & Glotin, H. (2018). Sperm whales ultra high frequency near field multichannel analysis. The 8th International Workshop on Detection, Classification, Localization, and Density Estimation (DCLDE). https://hal.science/hal-01881615

Books

Chetouani, M., Mandel-Briefer, E., Dassow, A., Marxer, R., Moore, R., Obin, N., & Stowell, D. (2021). Proceedings of the 3rd International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR 2021). https://hal.science/hal-03429487
Dassow, A., Marxer, R., Moore, R., & Stowell, D. (2019). Proceedings of the 2nd International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR 2019). https://hal.science/hal-03609831
Dassow, A., Marxer, R., & Moore, R. (2017). Proceedings of the 1st International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR 2017). https://hal.science/hal-03609819