Machine Learning with Audio Python

Object-Aware Image Augmentation for Audio-Visual Zero-Shot Learning

Abstract: Audio-visual zero-shot learning (ZSL) leverages both video and audio information for model training, aiming to classify new video categories that were not seen during training. However, ...

The Hacker News

How to Integrate AI into Modern SOC Workflows

The 2025 SANS SOC Survey shows AI use is rising, but many SOCs lack integration, customization, and clear validation ...

Scientific Research Publishing

Early Alzheimer’s Disease Detection from Short Speech Samples Using Lightweight, Interpretable Linguistic Markers

Based Detection, Linguistic Biomarkers, Machine Learning, Explainable AI, Cognitive Decline Monitoring. Cite: de Filippis, R. and Al Foysal, A. (2025) Early Alzheimer’s Disease Detection from ...

GitHub

AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning

Abstract: Affective Video Facial Analysis (AVFA) is important for advancing emotion-aware AI, yet the persistent data scarcity in AVFA presents challenges. Recently, the self-supervised learning (SSL) ...

IEEE

Text-Based Audio Retrieval by Learning From Similarities Between Audio Captions

Abstract: This letter proposes to use similarities of audio captions for estimating audio-caption relevances to be used for training text-based audio retrieval systems. Current audio-caption datasets ...

GitHub

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

GLM-TTS is a high-quality text-to-speech (TTS) synthesis system based on large language models, supporting zero-shot voice cloning and streaming inference. This system adopts a two-stage architecture: ...
