Making Nature Audible: Visualizing Frog Calls with Machine Learning
Monitoring biodiversity through sound is both an ecological and a computational challenge. Manual surveys do not scale to the volume and complexity of data produced by passive acoustic monitoring, particularly in species-rich Neotropical environments. This project explores how modern machine-listening models can support biodiversity research by automatically identifying frog (anuran) species and summarizing their calling behavior over time.
The work is inspired by "A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring" (Cañas et al., Scientific Data, 2023), which introduces the AnuraSet dataset. The dataset reflects real conservation conditions, including overlapping calls, environmental noise, and severe class imbalance. Because several species can call within the same recording, the task is framed as multi-label classification.
Using time–frequency audio representations, I compare an Audio Spectrogram Transformer (AST) against a ResNet-style convolutional baseline. Beyond accuracy, the project emphasizes visual analysis: performance trends under class imbalance, learned feature embeddings, confusion patterns, and temporal activity profiles. These visualizations aim to translate model predictions into ecologically meaningful insights while keeping the analysis transparent and interpretable.
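As a rough illustration of the multi-label setup and of what a temporal activity profile might look like, the sketch below thresholds per-species scores, computes per-species and macro-averaged F1, and summarizes how often one species is predicted per hour. It is a minimal sketch under stated assumptions, not the project's actual pipeline: the species names, scores, labels, recording hours, the 0.5 threshold, and the helper names `binarize`, `per_class_f1`, and `hourly_activity` are all hypothetical.

```python
from collections import defaultdict


def binarize(scores, threshold=0.5):
    """Turn per-species scores into 0/1 multi-label predictions.

    The 0.5 threshold is an assumption; in practice it could be tuned per species.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def per_class_f1(y_true, y_pred):
    """Return one F1 score per species column.

    A species with no true or predicted positives gets 0.0 by convention here.
    """
    n_classes = len(y_true[0])
    f1s = []
    for c in range(n_classes):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return f1s


def hourly_activity(hours, y_pred, species_idx):
    """Fraction of clips per hour in which the given species is predicted present."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for hour, row in zip(hours, y_pred):
        totals[hour] += 1
        hits[hour] += row[species_idx]
    return {hour: hits[hour] / totals[hour] for hour in sorted(totals)}


# Toy data (entirely made up): 4 clips, 3 hypothetical species.
SPECIES = ["species_a", "species_b", "species_c"]
scores = [
    [0.9, 0.2, 0.10],
    [0.8, 0.7, 0.30],
    [0.1, 0.6, 0.20],
    [0.4, 0.1, 0.05],
]
y_true = [
    [1, 0, 0],
    [1, 1, 0],  # two species calling in the same clip
    [0, 1, 1],
    [1, 0, 0],
]
hours = [18, 18, 19, 19]  # hour of day for each clip

y_pred = binarize(scores)
f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1) / len(f1)  # every species weighs equally, rare or common
profile = hourly_activity(hours, y_pred, species_idx=0)

for name, score in zip(SPECIES, f1):
    print(f"{name}: F1 = {score:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
print("species_a activity by hour:", profile)
```

Macro averaging is used in this sketch because it gives a rarely heard species the same weight as a common one, so a model that ignores rare classes is penalized. Whether the project reports macro F1 or another metric is not stated above.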