Rodrigo Pozo Lagos

Data Scientist · M.Sc. Candidate · Santiago, Chile


FUSA-Net

Dual-encoder model for image-to-audio and audio-to-image retrieval. M.Sc. thesis, PUC Chile.

Repository

FUSA-Net aligns sheet-music images and audio representations in a shared embedding space.

Trained with contrastive learning, the system retrieves the matching item in the other modality (an audio clip for a sheet-music image, or the reverse) and needs no paired metadata at inference time.
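As a rough illustration of what a contrastive objective for a dual encoder can look like, here is a minimal sketch of a symmetric InfoNCE loss over a batch of paired embeddings. This page does not state which loss variant or temperature FUSA-Net uses, so the function names, the `temperature=0.07` default, and the symmetric formulation are all assumptions. It is written with plain Python lists so it runs without dependencies; a real training loop would more likely use PyTorch tensors.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_row(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def symmetric_info_nce(image_emb, audio_emb, temperature=0.07):
    """Assumed loss: row i of image_emb is paired with row i of audio_emb.

    Every other row in the batch acts as a negative. The loss is averaged
    over both retrieval directions (image-to-audio and audio-to-image).
    """
    imgs = [l2_normalize(v) for v in image_emb]
    auds = [l2_normalize(v) for v in audio_emb]
    n = len(imgs)

    # sim[i][j] = cosine similarity of image i and audio j, scaled by temperature
    sim = [[sum(a * b for a, b in zip(i, u)) / temperature for u in auds]
           for i in imgs]

    # Image-to-audio: each row of sim should peak on the diagonal.
    loss_i2a = sum(cross_entropy_row(sim[k], k) for k in range(n)) / n
    # Audio-to-image: each column of sim should peak on the diagonal.
    loss_a2i = sum(
        cross_entropy_row([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (loss_i2a + loss_a2i)
```

Minimizing this loss pulls each image embedding toward its paired audio embedding and pushes it away from the other audio clips in the batch, which is what makes nearest-neighbour search in the shared space usable for retrieval.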

Metrics

Recall@1 66.87%, Recall@10 92.24%, modality gap 0.036
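For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. It assumes that Recall@K is the fraction of queries whose true match (the item at the same index in the other modality) appears among the K most similar gallery items by cosine similarity, and that the modality gap is the Euclidean distance between the centroids of the two sets of L2-normalized embeddings. The page does not say how the reported numbers were computed, so both definitions and the function names are assumptions, not the thesis's evaluation code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def recall_at_k(query_emb, gallery_emb, k):
    """Fraction of queries whose true match (same index) ranks in the top k."""
    queries = [l2_normalize(v) for v in query_emb]
    gallery = [l2_normalize(v) for v in gallery_emb]
    hits = 0
    for idx, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, g)) for g in gallery]
        # Zero-based rank: number of gallery items scoring strictly
        # higher than the true match.
        rank = sum(1 for s in scores if s > scores[idx])
        if rank < k:
            hits += 1
    return hits / len(queries)


def modality_gap(emb_a, emb_b):
    """Assumed definition: distance between the two modality centroids."""
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    dim = len(a[0])
    centroid_a = [sum(v[d] for v in a) / len(a) for d in range(dim)]
    centroid_b = [sum(v[d] for v in b) / len(b) for d in range(dim)]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b)))


# Toy data for illustration only: three image/audio pairs in two dimensions.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.9, 0.1], [0.8, 0.2], [1.0, 1.0]]
print(recall_at_k(images, audio, k=1))
print(modality_gap(images, audio))
```

Under these definitions, a higher Recall@K means better retrieval, and a smaller gap means the image and audio embeddings occupy more nearly the same region of the shared space.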

Stack

PyTorch · transformers · CCA

Images

Images can be added later under public/projects/.