I went through and reviewed this year's MAIN conference, NeurIPS, and CCN, as well as whatever papers and preprints happened to show up in my Twitter feed. Of course, this review reflects my research interests (heavily skewed towards a certain flavor of neuro-AI and vision), but I hope it's useful to many of you who want to see where the field is going.

Multimodal networks learn from several modalities (vision, text, audio, etc.) by predicting one modality from another, or by predicting a common subspace. CLIP is perhaps the most famous multimodal network – it's trained contrastively. All of these methods allow us to learn a representation without the need for pesky supervision. If it turns out that this representation is aligned with a brain area, that's a win, since self-supervised and unsupervised methods are more biologically plausible than supervised ones.

## Unsupervised neural network models of the ventral visual stream

Primates show a remarkable ability to recognize objects. This ability is achieved by their ventral visual stream, multiple hierarchically interconnected brain areas. The best quantitative models of these areas are deep neural networks trained with human annotations. However, those networks receive far more annotations than infants do, making them implausible models of ventral stream development.

*Figure from Zhuang et al.*

Zhuang et al. found that unsupervised and self-supervised methods learn representations that are well aligned with ventral stream (V1, V4, IT) neurons. The paper was published in PNAS this year and already has more than 60 citations.
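To make the contrastive training mentioned above concrete, here is a minimal numpy sketch of a symmetric InfoNCE-style loss over a batch of paired image/text embeddings, the kind of objective CLIP-like models optimize. The function name, temperature value, and shapes are illustrative assumptions, not the actual CLIP implementation.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image/text pairs share a row index."""
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (B, B) similarity matrix
    labels = np.arange(len(logits))           # matching pairs lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()   # -log prob of the true pair

    # cross-entropy in both directions: image -> text and text -> image
    return (xent(logits) + xent(logits.T)) / 2
```

With matched pairs on the diagonal, the loss pulls each image embedding toward its own caption and pushes it away from the other captions in the batch; no human labels are needed, only the pairing itself.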