I sometimes trawl arxiv.org for short math papers to read, and occasionally I even blog about them (see: curve complex I and II), though generally my math blog posts arise from interesting talks I’ve seen (see: most of the rest of my math posts). Recently a friend sent me a job listing that would require a Ph.D. in biology or similar, but the real job requirement is an ability to read biology papers. The only related category on arxiv is “quantitative biology,” so I thought I’d pick a short paper from it, read it, and blog about it to see how I do. Any cognitive neuroscientists reading this: let me know if my reading is correct!
This post is based on the paper “Deep driven fMRI decoding of visual categories” by Michele Svanera, Sergio Benini, Gal Raz, Talma Hendler, Rainer Goebel, and Giancarlo Valente. First, here’s my schematic of the paper:
We’ll read this schematic from top to bottom, left to right.
- On top is the experiment: they had a lot of people watch 5-10 minute movie clips. The left white arrow indicates that the people were in fMRI machines (I know an fMRI machine does not look like an EEG, but that’s the picture you get), so they have a bunch of data sitting around from that. The right white arrow indicates that they used a computer algorithm (“math!”) to extract information directly from the movies [this is the fc7 data]. So far they haven’t contributed anything new to the literature; they’ve just used existing techniques to come up with raw data.
- The orange diagonal arrows are where things get interesting. The fMRI data and fc7 data come in giant matrices, and they use another math algorithm to come up with a set of “decoding” matrices. Not pictured in the schematic: they test these matrices on data they held back from this step.
- The goal is indicated by the green arrows: to use the brain data and these matrices to reconstruct what people are seeing and classify it (i.e., are subjects seeing people’s faces on the screen, or entire human figures?).
Now for a few details on each of the steps.
- The motivation: the paper seems to aim at linking the brain imaging community (those who work with fMRI, EEG, etc. data) with the deep neural network community (computer people) to answer questions that involve both. The main question is: how do people associate low-level information like colors and shapes with semantic concepts like car or person? There’s a lot of work in both communities on this question, and this paper uses work from both sides to build a decoder model: given fMRI data as input, the model predicts what the subjects are seeing. Specifically, the model is supposed to tell whether subjects were looking at human faces or full human figures. This is hard! Those are pretty similar categories.
- The data: they grabbed existing data from other experiments, in which scientists took 5-10 minute clips from five different movies (side note: I would never want to be in these studies, because one of the clips was from The Ring 2), showed them to subjects (between 27 and 74 participants per movie), and recorded the fMRI data, which produces a huge three-dimensional dataset every 3 seconds. Then they ran the movie frames through a computer algorithm (the Faster R-CNN method), which detects objects in the video frames (with varying confidence levels) and spits out a 4096-dimensional vector for each frame. They averaged these vectors over 15 frames so that the two datasets would line up: the frames were taken at 5 frames per second, so 15 frames cover exactly one 3-second fMRI sample. These averaged vectors form the fc7 data.
- The math: they use an algorithm called Canonical Correlation Analysis (CCA), which finds two helper matrices A and B that turn the fMRI data X and the fc7 data Y into new matrices U = XA and V = YB whose matching columns are as highly correlated as possible (hence the middle C). Within each of U and V the columns are uncorrelated with each other, which is the sense in which they are “orthogonal.” Looks like linear algebra with some linear projection! The schematic is X → (A) → U ↔ V ← (B) ← Y. To do this, they took a subset (about 75%) of the fMRI data and the corresponding fc7 data and plugged it into the math. The goal of this step (the training step) is actually to get the helper matrices A and B. To make sure these matrices are a-OK, they used the remaining fMRI data to reconstruct the fc7 data within a reasonable margin of error. Remember U and V are highly (in fact maximally) correlated, so that middle arrow actually makes sense in this step (the testing step).
- The result: for one movie, they ran the training step on different subsets of the data (300 times) to make sure the helper matrices A and B are the best possible ones. Then, to show that the whole approach does what they want, they ran the testing step on the other movies [the whole point of a decoding method is to predict what people are seeing]. They then tried to classify whether subjects saw faces or bodies using their method (the fancy fc7 method) and another method (some linear thing), and showed that their method is much better at this discrimination task. Fun caveat they had to think about: the brain’s blood-flow response, which is what fMRI measures, shows up a few seconds after people see something, so they had to add time shifts to the fMRI data. They also threw in a regularization parameter to keep the math stable.
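To make the frame-averaging step from “The data” concrete, here is a minimal NumPy sketch, not the authors’ code. It assumes one 4096-dimensional fc7 vector per frame, frames taken at 5 per second, and non-overlapping blocks of 15 frames (one block per 3-second fMRI sample). The function name `average_frames` is hypothetical, and random numbers stand in for real features.

```python
import numpy as np


def average_frames(frame_features: np.ndarray, frames_per_sample: int = 15) -> np.ndarray:
    """Average per-frame feature vectors over non-overlapping blocks of frames.

    frame_features: shape (n_frames, n_features), e.g. one 4096-d fc7 vector per frame.
    Returns shape (n_frames // frames_per_sample, n_features).
    Trailing frames that don't fill a whole block are dropped (an assumption).
    """
    n_blocks = frame_features.shape[0] // frames_per_sample
    trimmed = frame_features[: n_blocks * frames_per_sample]
    # (n_blocks, frames_per_sample, n_features) -> mean over the frames in each block
    return trimmed.reshape(n_blocks, frames_per_sample, -1).mean(axis=1)


# Toy example: 60 seconds of video at 5 frames/second = 300 frames of 4096-d features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((300, 4096))
fc7 = average_frames(frames)  # one row per 3-second fMRI sample
print(fc7.shape)  # (20, 4096)
```

With 300 frames and 15 frames per block you get 20 rows, one for each 3-second fMRI sample in that minute of video.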
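For “The math,” here is a rough sketch of regularized CCA in NumPy on made-up data, not the authors’ implementation. `fit_cca` finds A and B so that matching columns of U = XA and V = YB are maximally correlated. `decode` goes the other way at test time: it computes U from held-out X, estimates V as U scaled by the canonical correlations, and then inverts V = YB. The ridge-style `reg` term, this particular decoding rule, and all function names are my assumptions. The toy X plays the role of fMRI data and Y of fc7 features, and both are generated from a shared hidden signal so there is something to find.

```python
import numpy as np


def inv_sqrt(C: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, E = np.linalg.eigh(C)
    return E @ np.diag(1.0 / np.sqrt(w)) @ E.T


def fit_cca(X: np.ndarray, Y: np.ndarray, n_components: int, reg: float = 1e-3):
    """Regularized CCA: return A, B, canonical correlations, and the training means.

    `reg` is added to the diagonal of each covariance matrix (assumed ridge-style
    regularization) so the matrices stay invertible.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance; singular values are the canonical correlations
    P, rho, Qt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ P[:, :n_components]
    B = Wy @ Qt.T[:, :n_components]
    return A, B, rho[:n_components], x_mean, y_mean


def decode(X_new, A, B, rho, x_mean, y_mean):
    """Predict Y from X: U = X A, estimate V as rho * U, then solve V = Y B for Y.

    Assumes n_components equals the number of Y columns, so B is square and invertible.
    """
    U = (X_new - x_mean) @ A
    V_hat = U * rho  # best linear guess of each V column from its U partner
    return V_hat @ np.linalg.inv(B) + y_mean


# Toy data: a 3-d hidden signal Z drives both a 10-column "fMRI" X and a 3-column "fc7" Y.
rng = np.random.default_rng(0)
n, n_voxels, n_features = 400, 10, 3
Z = rng.standard_normal((n, 3))
X = Z @ rng.standard_normal((3, n_voxels)) + 0.1 * rng.standard_normal((n, n_voxels))
Y = Z + 0.1 * rng.standard_normal((n, n_features))

n_train = int(0.75 * n)  # train on ~75%, test on the rest
A, B, rho, x_mean, y_mean = fit_cca(X[:n_train], Y[:n_train], n_components=n_features)
Y_hat = decode(X[n_train:], A, B, rho, x_mean, y_mean)
test_corr = [np.corrcoef(Y_hat[:, j], Y[n_train:, j])[0, 1] for j in range(n_features)]
print(np.round(rho, 3), np.round(test_corr, 3))
```

One plausible reason a regularization parameter matters with real fMRI: there are usually far more voxels than time points, so the fMRI covariance matrix is singular and can’t be inverted until something like `reg * I` is added.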
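The time-shift caveat from “The result” can be sketched too. fMRI measures a blood-flow response that lags the stimulus by several seconds, so each fMRI sample gets paired with the features from a few samples earlier. Here `shift_for_lag` and `best_lag` are hypothetical helpers, the toy signal is a single made-up time series, and the 2-sample (6-second) delay is an assumption chosen for illustration. Picking the lag that maximizes correlation is just one simple option, not necessarily what the paper did.

```python
import numpy as np


def shift_for_lag(fmri: np.ndarray, features: np.ndarray, lag: int):
    """Pair the features at time t with the fMRI sample at time t + lag (in samples).

    Slices along axis 0, so it works for 1-D series or (time x something) matrices.
    Unmatched samples at the ends are dropped.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return fmri, features  # features[:-0] would be empty, so handle 0 separately
    return fmri[lag:], features[:-lag]


def best_lag(fmri: np.ndarray, features: np.ndarray, max_lag: int) -> int:
    """Return the lag in 0..max_lag with the highest correlation (1-D series only)."""
    scores = []
    for lag in range(max_lag + 1):
        f, s = shift_for_lag(fmri, features, lag)
        scores.append(np.corrcoef(f, s)[0, 1])
    return int(np.argmax(scores))


# Toy data: one sample every 3 s; the "fMRI" echoes the features 2 samples (6 s) later.
rng = np.random.default_rng(1)
n, true_lag = 200, 2
features = rng.standard_normal(n)
fmri = np.concatenate([np.zeros(true_lag), features[:-true_lag]]) + 0.1 * rng.standard_normal(n)
print(best_lag(fmri, features, max_lag=5))  # → 2
```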
Conclusion: their method works on this preliminary test (faces versus bodies)! They want to extend it to other movies and other semantic concepts in the future.
General keywords: machine learning, fMRI, linear algebra. Also CCA, Faster R-CNN, and fc7, but those are keywords for specialists.
My conclusion: this was cool and fun! I like reading new things and learning. I hope you do too!