Tag Archives: linear algebra

And now for something completely different-cognitive neuroscience!

10 Jan

I sometimes trawl arxiv.org for short math papers to read, and occasionally I even blog about them (see: curve complex I and II), though generally my math blog posts arise from interesting talks I’ve seen (see: most of the rest of my math posts).  Recently a friend sent me a job listing that would require a Ph.D. in biology or similar, but the real job requirement is an ability to read biology papers.  The only related category on arxiv is “quantitative biology,” so I thought I’d try to bring up a short paper and read it and blog about it to see how I do.  Any cognitive neuroscientists who might read this, let me know if my reading is correct!

This post is based on the paper “Deep driven fMRI decoding of visual categories” by Michele Svanera, Sergio Benini, Gal Raz, Talma Hendler, Rainer Goebel, and Giancarlo Valente.  First, here’s my schematic of the paper:


We’ll read this schematic from top to bottom, left to right.

  1. On top is the experiment: they had a lot of people watch 5-10 minute movies.  The left white arrow indicates that the people were in fMRI machines (I know a fMRI machine does not look like an EEG but that’s the picture you get) and so they have a bunch of data sitting around from that.  The right white arrow indicates that they used a computer algorithm (“math!”) to extract information directly from the movies [this is the fc7 data].  So far they haven’t contributed anything new to the literature; just used existing techniques to come up with raw data.
  2. The orange diagonal arrows are when things get interesting.  The fMRI data and fc7 data comes in giant matrices, and they use another math algorithm to come up with a set of “decoding” matrices.  Not pictured in schematic: they test these matrices using some of the data.
  3. The goal is indicated by the green arrows: to use the brain data and these matrices they came up with to reconstruct what people are seeing and classify these things (aka are subjects seeing people’s faces on the screen, or entire human figures?)

Now for a few details on each of the steps.

0. The motivation behind the paper seems to be to link the brain imaging community (those who work the fMRI, EEG, etc. data) with the deep neural network community (computer people) to answer questions that involve both.  The main question they have is: how do people associate low-level information like colors, shapes, etc. with semantic concepts like car, person, etc.?  Here’s the picture:


Eyes see a vague shape + different colors [low-level information]; brain tells us whether it’s a person or a tree with the sun behind it [semantic concepts]

There’s a lot of work in both communities on answering this question, and this paper uses work from both sides to form a decoder model: with an input of fMRI data, the model spits out predictions about what the subjects are seeing.  Specifically, the model is supposed to tell if subjects were looking at human faces or full human figures.  This is hard!  Those are pretty similar categories.

  1. The data: they grabbed a bunch of existing data from other experiments, where scientists took 5-10 minute clips from five different movies (side note I would never want to be in these studies because one of the clips was from The Ring 2) and showed them to subjects (ranging from 27 to 74 participants in each movie) and recorded all the fMRI data, which creates a huge three-dimensional datasetevery 3 seconds.  Then they threw the movie frames into a computer algorithm (called the faster R-CNN method) which detects objects in the video frames (with varying confidence levels) and spits out a 4096-dimensional vector for each frame.  They averaged these vectors over 15 frames so that the two datasets could match up (the movies were shown at 5 frames per second so this makes sense).  These vectors form the fc7 data.
  2. The math: they use an algorithm called Canonical Correlation Analysis (CCA) to spit out two orthogonal matrices and which are highly correlated (hence the middle C).  Looks like linear algebra with some linear projection!  The schematic is fMRI \cdot A = U \\ fc7 \cdot B = V.  To do this, they took a subset (about 75%) of the fMRI data and the corresponding fc7 data and plugged it into the math.  The goal of this step (the training step) is actually to get the helper matrices and B.  To make sure these matrices are a-OK, they used the remaining fMRI data to reconstruct the fc7 data within a reasonable margin of error fMRI \cdot A = U \rightarrow V \cdot B^{-1} = fc7.  Remember U and V are highly (in fact maximally) correlated so that middle arrow actually makes sense in this step (the testing step).
  3. The result: For one movie, they did the training math step using different subsets of data (they did it 300 times) to make sure those helper matrices and are the best possible ones.  Then to show that this whole paper does what they want it to do, they do the testing step using the other movies.  [The whole point of a decoding method is to predict what people are seeing].  They then try to classify whether subjects see faces or bodies using their method (the fancy fc7 method) and another method (some linear thing) and show that their method is way better at this discriminating task than the other method.  Fun caveat that they had to think about: it takes people a little while to react to stimuli, so they had to toss in time-shifts for the fMRI data, and also throw in another regulatory parameter to normalize the data.

Conclusion: their method works on this preliminary result (faces versus bodies)!  They want to expand to other movies and other semantic concepts in the future.

General keywords: machine learning, fMRI, linear algebra.  Also CCA, faster R-CCN, fc7 but those are keywords for specialists.

My conclusion: this was cool and fun!  I like reading new things and learning.  I hope you do too!


A Linear Algebra Exercise

13 Aug

Back when I went to that excellent MSRI graduate summer school (video of lectures available at the link), one professor had us try a few seemingly-random exercises in order to motivate some definitions.  The exercises by themselves were pretty fun, so I thought I’d share them and maybe get to the definitions (which I have since used in my research).

Linear Algebra is really hard, and it’s one of the first courses you take after calculus (if you take courses after calculus).  I’m very often surprised by how often I use it and how applicable it is (I’m pretty sure my husband also uses it often).  A few months after I first saw an eigenvalue, I ran into an old friend who told me “I still don’t know what eigenvalues are useful for.”  Fast forward five or six years, and now I see eigenvalues EVERYWHERE!  My senior seminar of undergraduate was basically a semester all about eigenvalues (spectral analysis).  Anyways, this is all to say I still don’t feel very proficient at linear algebra, but I am very appreciative of it.  I hope you feel that way about lots of math/baking too!

In these exercises, we’ll live in vector spaces.  We’ll do a “definition” by example here: Euclidean space, or real space, is an example of a vector space.  Instead of thinking of coordinates as points in space, we think of them as little arrows that point to a point and have a specific length/magnitude (the distance of the point to the origin).

The purple point is (2,3).  The orange vector has length (square root of 13) and is pointing at the point (2,3).

The purple point is (2,3). The orange vector has magnitude (square root of 13) and is pointing at the point (2,3).

Just as groups are the fundamental object of study in algebra, vectors are the fundamental object of study in linear algebra. Vectors are abstract sets with two operations (instead of one, which groups have): you can add vectors and you can scalar multiply them (which means make them longer or shorter).  There are a bunch of axioms that vector spaces have to satisfy, but you can just think of arrows in Euclidean space.

Next we need to understand the concept of an inner product.  This is an operation where you put in two vectors and get out a number (in our case, real, but generally complex or in any “field” in quotes because I haven’t ever defined what a field is).  In our example, if you have one vector that points to (2,3) and another that points to (9,11), you could do 2*9 + 3*11 = 18+ 33 = 51 and get 51 as the inner product of the two.  This is written \langle (2,3), (9,11) \rangle =51.  So we multiply all the matching coordinates, and then we add up all those products.

There’s a few axioms that inner products satisfy:

  1. Conjugate symmetry: \langle b, a \rangle = \bar{\langle a, b \rangle}  In our case, since we’re in the reals and not the complex numbers, we can ignore the bar and read this as symmetry.  From our definition this is true.
  2. Positive-definiteness: \langle a, a \rangle \geq 0 for any a, and equals 0 only if a is 0.  In our case, \langle a, a \rangle will be a sum of squares, which is only 0 if every single coordinate is 0.
  3. Linearity in the first argument: \langle ax+c, b\rangle = x\langle a, b \rangle + \langle c, b \rangle.  You can/should check this is true for our example.

From our definition of inner product over Euclidean space, we can come up with a formula for the magnitude of a vector (remember, that’s the distance from the origin to the point that we’re pointing at).   We have \langle (a_1, a_2, \ldots, a_n), (a_1,\ldots, a_n) \rangle = a_1^2+a_2^2+\ldots+a_n^2.  By the Pythagorean theorem, we already know the distance of a point to the origin: it’s the square root of this quantity!  So we have that the magnitude of a vector is the square root of its inner product with itself.  In symbols, we write \| a \| = \sqrt{\langle a, a \rangle}.  Any time you have an inner product, you can define a norm \| \cdot \| in this way.

Here’s the exercise: let’s say you have a bunch of unit vectors (vectors with magnitude 1) and you take all of the pairwise inner products.  So if you have four unit vectors a, b, c, d, you’ll have all these inner products, which we’ll arrange in a matrix:

\left( \begin{array}{cccc} \langle a, a \rangle & \langle a, b \rangle & \langle a, c \rangle & \langle a, d \rangle \\    \langle b, a \rangle & \langle b, b\rangle & \langle b, c \rangle & \langle b, d \rangle \\    \langle c, a \rangle & \langle c, b\rangle & \langle c, c \rangle & \langle c, d \rangle \\    \langle d, a \rangle & \langle d, b\rangle & \langle d, c \rangle & \langle d, d \rangle \end{array} \right)

That’s all fine and dandy and well and reasonable.  Here’s the exercise: if I give you a matrix A, can you tell me if you can make A out of some unit vectors in this way with some inner product?  

Notice that in the question, we didn’t say a particular inner product (we’re using one example, remember?)  This just says if you have a matrix, can you find an inner product and some unit vectors that make it.  So to take a few stabs at this problem, we’ll have to use all those axioms of inner products?  Definitely we need A to be symmetric (that means that the entry in the 2nd row, 3rd column is equal to the entry in the 3rd row, 2nd column etc. etc.) by property 1.

Property 2 and our discussion above say that all the diagonal entries have to be 1 (because we started with unit vectors).

Those first two conditions are necessary for A to even be a candidate, but just because A satisfies those conditions doesn’t mean that there’s a collection of unit vectors that form it.  (These are “necessary” but not “sufficient” conditions.)

The actual answer is that A must be non-negative definite: this means that if A is rows by columns and you pick a collection of complex numbers, this sum \sum_{i,j} a_{ij}c_i\bar{c_j} is greater than or equal to zero (it’s not negative).  Parsing the sum expression: take each entry of A and multiply it by the complex number corresponding to its row, and by the conjugate of the complex number corresponding to its column.  Add up all of these products.  If this number is non-negative, then yes, your matrix can arise from a collection of unit vectors.



Just kidding!  Did you think I’d just throw an answer at you and not explain it?  That’s not how we do!


As always, the great question in math is why? Just getting a random condition thrown at you shouldn’t feel satisfying (well, it might, but too bad I’m taking away your satisfaction).

Let’s find out why the condition is necessary:

Squaring any number makes a positive number, so if you add up all your unit vectors and take the norm of that and then square it, you should get a positive number.  Even if you throw random complex scalar multiplication on to those vectors, the norm squared will still be positive.  So in equations, we have 0 \leq \| \sum_{i=1}^n c_iv_i \|^2.  If we unpack this definition, we get

0 \leq \| \sum_{i=1}^n c_iv_i \|^2

=\langle \sum c_iv_i, \sum c_jv_j \rangle

=\sum \langle v_i,v_j \rangle c_i \bar{c_j}

= \sum a_{ij} c_i \bar{c_j}

So we get the condition is necessary (didn’t even use the unit-ness of the vectors here).

How about sufficiency?  Well, we have to come up with an inner product and some unit vectors.  So take some unit vectors u_i.  Any vector will be written \sum t_i u_i.  We’ll define an inner product on this space by \langle \sum t_iu_i, \sum t_ju_j \rangle_A = \sum a_{ij} t_i \bar{t_j}.  Go check that this inner product satisfies the three axioms above.  Great!  It’s an inner product (this relies on the fact that A is non-negative definite).  If you let all the t_i=1, then you can form A from the inner product of the u_is.  That’s it!


Congratulations!  That was a hard exercise session!  We did it!

%d bloggers like this: