I took linear algebra as an undergrad and then took a slightly fancier version in my first year of grad school, and I understood all the “matrices <==> linear transformations” stuff, but I never really felt comfortable interpreting the actual entries of a matrix until my second year of grad school, when I learned the rule
the matrix-vector product A*v is a linear combination of the columns of A, with the coefficients given by the entries of v
I learned this from the excellent book Numerical Linear Algebra by Trefethen and Bau, and I don’t think I’ve ever heard it mentioned by anyone else outside of that book. Yet it’s been invaluable to me, and not just for numerics. Did I just miss out, or is this simple fact not disseminated widely enough?
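To make the rule concrete, here's a quick NumPy sketch (the matrix and vector are arbitrary values of my own choosing, just for illustration):

```python
import numpy as np

# An arbitrary 3x2 matrix and input vector, purely for illustration.
A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
v = np.array([10.0, -1.0])

# The usual matrix-vector product...
Av = A @ v

# ...equals the linear combination of A's columns,
# with coefficients taken from the entries of v.
combo = v[0] * A[:, 0] + v[1] * A[:, 1]

assert np.allclose(Av, combo)
```

Nothing deep is happening computationally; the point is that reading `A @ v` column-wise is often the more illuminating mental model.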
The difficult thing about teaching linear algebra (he says, procrastinating from writing the last week of notes for the linear algebra class he is teaching) is that the entire subject is, like, four actual facts, each of which is repeated twenty times in slightly different language.
And you have a great example! We could talk about:
A linear transformation, as a function with certain properties.
Matrix multiplication.
A system of linear equations
A collection of dot products with the row vectors
A linear combination of column vectors
A hyperplane in some higher-dimensional space
A semi-rigid geometric transformation of some space.
A function determined entirely by what it does to some basis.
And those are all the same thing. I think typically students coming out of a (first) linear algebra class understand and have internalized a couple of those; can cite a couple of others; and are completely oblivious to the rest. (And many may not have heard of some, because it's hard to cover all eight; I know that my discussion of the geometric properties has been somewhat perfunctory.)
But for any given person, some of these perspectives will make much more sense than others; and if your class doesn’t get you to the ones that work for you, you won’t understand nearly as much as if it does.
(The goal, of course, is to understand all of the perspectives, and to switch among them fluently, but that’s hard and definitely not happening in a first course. So you have to pick your focuses. The reason I was so unhappy with my college’s choice of textbook is that its focus is exactly the opposite of what I would like.)
So, for instance, you say that the matrix-vector product is a linear combination of the columns of the matrix, with coefficients given by the input vector. And you say that, and I think for a few seconds and say “huh, I guess that’s true.” But that’s not how I think about it; I think about it as a function that sends each standard basis element to the corresponding column vector.
Except those are literally the exact same thing. You write your input as a linear combination of your standard basis vectors, and then your function preserves linear combinations, and sends each basis vector to the corresponding column—so you get a linear combination of the column vectors.
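That equivalence is easy to check numerically. A small sketch (the matrix and coefficients are arbitrary, my own example):

```python
import numpy as np

# An arbitrary 2x3 matrix, purely for illustration.
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0]])

# Each standard basis vector maps to the corresponding column...
for i in range(3):
    e_i = np.eye(3)[i]
    assert np.allclose(A @ e_i, A[:, i])

# ...so for v = 5*e_0 - 2*e_1 + 4*e_2, linearity hands you
# the matching combination of columns.
v = np.array([5.0, -2.0, 4.0])
assert np.allclose(A @ v, 5 * A[:, 0] - 2 * A[:, 1] + 4 * A[:, 2])
```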
And I think the thing I just said is pretty common to mention. It’s certainly necessary for doing any sort of change-of-basis stuff. But if it made more sense to you in different language, that’s 100% unsurprising.
If you’re interested, here’s a stab at describing why I find the columns thing so useful.
In a lot of physics-like contexts, it’s natural to write vectors with respect to a basis which has a special physical importance, but whose basis vectors don’t. For instance, the position x(t) of a damped harmonic oscillator obeys
x’(t) = v
v’(t) = -(cv + kx)
This can be written as a matrix equation y’(t) = Ay, with y = (x, v)^T.
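In code, with illustrative values for c and k (my choice; nothing canonical about them), the system and one crude Euler step might look like:

```python
import numpy as np

# Damping c and stiffness k are illustrative values, not from the post.
c, k = 0.5, 2.0

# y' = A y, with y = (x, v)^T:
A = np.array([[0.0, 1.0],    # x' = v
              [-k,  -c]])    # v' = -(c*v + k*x)

# One crude forward-Euler step from (x, v) = (1, 0), just to show
# the matrix in action; a real solver would do better.
y = np.array([1.0, 0.0])
dt = 0.01
y_next = y + dt * (A @ y)
```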
There is clearly something uniquely nice about the basis being used here. x is position and v is velocity, and it’s easier to interpret a solution written in terms of x and v than one in terms of (say) x + 2v and x - v. In fact, since x and v are what you can actually measure, you have to specify how to transform to this particular basis, or you lose the physical meaning.
On the other hand, the basis vectors have little physical importance. One basis vector is a state with zero velocity, and the other is a state with zero position, and there isn’t any interesting physical connection between the state (x, v) and the states (x, 0), (0, v). So there’s no physical intuition you can attach to the question “how does A act on (x, 0)?”
In this type of problem, there is a different basis whose basis vectors have special physical meaning: the eigenbasis of A (each eigenspace is closed under time evolution and grows/decays/oscillates at rates given by the eigenvalue). But you wouldn’t typically write down a problem in that basis at the outset, because you want to give the reader the directly measurable quantities first.
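A quick sketch of that, with illustrative underdamped values of c and k (my choice): the eigenvalues of A = [[0, 1], [-k, -c]] are the roots of lambda^2 + c*lambda + k = 0, and you can read decay and oscillation straight off them.

```python
import numpy as np

# Illustrative underdamped values (c^2 < 4k), not from the post.
c, k = 0.5, 2.0
A = np.array([[0.0, 1.0],
              [-k,  -c]])

eigvals, eigvecs = np.linalg.eig(A)

# Negative real parts: each eigen-direction decays over time.
# Nonzero imaginary parts: it oscillates as it decays.
# Here Re(lambda) = -c/2 and Im(lambda) = +/- sqrt(k - c^2/4).
assert np.allclose(eigvals.real, -c / 2)
assert np.allclose(np.abs(eigvals.imag), np.sqrt(k - c**2 / 4))
```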
Now that I think about it, the above makes (x, v)^T feel like a covector: we naturally think about its coefficients (“how it acts on the basis elements”), not its decomposition with respect to some basis. That suggests that this might all be less confusing if we wrote everything with row vectors instead of column vectors. Vectors would multiply matrices on the left rather than the right, and we would naturally read this off as the vector transforming the matrix rather than vice versa. But for whatever reason, column vectors are standard.
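For what it’s worth, the row-vector convention has a mirror-image version of the columns rule: a row vector times a matrix is a linear combination of the rows of the matrix, with coefficients from the row vector. A sketch with arbitrary numbers:

```python
import numpy as np

# Arbitrary matrix and row vector, purely for illustration.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
w = np.array([2.0, -1.0])

# Left-multiplying by a row vector combines the ROWS of A,
# mirroring how right-multiplication combines the columns.
assert np.allclose(w @ A, 2 * A[0, :] - 1 * A[1, :])
```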
In fact, Trefethen and Bau’s comments on the columns thing can be viewed as trying to correct for the psychological effect of using column vectors instead of row vectors:

Will think more about this in the morning, after I’ve gotten some sleep. But my first reaction is to think that in some sense things are backwards. You have the matrix equation y’ = Ay, and you want to find y. So really your equation is A^(-1) y’ = y.
Basically, I think your second chunk is right; the reason this is feeling unnatural to you is that you’re never using the matrix to plug in y and get y’. Instead, you’re saying that the matrix gives you a (parametrized family of) functions, so you’re saying “I want to know position and velocity, and if I have this matrix I get that family of functions.”
You can always perform this sort of sleight of hand, of course. Any time you have a family of functions f_k: A -> B, you could instead think of this as a family of functions A_a: F -> B. Or as a single function from (F x A) -> B.
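In code this sleight of hand is just currying. A toy sketch with a made-up family of functions (the specific formula means nothing; it’s only there so the three packagings can be compared):

```python
# A made-up parametrized family f_k : A -> B, with k playing
# the role of the index in F. The formula is arbitrary.

# 1. A family of functions indexed by the parameter k:
def f(k):
    return lambda a: k * a + 1

# 2. Roles flipped: for each input a, a function of the parameter k:
def g(a):
    return lambda k: k * a + 1

# 3. A single uncurried function on pairs in F x A:
def h(k, a):
    return k * a + 1

# All three packagings agree pointwise.
assert f(3)(4) == g(4)(3) == h(3, 4)
```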
But if you find yourself doing this thing a lot, I can see why you’d want to think of linear algebra in a way I find slightly odd. (And I still don’t think I’ve totally wrapped my head around the way you’re thinking of it, so I may come back and revisit this thought later on).
Hm, I’ve thought about this a bit more and I think I’ve put my finger on why this is weird for me, though I haven’t quite pinned it down yet. But basically, I don’t think I would think of “position and velocity” as a “basis” at all. x and v aren’t numbers; they’re functions.
You’re working on an infinite-dimensional function space squared; a basis for the whole space will have way more than two elements. And you’re right that the partition into “the position and the velocity” is more natural to the problem than the division into, like, “the position plus the velocity and the position minus the velocity”. But that doesn’t have anything to do with them being a “basis”, which they’re not.
Unless I guess our field of scalars is a function field or something? But that setup is weird enough that I don’t trust myself to understand it on this little sleep either.
