Span Dimension
[M] Given $n$ vectors, each of dimension $d$, what is the dimension of their span?
The dimension of the span of $n$ vectors (each of dimension $d$) is the number of linearly independent vectors among them. This can be determined by:
- Forming a matrix where each vector is a row (or column).
- The rank of this matrix (the number of linearly independent rows or columns) gives the dimension of the span.
The dimension of the span is at most $\min(n, d)$. If the vectors are linearly independent, the span has dimension $n$; if some vectors are linearly dependent, the span's dimension is less than $n$.
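A minimal NumPy sketch of this procedure (the vectors below are made-up examples): stack the vectors as rows of a matrix and take its rank.

```python
import numpy as np

# n = 3 vectors in d = 4 dimensions, stacked as rows.
# The third row is the sum of the first two, so only 2 are independent.
vectors = np.array([
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [1.0, 1.0, 3.0, 4.0],  # row 0 + row 1 -> linearly dependent
])

span_dim = np.linalg.matrix_rank(vectors)
print(span_dim)  # 2, even though n = 3 and d = 4
```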
Norms and Metrics
[E] What is a norm? What is $\|\mathbf{u}\|_2$?
A norm is a function that assigns a non-negative length or size to a vector in a vector space. Formally, a norm $\|\cdot\|$ on a vector space $V$ satisfies the following properties for all vectors $\mathbf{u}, \mathbf{v} \in V$ and all scalars $c$:
- Non-negativity: $\|\mathbf{u}\| \geq 0$, and $\|\mathbf{u}\| = 0$ if and only if $\mathbf{u} = \mathbf{0}$.
- Scalar multiplication (homogeneity): $\|c\mathbf{u}\| = |c| \, \|\mathbf{u}\|$.
- Triangle inequality: $\|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$.
The $2$-norm, denoted $\|\mathbf{u}\|_2$, is the Euclidean norm, defined as:
$$\|\mathbf{u}\|_2 = \sqrt{u_1^2 + u_2^2 + \dots + u_d^2}$$
where $\mathbf{u} = (u_1, u_2, \dots, u_d)$ is a vector in $d$-dimensional space. It represents the Euclidean length of the vector.
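A quick NumPy illustration of the $2$-norm (the vector is a toy example chosen so the result is exact):

```python
import numpy as np

u = np.array([3.0, 4.0])

# Direct application of the definition: sqrt of the sum of squared components.
norm_by_hand = np.sqrt(np.sum(u ** 2))

# NumPy's built-in norm (defaults to the 2-norm for vectors).
norm_builtin = np.linalg.norm(u)

print(norm_by_hand, norm_builtin)  # 5.0 5.0
```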
[M] How do a norm and a metric differ?
- A norm measures the length or size of a vector.
- A metric (or distance function) measures the distance between two vectors.
A norm $\|\cdot\|$ induces a metric $d(\mathbf{u}, \mathbf{v})$ by defining the distance between two vectors $\mathbf{u}$ and $\mathbf{v}$ as:
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$$
So, every norm can be used to create a metric. However, not every metric arises from a norm.
[M] Can we create a metric from a norm?
Yes, given any norm $\|\cdot\|$, we can define a metric using:
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$$
This defines a distance function (metric) between the vectors $\mathbf{u}$ and $\mathbf{v}$ that satisfies the properties of a metric: non-negativity, identity of indiscernibles, symmetry, and triangle inequality.
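A small sketch of this construction (the helper `induced_metric` is my own name, not a library function):

```python
import numpy as np

def induced_metric(u, v, norm=np.linalg.norm):
    """Distance induced by a norm: d(u, v) = ||u - v||."""
    return norm(u - v)

u = np.array([1.0, 2.0])
v = np.array([4.0, 6.0])

print(induced_metric(u, v))                               # Euclidean distance: 5.0
print(induced_metric(u, v, lambda w: np.sum(np.abs(w))))  # Manhattan (1-norm): 7.0
```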
[M] Can we create a norm from a metric?
Not always. A metric does not necessarily induce a norm. For a metric to come from a norm, it must be defined over a vector space and satisfy additional properties, namely homogeneity (scaling the points scales the distance) and translation invariance. Many metrics (e.g., the Manhattan distance or the Mahalanobis distance) correspond to norms, but some metrics (such as the discrete metric, where $d(\mathbf{u}, \mathbf{v}) = 1$ if $\mathbf{u} \neq \mathbf{v}$ and $0$ otherwise) do not arise from a norm.
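A short sketch of why the discrete metric fails: any norm-induced metric must satisfy $d(c\mathbf{u}, c\mathbf{v}) = |c| \, d(\mathbf{u}, \mathbf{v})$, and the discrete metric does not.

```python
import numpy as np

def discrete_metric(u, v):
    """d(u, v) = 1 if u != v, else 0 -- a valid metric on any set."""
    return 0.0 if np.array_equal(u, v) else 1.0

u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# If this metric came from a norm, scaling both points by 3 would
# triple the distance (homogeneity). It does not:
print(discrete_metric(u, v))          # 1.0
print(discrete_metric(3 * u, 3 * v))  # still 1.0, not 3.0
```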
Eigenvalues, Trace, and Determinant
[M] A matrix has four eigenvalues. What can we say about the trace and the determinant of this matrix?
- The trace of a matrix is the sum of its eigenvalues (counted with multiplicity). So, if a matrix has four eigenvalues $\lambda_1, \lambda_2, \lambda_3, \lambda_4$, then the trace of the matrix is:
$$\text{Trace}(A) = \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4$$
- The determinant of a matrix is the product of its eigenvalues. So, if the matrix has four eigenvalues, the determinant is:
$$\det(A) = \lambda_1 \cdot \lambda_2 \cdot \lambda_3 \cdot \lambda_4$$
Both identities are easy to check numerically, as in the sketch below.
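A quick NumPy check of both identities on an arbitrary $4 \times 4$ matrix (the random seed is only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # arbitrary 4x4 matrix

eigvals = np.linalg.eigvals(A)   # its four (possibly complex) eigenvalues

print(np.isclose(np.trace(A), np.sum(eigvals)))        # True: trace = sum
print(np.isclose(np.linalg.det(A), np.prod(eigvals)))  # True: det = product
```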
Covariance Matrix vs Gram Matrix
[M] What’s the difference between the covariance matrix and the Gram matrix?
- A covariance matrix is a square matrix that captures the covariance (a measure of how much two random variables change together) between pairs of elements in a random vector. Given $n$ samples $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ of a $d$-dimensional random vector, the (empirical) covariance matrix is defined as:
$$\text{Cov}(X) = \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i - \mu)(\mathbf{x}_i - \mu)^T$$
where $\mu$ is the mean vector of the dataset. The covariance matrix is widely used in statistics and machine learning for analyzing the relationships between variables.
- A Gram matrix is a matrix of inner products of a set of vectors. Given vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$, the Gram matrix $G$ is:
$$G = \mathbf{V}^T \mathbf{V}$$
where $\mathbf{V}$ is the matrix whose columns are the vectors $\mathbf{v}_i$, so that $G_{ij} = \mathbf{v}_i^T \mathbf{v}_j$. The Gram matrix reflects the geometric relationships (such as angles and lengths) between the vectors.
Both matrices encode pairwise relationships, but the covariance matrix captures statistical dependencies between variables (features), whereas the Gram matrix captures geometric relationships, specifically inner products, between vectors. In fact, if the columns of $\mathbf{X}$ hold mean-centered samples, the empirical covariance matrix is $\frac{1}{n}\mathbf{X}\mathbf{X}^T$ (a $d \times d$ matrix over features) while the Gram matrix is $\mathbf{X}^T\mathbf{X}$ (an $n \times n$ matrix over samples).
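A NumPy sketch of this contrast on toy data (the shapes and seed are arbitrary):

```python
import numpy as np

# Toy data: n = 5 samples of a d = 3 dimensional vector, samples as columns.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 5))
Xc = X - X.mean(axis=1, keepdims=True)  # center each feature (row)

cov = (Xc @ Xc.T) / Xc.shape[1]  # d x d: covariances between features
gram = Xc.T @ Xc                 # n x n: inner products between samples

print(cov.shape, gram.shape)                   # (3, 3) (5, 5)
print(np.allclose(cov, np.cov(X, bias=True)))  # True: matches NumPy's estimator
```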
t-SNE (t-distributed Stochastic Neighbor Embedding)
[H] How does t-SNE (t-distributed Stochastic Neighbor Embedding) work? Why do we need it?
t-SNE is a dimensionality reduction technique primarily used for visualizing high-dimensional data in 2D or 3D. We need it because linear methods such as PCA often fail to preserve local neighborhood structure (which points are close to which), and that is exactly the structure t-SNE is designed to retain. It works in two steps (a usage sketch follows the list):
- Modeling pairwise similarities: In the high-dimensional space, t-SNE converts pairwise distances between points into similarity probabilities using Gaussian kernels; each point's kernel bandwidth is chosen to match a user-specified perplexity (roughly, the effective number of neighbors).
- Mapping to low dimensions: It then maps the points to a lower-dimensional space (usually 2D or 3D) by minimizing the difference (KL divergence) between the pairwise similarity distributions in the high-dimensional and low-dimensional spaces. In the low-dimensional space, it uses a heavy-tailed Student's t-distribution, which better separates local clusters and avoids crowding of points.
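As a usage sketch with scikit-learn's TSNE (the dataset and parameter choices are illustrative, not tuned values):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Embed the 64-dimensional digits dataset into 2D for visualization.
X, y = load_digits(return_X_y=True)  # X: (1797, 64); y is handy for coloring a scatter plot

emb = TSNE(
    n_components=2,   # target dimensionality (2D for plotting)
    perplexity=30.0,  # roughly, the effective number of neighbors per point
    init="pca",       # PCA initialization tends to give more stable layouts
    random_state=0,
).fit_transform(X)

print(emb.shape)  # (1797, 2): one 2D point per input sample
```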