Senate voting analysis and visualization.
In this case study, we take data from the votes on bills in the US Senate (2004-2006), shown as a table above, and explore how we can visualize the data by projecting it, first on a line and then on a plane. We investigate how to choose the line or plane so as to maximize the variance of the projected data, via principal component analysis (PCA). Finally, we examine how a variation on PCA that encourages sparsity of the projection directions allows us to understand which bills are most responsible for the variance in the data.
- Senate voting data and the visualization problem
- Projection on a line
- Projection on a plane
- Maximum-variance projections
- PCA
- Sparse PCA
Senate voting data and the visualization problem.
Data
The data consists of the votes of Senators in the 2004-2006 US Senate on a set of bills. Each vote is encoded numerically, with one value for “Yay” (“Yes”) votes, another for “Nay” (“No”) votes, and a third for all other votes. (A number of complexities are ignored here, such as the possibility of pairing the votes.)
This data can be represented as a “voting” matrix whose elements each take one of the three vote values. Each column of the voting matrix contains the votes of a single Senator for all the bills; each row contains the votes of all Senators on a particular bill.
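The row and column layout described above can be sketched in a few lines of Python. This is only an illustration: the records, the bill and Senator names, and the numeric encoding (+1 for a yes vote, -1 for a no vote, 0 for anything else) are assumptions made for the example, not values taken from the data set.

```python
# Hypothetical vote records: (bill, senator, vote). All names and votes are made up.
records = [
    ("bill_1", "senator_A", "yea"),
    ("bill_1", "senator_B", "nay"),
    ("bill_1", "senator_C", "absent"),
    ("bill_2", "senator_A", "nay"),
    ("bill_2", "senator_B", "nay"),
    ("bill_2", "senator_C", "yea"),
]

# Assumed encoding (not taken from the text): yea -> +1, nay -> -1, anything else -> 0.
ENCODING = {"yea": 1, "nay": -1}


def build_voting_matrix(records):
    """Return (matrix, bills, senators); rows index bills, columns index senators."""
    bills = sorted({bill for bill, _, _ in records})
    senators = sorted({senator for _, senator, _ in records})
    row = {bill: i for i, bill in enumerate(bills)}
    col = {senator: j for j, senator in enumerate(senators)}
    # Pairs with no record stay at 0, like any other vote that is neither yea nor nay.
    matrix = [[0] * len(senators) for _ in bills]
    for bill, senator, vote in records:
        matrix[row[bill]][col[senator]] = ENCODING.get(vote, 0)
    return matrix, bills, senators


matrix, bills, senators = build_voting_matrix(records)

# One column holds all the votes of one senator; one row holds all votes on one bill.
votes_of_senator_A = [r[senators.index("senator_A")] for r in matrix]
votes_on_bill_2 = matrix[bills.index("bill_2")]
```

Storing the matrix as a list of rows keeps the example free of dependencies; with real data a numerical array library would be the more natural choice.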
Senate voting matrix: “Nay” votes are in black, “Yay” ones in white, and the others in gray. The transpose of the voting matrix is shown. The picture has many gray areas, as some Senators are replaced over time. Simply plotting the raw data matrix is often not very informative.
Visualization Problem
We can try to visualize the data set by projecting each data point (each row or column of the matrix) on (say) a 1D, 2D or 3D space. Each “view” corresponds to a particular projection, that is, a particular one-, two- or three-dimensional subspace on which we choose to project the data. The visualization problem consists of choosing an appropriate projection.
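To make the idea of a “view” concrete, here is a minimal sketch of projecting data points on a plane. It assumes that the plane is given by two linearly independent directions, which are first made orthonormal by Gram-Schmidt, and that the data is centered before projecting. The function names, the toy data, and the centering step are illustrative choices, not part of the original text.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(d1, d2):
    """Gram-Schmidt: turn two linearly independent directions into an orthonormal pair."""
    n1 = math.sqrt(dot(d1, d1))
    u1 = [x / n1 for x in d1]
    # Remove from d2 its component along u1, then normalize what remains.
    c = dot(d2, u1)
    rest = [x - c * y for x, y in zip(d2, u1)]
    n2 = math.sqrt(dot(rest, rest))
    u2 = [x / n2 for x in rest]
    return u1, u2


def project_on_plane(points, d1, d2):
    """Return the 2D coordinates of each centered point in the plane spanned by d1 and d2."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    u1, u2 = orthonormalize(d1, d2)
    coords = []
    for p in points:
        centered = [x - mu for x, mu in zip(p, mean)]
        coords.append((dot(centered, u1), dot(centered, u2)))
    return coords


# Hypothetical data: three data points (for example, Senators), four votes each.
points = [
    [1, -1, 1, 0],
    [-1, -1, 1, 1],
    [1, 1, -1, -1],
]
coords = project_on_plane(points, [1, 0, 0, 0], [1, 1, 0, 0])
```

Each pair in `coords` can be plotted directly as a point in a 2D scatter plot. A different choice of directions gives a different view of the same data.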
There are many ways to formulate the visualization problem, and none dominates the others. Here, we focus on the basics of that problem.
Projection on a line and a plane
To simplify, let us first consider the problem of representing the high-dimensional data set on a single line, using the method described here.
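As a rough illustration of what representing the data on a line involves, the sketch below assigns each data point a single number, its coordinate along a chosen direction, and measures how spread out those numbers are. The centering step, the helper names, and the toy data are assumptions made for this example; the notation used in the rest of the case study may differ.

```python
import math


def scores_on_line(points, direction):
    """Coordinate of each centered point along the normalized direction."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    u = [d / norm for d in direction]
    return [sum((x - mu) * ui for x, mu, ui in zip(p, mean, u)) for p in points]


def variance(values):
    """Average squared deviation of the values from their mean."""
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values) / len(values)


# Hypothetical data: four data points with three votes each.
points = [
    [1, 1, -1],
    [1, 1, 1],
    [-1, -1, 1],
    [-1, -1, -1],
]

scores_a = scores_on_line(points, [1, 1, 0])
scores_b = scores_on_line(points, [0, 0, 1])

# On this toy data, the first direction spreads the points out more than the second.
var_a = variance(scores_a)
var_b = variance(scores_b)
```

Comparing the variance of the scores across directions gives a simple way to judge which line separates the data points best.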