Mesh (trapeziod_mesh.sv)
Source file: rtl/systolic_gauss_jordan/trapeziod_mesh.sv
Parameters
Systolic mesh that returns the solution \(A^{-1}B\) of a system of equations \(A X = B\). It is configured by the following parameters:
N: number of rows of A and B
M: number of columns of A
L: number of columns of B
Ports
clk: clock
rst: synchronous reset
en_i: global hold/advance enable
reduce_i: reduce signal into pe_diag for the reduce pass
data_top_i[(N+L)-1:0]: top-edge data input vector for all global columns
data_bottom_o[L-1:0]: bottom-boundary readout
diag_data_out_o[N-1:0]: debug export of each diagonal cell’s downward data output
diag_reduce_in_o[N-1:0]: debug export of the reduce bit seen by each diagonal cell
a_regs_flat_o[(N*N)-1:0]: flattened export of all stored bits in the triangular A region
b_regs_flat_o[(N*L)-1:0]: flattened export of all stored bits in the lifted B rectangle
Structural organization
Coordinates with col < row are outside the trapezoid and are tied off to known simulation values:
data_down_bus[row][col] = 0
op_bus[row][col] = OP_PASS
a_regs_flat_o[(row * N) + col] = 0 when col < N
At diagonal cells, the mesh instantiates pe_diag.sv.
Each diagonal cell:
receives vertical data from data_top_i[col] when row == 0, else from data_down_bus[row-1][col]
receives its row’s reduce token from reduce_in_bus[row]
drives the first horizontal opcode token for that row into op_bus[row][col]
exports its local stored bit into a_regs_flat_o[(row * N) + col]
The diagonal downward data output is also exported through diag_data_out_o[row].
At off-diagonal cells the mesh instantiates pe_col.sv.
Storage export depends on the global column:
if col < N, the cell belongs to the triangular A region and exports into a_regs_flat_o[(row * N) + col]
if col >= N, the cell belongs to the lifted B rectangle and exports into b_regs_flat_o[(row * L) + (col - N)]
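The placement and export rules above can be summarized in a small software model. This is only a sketch for reasoning about coordinates, not the RTL: the helper name classify_cell is hypothetical, and it assumes the mesh has N rows and N + L global columns, as the port widths suggest.

```python
from typing import Optional, Tuple


def classify_cell(row: int, col: int, n: int, l: int) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (cell kind, export bus name, flat bit index) for a mesh coordinate.

    Hypothetical helper: assumes rows 0..n-1 and global columns 0..n+l-1.
    """
    if not (0 <= row < n and 0 <= col < n + l):
        raise ValueError("coordinate outside the N x (N+L) grid")
    if col < row:
        # Outside the trapezoid: tied off, and its A export bit is driven to 0.
        return ("inactive", "a_regs_flat_o", row * n + col)
    # The diagonal instantiates pe_diag; everything to its right is pe_col.
    kind = "pe_diag" if col == row else "pe_col"
    if col < n:
        # Triangular A region.
        return (kind, "a_regs_flat_o", row * n + col)
    # Lifted B rectangle.
    return (kind, "b_regs_flat_o", row * l + (col - n))
```

For example, with N = 3 and L = 2, classify_cell(1, 4, 3, 2) lands in the B rectangle at flat index 3, while classify_cell(2, 0, 3, 2) is an inactive coordinate below the diagonal.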
Internal buses
Vertical data bus
data_down_bus[row][col] is the downward data emitted by cell (row, col).
It is consumed by the cell at (row + 1, col).
This is the only vertical data path in the structural mesh:
top-edge cells read from data_top_i
all lower active cells read from data_down_bus[row-1][col]
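As a quick way to check which signal feeds a given cell, the vertical wiring can be sketched as follows. The function names are hypothetical and exist only for illustration; the sketch assumes the mesh has N rows, so column col is active in rows 0 through min(col, N - 1).

```python
from typing import List


def vertical_source(row: int, col: int) -> str:
    """Name the signal that feeds the vertical data input of an active cell."""
    if col < row:
        raise ValueError("coordinate is outside the trapezoid")
    if row == 0:
        return f"data_top_i[{col}]"
    return f"data_down_bus[{row - 1}][{col}]"


def active_rows_in_column(col: int, n: int) -> List[int]:
    """List the rows whose cell in this global column is inside the trapezoid."""
    # A column stops at its diagonal cell, or at the last mesh row for B columns.
    return list(range(min(col, n - 1) + 1))
```

With N = 3, column 1 is active in rows 0 and 1 only, so the diagonal cell (1, 1) is the last consumer in that column, whereas a B column such as column 4 runs through all three rows.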
Horizontal opcode bus
op_bus[row][col] is the opcode emitted by cell (row, col) and consumed by
the cell immediately to the right at (row, col + 1).
This means:
each diagonal cell seeds the row’s opcode flow
each pe_col forwards the opcode one hop to the right
Reduce path
The reduce path is separate from the horizontal opcode bus and flows diagonally.
The reduce pipeline adds a delay to each diagonal hop:
REDUCE_PIPE_STAGES = max(1, REDUCE_HOP_DELAY)
For row k > 0:
reduce_out_bus[k-1] enters the pipeline for row k
after REDUCE_PIPE_STAGES enabled clock cycles, it appears on reduce_in_bus[k]
This is implemented with:
reduce_in_bus
reduce_out_bus
reduce_pipe_q[1:MESH_ROWS-1][0:REDUCE_PIPE_STAGES-1]
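The per-hop delay can be illustrated with a behavioral model of one row's pipeline. This is a sketch under assumptions not confirmed by the text: the class name ReducePipe is hypothetical, stage 0 is taken to be the pipeline input, the last stage is taken to drive reduce_in_bus[k], and reset is assumed to clear all stages and take priority over the enable.

```python
class ReducePipe:
    """Behavioral model of the reduce delay line between row k-1 and row k."""

    def __init__(self, reduce_hop_delay: int) -> None:
        # Mirrors REDUCE_PIPE_STAGES = max(1, REDUCE_HOP_DELAY).
        self.stages = max(1, reduce_hop_delay)
        self.q = [0] * self.stages

    @property
    def reduce_in(self) -> int:
        """Value presented to row k (assumed to be the last pipeline stage)."""
        return self.q[-1]

    def clock(self, reduce_out_prev: int, en: int = 1, rst: int = 0) -> None:
        """Advance one clock edge; the pipeline holds when en is 0."""
        if rst:
            self.q = [0] * self.stages  # assumed reset behavior
        elif en:
            # Shift the token from row k-1 in at stage 0.
            self.q = [reduce_out_prev] + self.q[:-1]
```

In this model a token shifted in with a hop delay of 3 becomes visible on reduce_in after three enabled clocks, and cycles with en low do not count toward that delay.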
Export packing
a_regs_flat_o
This export always has width N * N, even though only the upper-triangular
part corresponds to active cells.
Packing rule:
a_regs_flat_o[(row * N) + col]
active A-region cells write their stored bit there
inactive coordinates below the diagonal are driven to 0
b_regs_flat_o
This export packs only the lifted rectangle on the right:
b_regs_flat_o[(row * L) + (col - N)]
for col >= N.
So each mesh row contributes exactly L bits to b_regs_flat_o.
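When checking the exports in a testbench, it can help to turn the flat vectors back into row-major matrices. The following is a minimal sketch with hypothetical helper names; it assumes each flat export has already been read as a plain integer in which flat index i corresponds to bit i (index 0 is the least significant bit).

```python
from typing import List


def unpack_a(a_flat: int, n: int) -> List[List[int]]:
    """Rebuild the N x N view of a_regs_flat_o; bit (row * n + col) holds cell (row, col)."""
    return [[(a_flat >> (row * n + col)) & 1 for col in range(n)]
            for row in range(n)]


def unpack_b(b_flat: int, n: int, l: int) -> List[List[int]]:
    """Rebuild the N x L view of b_regs_flat_o; bit (row * l + lift_col) holds global column n + lift_col."""
    return [[(b_flat >> (row * l + lift_col)) & 1 for lift_col in range(l)]
            for row in range(n)]
```

Entries of the unpacked A matrix below the diagonal should always read 0, since those coordinates are tied off.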
Boundary export
The RTL makes only the lifted rectangle on the far right architecturally visible at the bottom boundary:
data_bottom_o[lift_col] = data_down_bus[N-1][N + lift_col]
for lift_col = 0 .. L-1.
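The boundary mapping above amounts to slicing the last mesh row. As a sketch, with a hypothetical helper that takes data_down_bus as a nested list indexed [row][col]:

```python
from typing import List


def bottom_readout(data_down_bus: List[List[int]], n: int, l: int) -> List[int]:
    """Return data_bottom_o as a list indexed by lift_col."""
    last_row = data_down_bus[n - 1]
    # Only the lifted B columns (global columns n .. n+l-1) reach the boundary.
    return [last_row[n + lift_col] for lift_col in range(l)]
```

The downward outputs of the A-region columns in the last row are not part of data_bottom_o in this mapping; the last diagonal cell's output is instead visible through diag_data_out_o.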
Debugging notes
The cocotb tests intentionally rely on stable generated hierarchy names. The RTL comments call out these scopes and signals as part of the debugging contract:
g_row
g_col
g_diag
g_apply
u_pe_diag
u_pe_col
local signal name data_in
The explicit debug outputs diag_data_out_o and diag_reduce_in_o are also
part of that observability story.
Test
make -C test/systolic_gauss_jordan TEST=trapeziod_mesh
Test file:
test_trapeziod_full_trace_reduce.py traces the register behavior during the reduce phase.
make -C test/systolic_gauss_jordan TEST=trapeziod_full_trace_reduce