Mesh (trapeziod_mesh.sv)

Source file: rtl/systolic_gauss_jordan/trapeziod_mesh.sv

Figure: trapezoidal mesh.

Parameters

The systolic mesh returns the solution \(A^{-1}B\) of the system of equations \(A X = B\). Its parameters are:

  • N: number of rows of A and B

  • M: number of columns of A

  • L: number of columns of B

Ports

  • clk: clock

  • rst: synchronous reset

  • en_i: global hold/advance enable

  • reduce_i: reduce signal fed to pe_diag for the reduce pass

  • data_top_i[(N+L)-1:0]: top-edge data input vector for all global columns

  • data_bottom_o[L-1:0]: bottom-boundary readout

  • diag_data_out_o[N-1:0]: debug export of each diagonal cell’s downward data output

  • diag_reduce_in_o[N-1:0]: debug export of the reduce bit seen by each diagonal cell

  • a_regs_flat_o[(N*N)-1:0]: flattened export of all stored bits in the triangular A region

  • b_regs_flat_o[(N*L)-1:0]: flattened export of all stored bits in the lifted B rectangle

Structural organization

Coordinates with col < row are outside the trapezoid and are tied off to known simulation values:

  • data_down_bus[row][col] = 0

  • op_bus[row][col] = OP_PASS

  • a_regs_flat_o[(row * N) + col] = 0 when col < N

At diagonal cells, the mesh instantiates pe_diag.sv.

Each diagonal cell:

  • receives vertical data from data_top_i[col] when row == 0, otherwise from data_down_bus[row-1][col]

  • receives its row’s reduce token from reduce_in_bus[row]

  • drives the first horizontal opcode token for that row into op_bus[row][col]

  • exports its local stored bit into a_regs_flat_o[(row * N) + col]

The diagonal downward data output is also exported through diag_data_out_o[row].

At off-diagonal cells the mesh instantiates pe_col.sv.

Storage export depends on the global column:

  • if col < N, the cell belongs to the triangular A region and exports into a_regs_flat_o[(row * N) + col]

  • if col >= N, the cell belongs to the lifted B rectangle and exports into b_regs_flat_o[(row * L) + (col - N)]
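The structural rules above can be summarized in a small Python model. This is only a sketch for reasoning about the layout, not the RTL: the function names and the labels "tied_off", "pe_col_a" and "pe_col_b" are made up for illustration, and it assumes that "diagonal" means col == row on an N-row, (N+L)-column grid.

```python
def classify_cell(row, col, n, l_cols):
    """Return the kind of cell at (row, col) for N = n rows and L = l_cols lifted columns."""
    if not (0 <= row < n and 0 <= col < n + l_cols):
        raise ValueError("coordinate outside the N x (N+L) grid")
    if col < row:
        return "tied_off"   # outside the trapezoid, driven to constants
    if col == row:
        return "pe_diag"    # assumption: diagonal means col == row
    # Off-diagonal cells are pe_col; the region depends on the global column.
    return "pe_col_a" if col < n else "pe_col_b"


def count_cells(n, l_cols):
    """Count how many coordinates of each kind the grid contains."""
    counts = {"tied_off": 0, "pe_diag": 0, "pe_col_a": 0, "pe_col_b": 0}
    for row in range(n):
        for col in range(n + l_cols):
            counts[classify_cell(row, col, n, l_cols)] += 1
    return counts
```

For N = 3 and L = 2, count_cells(3, 2) accounts for all 15 grid coordinates: 3 tied-off, 3 diagonal, 3 pe_col cells in the A region and 6 in the lifted B rectangle.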

Internal buses

Vertical data bus

data_down_bus[row][col] is the downward data emitted by cell (row, col). It is consumed by the cell at (row + 1, col).

This is the only vertical data path in the structural mesh:

  • top-edge cells read from data_top_i

  • all lower active cells read from data_down_bus[row-1][col]

Horizontal opcode bus

op_bus[row][col] is the opcode emitted by cell (row, col) and consumed by the cell immediately to the right at (row, col + 1).

This means:

  • each diagonal cell seeds the row’s opcode flow

  • each pe_col forwards the opcode one hop to the right
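As a cross-check of the vertical data bus and horizontal opcode bus wiring, the following Python sketch names the signals that feed a given active cell. The helper cell_inputs is hypothetical; it only restates the connectivity described here and again assumes the diagonal is col == row.

```python
def cell_inputs(row, col, n, l_cols):
    """Name the signals feeding active cell (row, col); n = N, l_cols = L."""
    if not (0 <= row < n and row <= col < n + l_cols):
        raise ValueError("not an active cell of the trapezoid")
    # Vertical data: top-edge cells read the input port, all others read the cell above.
    if row == 0:
        data_src = f"data_top_i[{col}]"
    else:
        data_src = f"data_down_bus[{row - 1}][{col}]"
    if col == row:
        # Diagonal cell: takes the row's reduce token and seeds op_bus[row][col].
        return {"data": data_src, "reduce": f"reduce_in_bus[{row}]"}
    # pe_col: consumes the opcode emitted by the cell immediately to its left.
    return {"data": data_src, "op": f"op_bus[{row}][{col - 1}]"}
```

For example, with N = 3 and L = 2, cell_inputs(1, 3, 3, 2) reports data from data_down_bus[0][3] and the opcode from op_bus[1][2].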

Reduce path

The reduce path is separate from the horizontal opcode bus and flows diagonally.

  • each diagonal hop is delayed by REDUCE_PIPE_STAGES = max(1, REDUCE_HOP_DELAY) pipeline stages

For row k > 0:

  • reduce_out_bus[k-1] enters the pipeline for row k

  • after REDUCE_PIPE_STAGES enabled clock cycles, it appears on reduce_in_bus[k]

This is implemented with:

  • reduce_in_bus

  • reduce_out_bus

  • reduce_pipe_q[1:MESH_ROWS-1][0:REDUCE_PIPE_STAGES-1]
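The timing of one diagonal hop can be modeled in Python as an enabled shift register. This is a behavioral sketch under assumptions, not the RTL: the class name ReduceHopPipe is invented, the registers are assumed to reset to 0, and the enable is assumed to be en_i.

```python
def reduce_pipe_stages(reduce_hop_delay):
    """REDUCE_PIPE_STAGES = max(1, REDUCE_HOP_DELAY)."""
    return max(1, reduce_hop_delay)


class ReduceHopPipe:
    """Model of the pipeline between reduce_out_bus[k-1] and reduce_in_bus[k]."""

    def __init__(self, reduce_hop_delay):
        # One register per stage; reset value 0 is an assumption.
        self.q = [0] * reduce_pipe_stages(reduce_hop_delay)

    def clock(self, reduce_out_prev, en):
        """Apply one clock edge and return the value seen on reduce_in_bus[k]."""
        if en:
            # Shift by one stage; when en is low every stage holds its value.
            self.q = [reduce_out_prev] + self.q[:-1]
        return self.q[-1]
```

With REDUCE_HOP_DELAY = 2, a token presented for one enabled cycle shows up at the output after the second enabled clock edge; cycles with the enable low do not advance it.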

Export packing

a_regs_flat_o

This export always has width N * N, even though only the upper-triangular part corresponds to active cells.

Packing rule:

a_regs_flat_o[(row * N) + col]

  • active A-region cells write their stored bit there

  • inactive coordinates below the diagonal are driven to 0

b_regs_flat_o

This export packs only the lifted rectangle on the right:

b_regs_flat_o[(row * L) + (col - N)]

for col >= N.

So each mesh row contributes exactly L bits to b_regs_flat_o.
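When inspecting these exports in a testbench, it can help to turn the flat vectors back into matrices. The Python sketch below applies the two packing rules in reverse. The function name unpack_exports is hypothetical, and it assumes each export is available as a non-negative integer whose bit i corresponds to index i of the port.

```python
def unpack_exports(a_flat, b_flat, n, l_cols):
    """Split integer-valued flat exports into an N x N and an N x L bit matrix."""
    # a_regs_flat_o[(row * N) + col]; entries below the diagonal read back as 0.
    a = [[(a_flat >> (row * n + col)) & 1 for col in range(n)] for row in range(n)]
    # b_regs_flat_o[(row * L) + (col - N)]; j stands for (col - N).
    b = [[(b_flat >> (row * l_cols + j)) & 1 for j in range(l_cols)] for row in range(n)]
    return a, b
```

For N = 2 and L = 1, unpack_exports(0b1011, 0b10, 2, 1) yields the A matrix [[1, 1], [0, 1]] and the B matrix [[0], [1]].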

Boundary export

The RTL makes only the lifted rectangle on the far right architecturally visible at the bottom boundary:

data_bottom_o[lift_col] = data_down_bus[N-1][N + lift_col]

for lift_col = 0 .. L-1.
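The same selection can be written as a one-line Python model, useful when comparing a reference model against data_bottom_o. The function name bottom_readout and the nested-list representation of data_down_bus are assumptions made for this sketch.

```python
def bottom_readout(data_down_bus, n, l_cols):
    """Pick the last row's outputs for the lifted columns N .. N+L-1."""
    # data_bottom_o[lift_col] = data_down_bus[N-1][N + lift_col]
    return [data_down_bus[n - 1][n + lift_col] for lift_col in range(l_cols)]
```

With N = 2, L = 2 and data_down_bus = [[0, 1, 1, 0], [0, 1, 0, 1]], only columns 2 and 3 of the last row are visible, so the readout is [0, 1].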

Debugging notes

The cocotb tests intentionally rely on stable generated hierarchy names. The RTL comments call out these scopes and signals as part of the debugging contract:

  • g_row

  • g_col

  • g_diag

  • g_apply

  • u_pe_diag

  • u_pe_col

  • local signal name data_in

The explicit debug outputs diag_data_out_o and diag_reduce_in_o are also part of that debugging contract.

Test

make -C test/systolic_gauss_jordan TEST=trapeziod_mesh
make -C test/systolic_gauss_jordan TEST=trapeziod_full_trace_reduce
