Core Assembly in Finite Element Analysis#

Introduction#

Finite element assembly is the step that turns element-level quantities into the global linear system solved by femlabpy. This chapter provides the rigorous mathematical foundation for the assembly process, deriving it from variational principles.

1. Variational Foundation#

1.1 Principle of minimum potential energy#

The finite element method is rooted in the principle of minimum potential energy. For an elastic body, the total potential energy functional is:

\[ \Pi(\mathbf{u}) = U(\mathbf{u}) - W(\mathbf{u}) \]

where:

\(U(\mathbf{u}) = \frac{1}{2}\int_\Omega \boldsymbol{\varepsilon}^T \mathbf{D} \boldsymbol{\varepsilon} \, dV\) is the strain energy
\(W(\mathbf{u}) = \int_\Omega \mathbf{u}^T \mathbf{b} \, dV + \int_{\Gamma_t} \mathbf{u}^T \mathbf{t} \, dS\) is the work of external forces

The equilibrium configuration minimizes \(\Pi\):

\[ \delta \Pi = 0 \quad \forall \, \delta\mathbf{u} \text{ admissible} \]

1.2 Weak form derivation#

Taking the first variation of the potential energy:

\[ \delta\Pi = \int_\Omega \delta\boldsymbol{\varepsilon}^T \boldsymbol{\sigma} \, dV - \int_\Omega \delta\mathbf{u}^T \mathbf{b} \, dV - \int_{\Gamma_t} \delta\mathbf{u}^T \mathbf{t} \, dS = 0 \]

This is the weak form or principle of virtual work:

\[ \int_\Omega \delta\boldsymbol{\varepsilon}^T \boldsymbol{\sigma} \, dV = \int_\Omega \delta\mathbf{u}^T \mathbf{b} \, dV + \int_{\Gamma_t} \delta\mathbf{u}^T \mathbf{t} \, dS \]

Proof of equivalence to strong form:

Using the relation \(\delta\boldsymbol{\varepsilon} = \mathbf{L}\delta\mathbf{u}\) where \(\mathbf{L}\) is the differential operator, and integrating by parts:

\[ \int_\Omega \delta\mathbf{u}^T \mathbf{L}^T \boldsymbol{\sigma} \, dV = -\int_\Omega \delta\mathbf{u}^T (\nabla \cdot \boldsymbol{\sigma}) \, dV + \int_{\Gamma} \delta\mathbf{u}^T (\boldsymbol{\sigma} \cdot \mathbf{n}) \, dS \]

Since \(\delta\mathbf{u}\) is arbitrary, this implies:

\(\nabla \cdot \boldsymbol{\sigma} + \mathbf{b} = \mathbf{0}\) in \(\Omega\) (equilibrium)
\(\boldsymbol{\sigma} \cdot \mathbf{n} = \mathbf{t}\) on \(\Gamma_t\) (traction BC)

1.3 Galerkin discretization#

We approximate the displacement field using shape functions:

\[ \mathbf{u}(\mathbf{x}) \approx \mathbf{u}^h(\mathbf{x}) = \sum_{i=1}^{n} N_i(\mathbf{x}) \mathbf{d}_i = \mathbf{N}(\mathbf{x}) \mathbf{d} \]

The Galerkin method requires:

\[ \delta\mathbf{u}^h \in \text{span}\{N_1, N_2, \ldots, N_n\} \]

Substituting into the weak form and using \(\delta\boldsymbol{\varepsilon} = \mathbf{B}\delta\mathbf{d}\):

\[ \delta\mathbf{d}^T \left[ \int_\Omega \mathbf{B}^T \mathbf{D} \mathbf{B} \, dV \right] \mathbf{d} = \delta\mathbf{d}^T \left[ \int_\Omega \mathbf{N}^T \mathbf{b} \, dV + \int_{\Gamma_t} \mathbf{N}^T \mathbf{t} \, dS \right] \]

Since this must hold for all \(\delta\mathbf{d}\):

\[ \mathbf{K}\mathbf{d} = \mathbf{F} \]

where \(\mathbf{K} = \int_\Omega \mathbf{B}^T \mathbf{D} \mathbf{B} \, dV\) and \(\mathbf{F}\) is the load vector.

2. Assembly Identity and Derivation#

The global stiffness matrix \(\mathbf{K}\) and global internal force vector \(\mathbf{q}\) are constructed by summing the contributions from all \(N_e\) elements in the mesh. Let the global displacement vector be \(\mathbf{U}\) (size \(N \times 1\)) and the local displacement vector for element \(e\) be \(\mathbf{u}_e\) (size \(n_e \times 1\)).

Boolean connectivity matrix#

The relationship between the global degrees of freedom and the local degrees of freedom for an element \(e\) can be written with a Boolean connectivity matrix \(\mathbf{L}_e\). Its role is to pick the global entries that belong to the element.

The gather operation for local displacements \(\mathbf{u}_e\) is:

\[ \mathbf{u}_e = \mathbf{L}_e \mathbf{U} \]

where the entries of \(\mathbf{L}_e\) are defined as:

\[\begin{split} (\mathbf{L}_e)_{ij} = \begin{cases} 1 & \text{if local DOF } i \text{ corresponds to global DOF } j \\ 0 & \text{otherwise} \end{cases} \end{split}\]

Because each local DOF maps to exactly one global DOF, each row of \(\mathbf{L}_e\) contains exactly one 1 and all other entries are 0.

The transpose \(\mathbf{L}_e^T\) performs the reverse operation. It scatters element contributions back into the global array. That is the algebraic origin of the assembly operation used in software.

Strain energy and the assembly operator#

To derive the global assembly equation for the stiffness matrix, consider the total strain energy of the system, \(U_{\text{total}}\), which is the sum of the strain energies of the individual elements:

\[ U_{\text{total}} = \sum_{e=1}^{N_e} U_e = \sum_{e=1}^{N_e} \frac{1}{2} \mathbf{u}_e^T \mathbf{K}_e \mathbf{u}_e \]

Substitute the Boolean mapping \(\mathbf{u}_e = \mathbf{L}_e \mathbf{U}\) into the energy equation:

\[ U_{\text{total}} = \sum_{e=1}^{N_e} \frac{1}{2} (\mathbf{L}_e \mathbf{U})^T \mathbf{K}_e (\mathbf{L}_e \mathbf{U}) = \frac{1}{2} \mathbf{U}^T \left( \sum_{e=1}^{N_e} \mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e \right) \mathbf{U} \]

The total strain energy can also be expressed directly in terms of the global stiffness matrix \(\mathbf{K}\):

\[ U_{\text{total}} = \frac{1}{2} \mathbf{U}^T \mathbf{K} \mathbf{U} \]

Comparing the two expressions yields the fundamental equation for global stiffness assembly:

\[ \mathbf{K} = \sum_{e=1}^{N_e} \mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e \]

Concrete example#

Let us ground this with a highly specific example. Consider a 1D bar partitioned into 2 elements (3 nodes total: nodes 1, 2, 3), where each node has 1 DOF. The global system has \(N=3\) DOFs. Element 1 connects nodes 1 and 2. Element 2 connects nodes 2 and 3.

The Boolean matrix for Element 1, \(\mathbf{L}_1\), maps the \(2\) local DOFs of the element to the \(3\) global DOFs:

\[\begin{split} \mathbf{L}_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \end{split}\]

The transpose \(\mathbf{L}_1^T\) (the “scatter” matrix) is:

\[\begin{split} \mathbf{L}_1^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \end{split}\]

Let the element stiffness matrix be:

\[\begin{split} \mathbf{K}_1 = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix} \end{split}\]

Now, evaluate the matrix product \(\mathbf{L}_1^T \mathbf{K}_1 \mathbf{L}_1\):

First multiplication \(\mathbf{K}_1 \mathbf{L}_1\):

\[\begin{split} \mathbf{K}_1 \mathbf{L}_1 = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} k_{11} & k_{12} & 0 \\ k_{21} & k_{22} & 0 \end{bmatrix} \end{split}\]

Second multiplication \(\mathbf{L}_1^T (\mathbf{K}_1 \mathbf{L}_1)\):

\[\begin{split} \mathbf{L}_1^T (\mathbf{K}_1 \mathbf{L}_1) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} k_{11} & k_{12} & 0 \\ k_{21} & k_{22} & 0 \end{bmatrix} = \begin{bmatrix} k_{11} & k_{12} & 0 \\ k_{21} & k_{22} & 0 \\ 0 & 0 & 0 \end{bmatrix} \end{split}\]

The operation \(\mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e\) mathematically expands a small \(n_e \times n_e\) matrix into an \(N \times N\) matrix, padding it with zeros everywhere except at the specific rows and columns corresponding to the element’s global DOFs. The sum \(\sum \mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e\) then superimposes all these padded matrices.

Python implementation with `numpy.ix_`#

While the equations above are mathematically elegant, constructing large, mostly empty \(\mathbf{L}_e\) matrices and performing the full matrix multiplications is computationally intractable (\(\mathcal{O}(N^3)\) operations for mostly zeros). In practice, the operation \(\mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e\) is realized as an array slicing operation: “scatter the entries of \(\mathbf{K}_e\) into the appropriate rows and columns of \(\mathbf{K}\)”.

In Python, this is executed using numpy.ix_.

Why `numpy.ix_` is the right tool#

Advanced indexing in NumPy allows you to select arbitrary items based on their N-dimensional index. If you have an element whose local DOFs map to global DOFs [0, 2], you want to add the \(2 \times 2\) matrix \(\mathbf{K}_e\) to rows [0, 2] and columns [0, 2] of \(\mathbf{K}\).

If you simply write K[[0, 2], [0, 2]], NumPy performs integer array indexing, pairing the indices up: it selects K[0, 0] and K[2, 2], resulting in a 1D array of two elements. This is not a submatrix (block) selection!

To select the \(2 \times 2\) block formed by the intersection of rows [0, 2] and columns [0, 2], you need an open mesh grid. numpy.ix_ takes 1D sequences and returns a tuple of N-dimensional arrays that broadcast to form the full block.

import numpy as np

indices = [0, 2]
ix_grid = np.ix_(indices, indices)

print(ix_grid)
# Output:
# (array([[0],
#         [2]]), 
#  array([[0, 2]]))

Notice the shapes:

The row index array has shape (2, 1): [[0], [2]]
The column index array has shape (1, 2): [[0, 2]]

When NumPy uses these arrays to index K, the shapes broadcast together to form a (2, 2) grid:

Row 0 with Col 0 \(\rightarrow (0, 0)\)
Row 0 with Col 2 \(\rightarrow (0, 2)\)
Row 2 with Col 0 \(\rightarrow (2, 0)\)
Row 2 with Col 2 \(\rightarrow (2, 2)\)

This precisely targets the block entries corresponding to the selected global DOFs and matches the mathematical result of \(\mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e\).

Sparse and dense assembly in `assmk`#

When accumulating \(\mathbf{K}_e\) into \(\mathbf{K}\), the syntax differs slightly based on whether \(\mathbf{K}\) is a dense numpy.ndarray or a scipy.sparse.lil_matrix.

Dense arrays#

A dense matrix stores every coefficient in continuous memory blocks. In Python, an in-place addition to a block using advanced indexing works natively:

K[np.ix_(indices, indices)] += Ke

This invokes the C-level __iadd__ routine. Because the memory is contiguous and fully allocated, NumPy computes the exact memory offsets and rapidly adds the values in-place without creating intermediate large array copies. It is blazingly fast for small meshes (e.g., \(N < 1000\)).

SciPy sparse matrices#

Finite element matrices are inherently sparse. For large systems (\(N \gg 1000\)), storing a dense matrix consumes \(\mathcal{O}(N^2)\) memory, which quickly exceeds available RAM. Sparse matrices store only non-zero entries, requiring \(\mathcal{O}(N)\) memory.

During assembly, SciPy’s lil_matrix (List of Lists) is typically used. In this format, each row is a Python list of column indices and a corresponding list of values.

For sparse arrays, augmented assignment with advanced indexing is problematic. The expression:

K[np.ix_(indices, indices)] += Ke

fails or is highly inefficient for lil_matrix because += attempts to mutate the extracted sparse slice in-place, which isn’t robustly supported by SciPy’s sparse advanced indexing mechanics.

Instead, we must write:

K[np.ix_(indices, indices)] = K[np.ix_(indices, indices)] + Ke

What happens here step-by-step?

Extract: K[np.ix_(indices, indices)] extracts the sparse block, instantiating it temporarily as a dense or sparse sub-matrix.
Add: + Ke adds the dense element matrix to the extracted block.
Assign: The = operator calls __setitem__ on the lil_matrix. The LIL matrix updates its internal lists. If a zero entry suddenly becomes non-zero, it dynamically appends the new column index and value to the corresponding row list.

While allocating lists during assignment has some overhead, it prevents the quadratic memory blow-up of a dense matrix and keeps the workflow practical for large meshes. After assembly is complete, the lil_matrix is typically converted to Compressed Sparse Row (csr_matrix) format for fast solving.

Complete example: 3-element manual assembly#

Below is a self-contained, 50-line runnable script demonstrating everything discussed: Boolean matrices via ix_, dense vs. sparse logic, and a 3-element system (4 nodes, 1 DOF per node).

import numpy as np
import scipy.sparse as sp

def assemble_system(nn, elements, Ke, use_sparse=False):
    """
    nn: Total number of global nodes
    elements: List of node-pair tuples, e.g., [(0, 1), (1, 2), (2, 3)]
    Ke: Element stiffness matrix (assumed identical for all elements)
    """
    # 1. Initialize the global stiffness matrix
    if use_sparse:
        K = sp.lil_matrix((nn, nn), dtype=float)
    else:
        K = np.zeros((nn, nn), dtype=float)
        
    # 2. Iterate through elements
    for element_nodes in elements:
        indices = list(element_nodes)
        idx_grid = np.ix_(indices, indices)
        
        # 3. Apply Scatter-Add Logic
        if use_sparse:
            # Sparse requires extraction, addition, and reassignment
            K[idx_grid] = K[idx_grid] + Ke
        else:
            # Dense allows direct in-place addition
            K[idx_grid] += Ke
            
    return K

if __name__ == "__main__":
    # Define a 3-element bar connected in series: Node 0 -> 1 -> 2 -> 3
    nodes_total = 4
    topology = [(0, 1), (1, 2), (2, 3)]
    
    # Define a generic 2x2 local element stiffness matrix
    k_local = np.array([[ 100.0, -100.0],
                        [-100.0,  100.0]])
    
    # Execute Dense Assembly
    K_dense = assemble_system(nodes_total, topology, k_local, use_sparse=False)
    print("--- Dense Assembly Result ---")
    print(K_dense)
    print("\n--- Sparse Assembly Result (CSR format) ---")
    
    # Execute Sparse Assembly
    K_sparse_lil = assemble_system(nodes_total, topology, k_local, use_sparse=True)
    K_sparse_csr = K_sparse_lil.tocsr() # Convert to CSR for solving
    print(K_sparse_csr.toarray())