--- jupytext: text_representation: extension: .md format_name: myst kernelspec: display_name: Python 3 language: python name: python3 --- # Core Assembly in Finite Element Analysis ## Introduction Finite element assembly is the step that turns element-level quantities into the global linear system solved by `femlabpy`. This chapter provides the rigorous mathematical foundation for the assembly process, deriving it from variational principles. --- ## 1. Variational Foundation ### 1.1 Principle of minimum potential energy The finite element method is rooted in the principle of minimum potential energy. For an elastic body, the total potential energy functional is: $$ \Pi(\mathbf{u}) = U(\mathbf{u}) - W(\mathbf{u}) $$ where: - $U(\mathbf{u}) = \frac{1}{2}\int_\Omega \boldsymbol{\varepsilon}^T \mathbf{D} \boldsymbol{\varepsilon} \, dV$ is the strain energy - $W(\mathbf{u}) = \int_\Omega \mathbf{u}^T \mathbf{b} \, dV + \int_{\Gamma_t} \mathbf{u}^T \mathbf{t} \, dS$ is the work of external forces The equilibrium configuration minimizes $\Pi$: $$ \delta \Pi = 0 \quad \forall \, \delta\mathbf{u} \text{ admissible} $$ ### 1.2 Weak form derivation Taking the first variation of the potential energy: $$ \delta\Pi = \int_\Omega \delta\boldsymbol{\varepsilon}^T \boldsymbol{\sigma} \, dV - \int_\Omega \delta\mathbf{u}^T \mathbf{b} \, dV - \int_{\Gamma_t} \delta\mathbf{u}^T \mathbf{t} \, dS = 0 $$ This is the **weak form** or **principle of virtual work**: $$ \int_\Omega \delta\boldsymbol{\varepsilon}^T \boldsymbol{\sigma} \, dV = \int_\Omega \delta\mathbf{u}^T \mathbf{b} \, dV + \int_{\Gamma_t} \delta\mathbf{u}^T \mathbf{t} \, dS $$ **Proof of equivalence to strong form:** Using the relation $\delta\boldsymbol{\varepsilon} = \mathbf{L}\delta\mathbf{u}$ where $\mathbf{L}$ is the differential operator, and integrating by parts: $$ \int_\Omega \delta\mathbf{u}^T \mathbf{L}^T \boldsymbol{\sigma} \, dV = -\int_\Omega \delta\mathbf{u}^T (\nabla \cdot \boldsymbol{\sigma}) \, dV + \int_{\Gamma} \delta\mathbf{u}^T (\boldsymbol{\sigma} \cdot \mathbf{n}) \, dS $$ Since $\delta\mathbf{u}$ is arbitrary, this implies: - $\nabla \cdot \boldsymbol{\sigma} + \mathbf{b} = \mathbf{0}$ in $\Omega$ (equilibrium) - $\boldsymbol{\sigma} \cdot \mathbf{n} = \mathbf{t}$ on $\Gamma_t$ (traction BC) ### 1.3 Galerkin discretization We approximate the displacement field using shape functions: $$ \mathbf{u}(\mathbf{x}) \approx \mathbf{u}^h(\mathbf{x}) = \sum_{i=1}^{n} N_i(\mathbf{x}) \mathbf{d}_i = \mathbf{N}(\mathbf{x}) \mathbf{d} $$ The Galerkin method requires: $$ \delta\mathbf{u}^h \in \text{span}\{N_1, N_2, \ldots, N_n\} $$ Substituting into the weak form and using $\delta\boldsymbol{\varepsilon} = \mathbf{B}\delta\mathbf{d}$: $$ \delta\mathbf{d}^T \left[ \int_\Omega \mathbf{B}^T \mathbf{D} \mathbf{B} \, dV \right] \mathbf{d} = \delta\mathbf{d}^T \left[ \int_\Omega \mathbf{N}^T \mathbf{b} \, dV + \int_{\Gamma_t} \mathbf{N}^T \mathbf{t} \, dS \right] $$ Since this must hold for all $\delta\mathbf{d}$: $$ \mathbf{K}\mathbf{d} = \mathbf{F} $$ where $\mathbf{K} = \int_\Omega \mathbf{B}^T \mathbf{D} \mathbf{B} \, dV$ and $\mathbf{F}$ is the load vector. --- ## 2. Assembly Identity and Derivation The global stiffness matrix $\mathbf{K}$ and global internal force vector $\mathbf{q}$ are constructed by summing the contributions from all $N_e$ elements in the mesh. Let the global displacement vector be $\mathbf{U}$ (size $N \times 1$) and the local displacement vector for element $e$ be $\mathbf{u}_e$ (size $n_e \times 1$). ### Boolean connectivity matrix The relationship between the global degrees of freedom and the local degrees of freedom for an element $e$ can be written with a Boolean connectivity matrix $\mathbf{L}_e$. Its role is to pick the global entries that belong to the element. The gather operation for local displacements $\mathbf{u}_e$ is: $$ \mathbf{u}_e = \mathbf{L}_e \mathbf{U} $$ where the entries of $\mathbf{L}_e$ are defined as: $$ (\mathbf{L}_e)_{ij} = \begin{cases} 1 & \text{if local DOF } i \text{ corresponds to global DOF } j \\ 0 & \text{otherwise} \end{cases} $$ Because each local DOF maps to exactly one global DOF, each row of $\mathbf{L}_e$ contains exactly one `1` and all other entries are `0`. The transpose $\mathbf{L}_e^T$ performs the reverse operation. It scatters element contributions back into the global array. That is the algebraic origin of the assembly operation used in software. ### Strain energy and the assembly operator To derive the global assembly equation for the stiffness matrix, consider the total strain energy of the system, $U_{\text{total}}$, which is the sum of the strain energies of the individual elements: $$ U_{\text{total}} = \sum_{e=1}^{N_e} U_e = \sum_{e=1}^{N_e} \frac{1}{2} \mathbf{u}_e^T \mathbf{K}_e \mathbf{u}_e $$ Substitute the Boolean mapping $\mathbf{u}_e = \mathbf{L}_e \mathbf{U}$ into the energy equation: $$ U_{\text{total}} = \sum_{e=1}^{N_e} \frac{1}{2} (\mathbf{L}_e \mathbf{U})^T \mathbf{K}_e (\mathbf{L}_e \mathbf{U}) = \frac{1}{2} \mathbf{U}^T \left( \sum_{e=1}^{N_e} \mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e \right) \mathbf{U} $$ The total strain energy can also be expressed directly in terms of the global stiffness matrix $\mathbf{K}$: $$ U_{\text{total}} = \frac{1}{2} \mathbf{U}^T \mathbf{K} \mathbf{U} $$ Comparing the two expressions yields the fundamental equation for global stiffness assembly: $$ \mathbf{K} = \sum_{e=1}^{N_e} \mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e $$ ### Concrete example Let us ground this with a highly specific example. Consider a 1D bar partitioned into 2 elements (3 nodes total: nodes 1, 2, 3), where each node has 1 DOF. The global system has $N=3$ DOFs. Element 1 connects nodes 1 and 2. Element 2 connects nodes 2 and 3. The Boolean matrix for Element 1, $\mathbf{L}_1$, maps the $2$ local DOFs of the element to the $3$ global DOFs: $$ \mathbf{L}_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} $$ The transpose $\mathbf{L}_1^T$ (the "scatter" matrix) is: $$ \mathbf{L}_1^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} $$ Let the element stiffness matrix be: $$ \mathbf{K}_1 = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix} $$ Now, evaluate the matrix product $\mathbf{L}_1^T \mathbf{K}_1 \mathbf{L}_1$: 1. **First multiplication $\mathbf{K}_1 \mathbf{L}_1$:** $$ \mathbf{K}_1 \mathbf{L}_1 = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} k_{11} & k_{12} & 0 \\ k_{21} & k_{22} & 0 \end{bmatrix} $$ 2. **Second multiplication $\mathbf{L}_1^T (\mathbf{K}_1 \mathbf{L}_1)$:** $$ \mathbf{L}_1^T (\mathbf{K}_1 \mathbf{L}_1) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} k_{11} & k_{12} & 0 \\ k_{21} & k_{22} & 0 \end{bmatrix} = \begin{bmatrix} k_{11} & k_{12} & 0 \\ k_{21} & k_{22} & 0 \\ 0 & 0 & 0 \end{bmatrix} $$ The operation $\mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e$ mathematically expands a small $n_e \times n_e$ matrix into an $N \times N$ matrix, padding it with zeros everywhere except at the specific rows and columns corresponding to the element's global DOFs. The sum $\sum \mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e$ then superimposes all these padded matrices. --- ## Python implementation with `numpy.ix_` While the equations above are mathematically elegant, constructing large, mostly empty $\mathbf{L}_e$ matrices and performing the full matrix multiplications is computationally intractable ($\mathcal{O}(N^3)$ operations for mostly zeros). In practice, the operation $\mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e$ is realized as an array slicing operation: "scatter the entries of $\mathbf{K}_e$ into the appropriate rows and columns of $\mathbf{K}$". In Python, this is executed using `numpy.ix_`. ### Why `numpy.ix_` is the right tool Advanced indexing in NumPy allows you to select arbitrary items based on their N-dimensional index. If you have an element whose local DOFs map to global DOFs `[0, 2]`, you want to add the $2 \times 2$ matrix $\mathbf{K}_e$ to rows `[0, 2]` and columns `[0, 2]` of $\mathbf{K}$. If you simply write `K[[0, 2], [0, 2]]`, NumPy performs *integer array indexing*, pairing the indices up: it selects `K[0, 0]` and `K[2, 2]`, resulting in a 1D array of two elements. This is **not** a submatrix (block) selection! To select the $2 \times 2$ block formed by the intersection of rows `[0, 2]` and columns `[0, 2]`, you need an open mesh grid. `numpy.ix_` takes 1D sequences and returns a tuple of N-dimensional arrays that broadcast to form the full block. ```python import numpy as np indices = [0, 2] ix_grid = np.ix_(indices, indices) print(ix_grid) # Output: # (array([[0], # [2]]), # array([[0, 2]])) ``` Notice the shapes: * The row index array has shape `(2, 1)`: `[[0], [2]]` * The column index array has shape `(1, 2)`: `[[0, 2]]` When NumPy uses these arrays to index `K`, the shapes broadcast together to form a `(2, 2)` grid: * Row 0 with Col 0 $\rightarrow (0, 0)$ * Row 0 with Col 2 $\rightarrow (0, 2)$ * Row 2 with Col 0 $\rightarrow (2, 0)$ * Row 2 with Col 2 $\rightarrow (2, 2)$ This precisely targets the block entries corresponding to the selected global DOFs and matches the mathematical result of $\mathbf{L}_e^T \mathbf{K}_e \mathbf{L}_e$. --- ## Sparse and dense assembly in `assmk` When accumulating $\mathbf{K}_e$ into $\mathbf{K}$, the syntax differs slightly based on whether $\mathbf{K}$ is a dense `numpy.ndarray` or a `scipy.sparse.lil_matrix`. ### Dense arrays A dense matrix stores every coefficient in continuous memory blocks. In Python, an in-place addition to a block using advanced indexing works natively: ```python K[np.ix_(indices, indices)] += Ke ``` This invokes the C-level `__iadd__` routine. Because the memory is contiguous and fully allocated, NumPy computes the exact memory offsets and rapidly adds the values in-place without creating intermediate large array copies. It is blazingly fast for small meshes (e.g., $N < 1000$). ### SciPy sparse matrices Finite element matrices are inherently sparse. For large systems ($N \gg 1000$), storing a dense matrix consumes $\mathcal{O}(N^2)$ memory, which quickly exceeds available RAM. Sparse matrices store only non-zero entries, requiring $\mathcal{O}(N)$ memory. During assembly, SciPy's `lil_matrix` (List of Lists) is typically used. In this format, each row is a Python list of column indices and a corresponding list of values. For sparse arrays, augmented assignment with advanced indexing is problematic. The expression: ```python K[np.ix_(indices, indices)] += Ke ``` fails or is highly inefficient for `lil_matrix` because `+=` attempts to mutate the extracted sparse slice in-place, which isn't robustly supported by SciPy's sparse advanced indexing mechanics. Instead, we must write: ```python K[np.ix_(indices, indices)] = K[np.ix_(indices, indices)] + Ke ``` **What happens here step-by-step?** 1. **Extract:** `K[np.ix_(indices, indices)]` extracts the sparse block, instantiating it temporarily as a dense or sparse sub-matrix. 2. **Add:** `+ Ke` adds the dense element matrix to the extracted block. 3. **Assign:** The `=` operator calls `__setitem__` on the `lil_matrix`. The LIL matrix updates its internal lists. If a zero entry suddenly becomes non-zero, it dynamically appends the new column index and value to the corresponding row list. While allocating lists during assignment has some overhead, it prevents the quadratic memory blow-up of a dense matrix and keeps the workflow practical for large meshes. After assembly is complete, the `lil_matrix` is typically converted to Compressed Sparse Row (`csr_matrix`) format for fast solving. --- ## Complete example: 3-element manual assembly Below is a self-contained, 50-line runnable script demonstrating everything discussed: Boolean matrices via `ix_`, dense vs. sparse logic, and a 3-element system (4 nodes, 1 DOF per node). ```python import numpy as np import scipy.sparse as sp def assemble_system(nn, elements, Ke, use_sparse=False): """ nn: Total number of global nodes elements: List of node-pair tuples, e.g., [(0, 1), (1, 2), (2, 3)] Ke: Element stiffness matrix (assumed identical for all elements) """ # 1. Initialize the global stiffness matrix if use_sparse: K = sp.lil_matrix((nn, nn), dtype=float) else: K = np.zeros((nn, nn), dtype=float) # 2. Iterate through elements for element_nodes in elements: indices = list(element_nodes) idx_grid = np.ix_(indices, indices) # 3. Apply Scatter-Add Logic if use_sparse: # Sparse requires extraction, addition, and reassignment K[idx_grid] = K[idx_grid] + Ke else: # Dense allows direct in-place addition K[idx_grid] += Ke return K if __name__ == "__main__": # Define a 3-element bar connected in series: Node 0 -> 1 -> 2 -> 3 nodes_total = 4 topology = [(0, 1), (1, 2), (2, 3)] # Define a generic 2x2 local element stiffness matrix k_local = np.array([[ 100.0, -100.0], [-100.0, 100.0]]) # Execute Dense Assembly K_dense = assemble_system(nodes_total, topology, k_local, use_sparse=False) print("--- Dense Assembly Result ---") print(K_dense) print("\n--- Sparse Assembly Result (CSR format) ---") # Execute Sparse Assembly K_sparse_lil = assemble_system(nodes_total, topology, k_local, use_sparse=True) K_sparse_csr = K_sparse_lil.tocsr() # Convert to CSR for solving print(K_sparse_csr.toarray()) ```