Project 8: Molecular Dynamics — Capstone
Author: Nels Buhrley
Language: C++17 with OpenMP · Python 3 (visualization)
HPC: Run on the BYU Supercomputer (8 CPUs via SLURM)
Build: make release — see Build & Run
Snapshot
- Built a full 2D molecular dynamics engine in C++17 from scratch (Lennard-Jones physics, velocity Verlet integrator, configurable boundary conditions, gravity, and scripted heating/cooling).
- Solved a non-trivial parallelization problem in an $O(N^2)$ force kernel using OpenMP thread-local accumulators plus guided scheduling to keep updates race-free and scalable.
- Shipped production-style outputs for analysis and demos: structured NumPy/CSV artifacts, automated energy diagnostics, and parallel-rendered MP4 animations.
- Designed for reproducible compute workflows: config-driven runs (simulation_1.cfg / simulation_2.cfg), release/debug/PGO build modes, benchmark auto-tuning, and SLURM execution via job.sh.
- Demonstrates end-to-end engineering: numerical methods, performance optimization, HPC deployment, and polished communication of results.
For fast verification, jump to Results, Build & Run, and Simulation Parameters.
Table of Contents
- Snapshot
- Overview
- Physics Background
- Code Structure
- Parallelization Narrative
- Visualization (plotting.py)
- Results
- Sources of Error
- Build & Run
- Simulation Parameters
- Key Techniques
- Project Structure
Overview
This project implements a 2D Molecular Dynamics simulation of interacting particles governed by the Lennard-Jones potential. The system evolves via the velocity Verlet integration algorithm with periodic boundary conditions, modeling the thermodynamic behavior of a simple fluid. An external energy injection/extraction schedule drives the system through heating and cooling phases, allowing observation of phase-like transitions in real time.
The implementation combines OpenMP parallelism for force calculations, minimum image convention for periodic boundaries, and a cell-based Verlet neighbor list to reduce unnecessary pair checks. Results are saved as .npy and .csv files, with Python tooling that produces parallel-rendered particle animations and comprehensive energy diagnostics.
Physics Background
The Lennard-Jones Potential
Particles interact via the Lennard-Jones (12-6) potential in reduced units ($\sigma = \varepsilon = m = 1$):
\[U(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]\]The steep $r^{-12}$ repulsion models short-range Pauli exclusion, while the $r^{-6}$ attraction captures van der Waals dispersion forces. A cutoff radius $r_c = 2.5\,\sigma$ truncates the potential for computational efficiency — contributions beyond $r_c$ are negligible.
Equations of Motion
Each particle obeys Newton’s second law:
\[m\,\ddot{\mathbf{r}}_i = \mathbf{F}_i = -\nabla_i \sum_{j \neq i} U(r_{ij})\]The force on particle $i$ due to particle $j$ is:
\[\mathbf{F}_{ij} = \frac{24\varepsilon}{r^2}\left[2\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]\mathbf{r}_{ij}\]Newton’s third law ($\mathbf{F}_{ij} = -\mathbf{F}_{ji}$) halves the number of pair evaluations.
Velocity Verlet Integration
The simulation advances positions and velocities using the velocity Verlet scheme, which is time-reversible and symplectic:
\[\mathbf{v}_i\left(t + \tfrac{\Delta t}{2}\right) = \mathbf{v}_i(t) + \tfrac{\Delta t}{2}\,\mathbf{a}_i(t)\] \[\mathbf{r}_i(t + \Delta t) = \mathbf{r}_i(t) + \mathbf{v}_i\left(t + \tfrac{\Delta t}{2}\right)\,\Delta t\] \[\mathbf{v}_i(t + \Delta t) = \mathbf{v}_i\left(t + \tfrac{\Delta t}{2}\right) + \tfrac{\Delta t}{2}\,\mathbf{a}_i(t + \Delta t)\]This two-stage velocity update requires force evaluation only once per time step.
Observables
| Observable | Definition |
|---|---|
| Temperature | $T = \frac{1}{N d}\sum_i m v_i^2$ (equipartition theorem, $k_B = 1$) |
| Kinetic Energy | $K = \frac{1}{2}\sum_i m v_i^2$ |
| Potential Energy | $U = \sum_{i<j} U_\text{LJ}(r_{ij})$ |
| Total Energy | $E = K + U$ |
Code Structure
All simulation logic lives in two files:
| File | Role |
|---|---|
| `main.cpp` | Parses config, runs simulation, and supports `--benchmark` tuning sweeps |
| `processing.h` | `MolucularSystem` class: integration, neighbor-list force evaluation, energy tracking, and I/O |
MolucularSystem Class
Each MolucularSystem instance represents a complete simulation state for $N$ particles in a 2D periodic box.
Construction
MolucularSystem(config)
- Positions are integrated in `currPositions` and sampled into `sampledPositions` every `stepSkip` steps to keep output sizes manageable.
- Velocities and accelerations are single-frame arrays of size `numParticles`.
- All particles start at rest except particle 0, which receives a small initial velocity `{0.001, 0.001}` to break symmetry and seed energy flow.
- The dimensionality is compile-time configurable via `static constexpr int d = 2`.
Constants (Reduced Units)
| Constant | Value | Description |
|---|---|---|
| `d` | 2 | Spatial dimensions |
| `rc` | 2.5 σ | Lennard-Jones cutoff radius |
| `rc2` | 6.25 σ² | Precomputed $r_c^2$ to avoid square roots |
calculateAccelerations(t) — Force Calculation
Computes Lennard-Jones forces using a Verlet neighbor list with linked-cell binning:
if (shouldRebuildNeighborList(currPositions))
    rebuildNeighborList(currPositions);
for (p1 = 0; p1 < N; p1++)
    for (p2 in neighbors[p1])
        dx = p1.x - p2.x; dy = p1.y - p2.y
        dx -= L * round(dx / L)              // minimum image convention (dy likewise)
        r2 = dx*dx + dy*dy
        if (r2 > rc2) continue               // force cutoff
        invR2 = 1.0 / r2
        invR6 = invR2 * invR2 * invR2
        F = 24.0 * (2.0 * invR6 * invR6 - invR6) * invR2
        a[p1] += F * (dx, dy); a[p2] -= F * (dx, dy)   // Newton's third law
Parallelism: When numParticles > 50, the loop over particles is parallelized with OpenMP. Each thread accumulates into thread-local acceleration/potential buffers and the buffers are reduced after the parallel region.
Minimum image convention: Periodic boundary conditions are enforced by shifting displacements to the nearest image: dx -= L * round(dx / L).
verletStep(t) — Velocity Verlet Integration
Each call advances the system by one time step $\Delta t$:
- Half-kick: Update velocities by half a time step using current accelerations
- Drift: Update positions using the half-stepped velocities
- Wrap: Apply periodic boundary conditions via `pos -= L * floor(pos / L)`
- Recalculate forces: Call `calculateAccelerations(t)` at the new positions
- Half-kick: Complete the velocity update with the new accelerations
- Energy bookkeeping: Call `energyCalculations(t)`
Both velocity loops are parallelized with #pragma omp parallel for when numParticles > 100.
energyCalculations(t) — Energy Tracking & Injection
At each time step, the method:
- Checks whether the current step matches the next entry in `energyFunction`; if so, rescales all velocities by a factor $\lambda = \sqrt{1 + \Delta E / K}$. Since kinetic energy scales as $\lambda^2$, this uniformly injects or removes a precise amount of energy $\Delta E$.
- Records temperature, potential energy, kinetic energy, and total energy for postprocessing.
Energy Schedule: buildEnergyFunction()
The energy injection/extraction schedule is:
| Phase | Time Steps | Action |
|---|---|---|
| 0 – 5% | Settle | No energy change |
| 5 – 50% | Heating | Inject energy at a constant rate |
| 50 – 95% | Cooling | Remove energy at a constant rate |
| 95 – 100% | Settle | No energy change |
This drives the system through a full heating-cooling cycle, allowing observation of melting and crystallization behavior.
runSimulation() — Main Loop
calculateAccelerations(0); // initial forces
energyCalculations(0); // initial energy bookkeeping
for (t = 1; t < timeSteps; t++)
verletStep(t); // advance the system
Progress is logged every 5% of the simulation.
Output Methods
| Method | Output |
|---|---|
| `saveResultsToNpy()` | Separate .npy arrays for sampled positions, energies, temperatures, and metadata |
| `saveEnergyToCSV()` | CSV with columns: TimeStep, Temperature, PE, KE, TE |
| `savePositionsToCSV()` | CSV with columns: TimeStep, ParticleIndex, X, Y |
| `save()` | Auto-creates output/out_N/ (incrementing index) and writes NPY + CSV outputs |
Parallelization Narrative
The O(N²) Challenge
The baseline all-pairs Lennard-Jones loop is $O(N^2)$, which becomes expensive quickly for dense systems and long trajectories. The current code reduces this cost by rebuilding a Verlet neighbor list only when particles move enough to invalidate the list.
This turns the hot loop into “iterate only nearby particles” instead of “iterate every particle pair,” while preserving exact pairwise force updates inside the cutoff radius.
Race Conditions in Force Accumulation
Race hazards still exist because Newton’s third law updates two particles per interaction. The implementation avoids shared-write races by assigning each thread a private acceleration buffer:
for (int p1 = 0; p1 < N; p1++)
for (int p2 : neighborList[p1])
F = compute_LJ_force(p1, p2);
localAcceleration[p1] += F;
localAcceleration[p2] -= F;
Each thread writes only to its own local arrays during force traversal; shared arrays are combined in a deterministic reduction pass after the parallel loop.
The Thread-Local Accumulator Solution
The current solution combines two optimizations:
- Linked-cell neighbor-list rebuilds with configurable `neighborSkin`.
- Thread-local force accumulation with post-loop buffer summation.
This structure keeps force evaluation race-free while reducing pair checks significantly for realistic densities.
#pragma omp parallel if (numParticles > 50)
{
    int tid = omp_get_thread_num();
    // Each thread writes to its own acceleration buffer.
    std::vector<std::array<double, d>>& localAccelerations = threadAccelerations[tid];
    double localPE = 0.0;
    #pragma omp for schedule(guided)
    for (int p1 = 0; p1 < numParticles; p1++) {
        for (int p2 : neighborList[p1]) {
            // ... force calculation ...
            localAccelerations[p1][dim] += force_component;
            localAccelerations[p2][dim] -= force_component; // safe: thread-private
        }
    }
}
// After the parallel region, reduce thread buffers into shared accelerations.
// After the parallel region, reduce thread buffers into shared accelerations.
The trade-off is extra memory for per-thread acceleration arrays, but it removes fine-grained synchronization overhead and scales well on shared-memory nodes.
Guided Scheduling for Triangular Loops
Even with neighbor lists, work per particle is non-uniform because local density varies. schedule(guided) is used so overloaded threads can shed work dynamically:
#pragma omp for schedule(guided)
for (int p1 = 0; p1 < numParticles; p1++) { ... }
This improves balance for both dense startup lattices and later disordered states.
Lessons Learned
- Neighbor lists are the biggest practical speedup. Rebuilding occasionally is cheaper than checking all pairs each step.
- Thread-local buffers remove write races cleanly. Newton’s third-law updates stay local during parallel work and are reduced afterward.
- Guided scheduling remains important. Density-driven imbalance appears even when pair loops are no longer strictly triangular.
- Auto-tuning matters on shared clusters. The benchmark mode (`--benchmark`) sweeps threads and `neighborSkin` to select better run settings per node.
Visualization (plotting.py)
The Python pipeline performs two tasks with separate scripts:
1. Particle Animation (Parallel Rendering)
The animation is rendered in parallel using multiprocessing:
- Frame indices are split across configurable workers (`--workers`, default based on CPU count)
- Each worker renders its segment to a temporary `.mp4` via `matplotlib.animation`
- Segments are stitched with `ffmpeg -concat` into a single animation
This avoids the serial bottleneck of rendering thousands of frames on one core.
2. Energy Analysis Plots
plot_energy.py creates summary and diagnostic figures:
| Panel | Content |
|---|---|
| Composite Figure | Energy components, drift, temperature, and temperature-vs-energy scatter |
| Detailed Figure | Binned temperature-over-energy trend plus raw samples |
Results
Particle Animation
Particle trajectory animation showing the heating-cooling cycle. Particles evolve from a regular grid through a gas-like disordered phase during heating, then re-order as energy is extracted, illustrating the connection between kinetic energy and temperature at the microscopic level.
Energy Analysis
The energy plot shows the heating/cooling cycle: kinetic energy rises during the injection phase, plateaus, and falls during extraction. Total energy drift remains bounded, validating the Verlet integrator’s symplectic properties.
Sources of Error
| Source | Nature | Mitigation |
|---|---|---|
| Time-step discretization | Finite $\Delta t$ introduces integration error | Increase timeSteps for a given finalTime to shrink $\Delta t$; velocity Verlet's global error is $O(\Delta t^2)$ |
| Cutoff truncation | Ignoring interactions beyond $r_c$ introduces a systematic energy offset | $r_c = 2.5\sigma$ captures $>98\%$ of the LJ well; could add tail corrections |
| Finite-size effects | Small $N$ or small box $L$ affects thermodynamic behavior | Increase particle count and box size |
| Velocity rescaling | Instantaneous rescaling is not thermostatting — not canonical ensemble | Acceptable for driven heating/cooling; use Nosé-Hoover for equilibrium studies |
| Initial symmetry | Grid start with zero velocities is artificial | Small perturbation on particle 0 breaks symmetry; long heating phase equilibrates |
Computational complexity: Force evaluation is neighbor-list based: typical runtime is closer to $\mathcal{O}(N \times n_{nbr} \times T)$ with periodic $\mathcal{O}(N)$ rebuilds, while worst-case behavior remains $\mathcal{O}(N^2 \times T)$ in very dense/pathological layouts.
Build & Run
Prerequisites
- C++17 compiler (`g++` or `clang++`)
- OpenMP (`brew install libomp` on macOS)
- zlib (for `cnpy` I/O linkage)
- Python 3 with `numpy` and `matplotlib`
- ffmpeg (for animation rendering)
Build Targets
make release # Optimized build (-O3, LTO, vectorization, march=native)
make unsafe # Adds -ffast-math (may introduce minor FP drift)
make debug # -O0, full warnings, OpenMP disabled
make profile-gen && ./bin/main simulation_2.cfg && make profile-use # Profile-guided optimization
Run
./bin/main simulation_2.cfg
./bin/main simulation_2.cfg --benchmark
Output is auto-saved to output/out_N/ (incrementing index):
| File | Description |
|---|---|
| `positions.npy` | Position history as (sampledSteps, N, 2), sampled every `stepSkip` solver steps |
| `temperatures.npy` | Temperature time series |
| `potentialEnergies.npy` | Potential energy time series |
| `kineticEnergies.npy` | Kinetic energy time series |
| `totalEnergies.npy` | Total energy time series |
| `metadata.npy` | [width, height, numParticles, timeSteps, finalTime] |
| `energy_data.csv` | Sampled energy/temperature CSV |
| `positions_data.csv` | Sampled per-particle trajectory CSV |
Configuration (simulation_1.cfg / simulation_2.cfg)
The simulation is fully configured through .cfg files using key=value pairs.
Required keys currently parsed by main.cpp/processing.h:
| Key | Type | Meaning |
|---|---|---|
| `timeSteps` | int | Number of integration steps |
| `finalTime` | float | Physical end time; dt = finalTime / timeSteps |
| `width` | float | Box width |
| `height` | float | Box height |
| `xBoundaryCondition` | string | `periodic` or `reflective` |
| `yBoundaryCondition` | string | `periodic` or `reflective` |
| `gravity` | float | Constant acceleration in −y |
| `stepSkip` | int | Sampling stride for stored positions/CSV output |
| `neighborSkin` | float | Verlet neighbor-list skin distance |
| `showProgress` | int (optional) | Progress-print toggle (0 off, 1 on; default on) |
| `initialPositionsInstructions` | string | Shape instructions for initial particle set |
| `EnergyInstructions` | string | Time-segmented heating/cooling schedule |
initialPositionsInstructions format:
Shape,cx,cy,rotation,spacing,size[,...];Shape,cx,cy,rotation,spacing,size[,...]
- Supported shapes: `Rectangle`, `Hexagon`, `Rhombus`, `Triangle`, `Circle`
- Rectangle uses 7 fields: `Rectangle,cx,cy,rotation,spacing,numX,numY`
- Other shapes use 6 fields: `Shape,cx,cy,rotation,spacing,size`
EnergyInstructions format:
startPercent,endPercent,totalEnergyChangePer100Particles;...
Each segment distributes its total energy change uniformly over that percent range.
Visualize
python3 plotting.py --workers 8
python3 plot_energy.py --run 0
plotting.py automatically loads the newest output/out_N/ directory and writes:
- `md_animation_N.mp4`
plot_energy.py automatically loads either the latest or requested run and writes:
- `energy_analysis_N.png`
- `temperature_over_energy_N.png`
- `energy_summary_N.csv`
Note: plotting.py currently points to a cluster-specific ffmpeg path. If running locally, update the FFMPEG variable or replace it with your system ffmpeg path.
HPC (SLURM)
sbatch job.sh
The batch script builds in release mode only when needed, optionally auto-tunes OMP_NUM_THREADS and neighborSkin via benchmark mode, runs ./bin/main simulation_2.cfg, then submits anamation.sh and plotting.sh as follow-up SLURM jobs.
Simulation Parameters
Configured in simulation_2.cfg (not hardcoded in main.cpp).
| Parameter | Default | Description |
|---|---|---|
| `initialPositionsInstructions` | `Rectangle, 125.0, 30.0, 0.0, 1.1, 200, 50` | Initial particle layout (shape-composed) |
| `timeSteps` | `1000000` | Number of integration steps |
| `finalTime` | `2000.0` | Total simulation time (reduced units) |
| `width`, `height` | `250.0`, `150.0` | 2D box dimensions |
| `xBoundaryCondition`, `yBoundaryCondition` | `periodic`, `reflective` | Boundary handling by axis |
| `gravity` | `0.01` | Constant downward acceleration |
| `neighborSkin` | `0.2` | Skin distance for neighbor-list rebuild policy |
| `stepSkip` | `160` | Save sampled frames every 160 timesteps |
| `EnergyInstructions` | `2,8,-50;8,50,800;50,60,-600;60,95,-200` | Multi-stage heating/cooling schedule |
| `rc` | 2.5 σ | Lennard-Jones cutoff radius |
Key Techniques
| Technique | Purpose |
|---|---|
| Velocity Verlet integration | Time-reversible, symplectic $O(\Delta t^2)$ integrator |
| Lennard-Jones potential with cutoff | Realistic short-range interactions; the cutoff bounds each particle's interaction range |
| Minimum image convention | Correct nearest-image distances for periodic boundaries |
| Newton’s third law ($\mathbf{F}_{ij} = -\mathbf{F}_{ji}$) | Halves the number of pair evaluations |
| Verlet neighbor lists + linked-cell grid | Reduces force checks to nearby particles and enables scalable runs |
| Thread-local accumulators + post-loop reduction | Race-free parallel force summation without per-interaction synchronization |
| `guided` OpenMP schedule | Balances non-uniform per-particle workloads across threads |
| Velocity rescaling energy injection | Precisely controls energy input/output for driven simulations |
| Flat `std::array<double, d>` storage | Cache-friendly, compile-time dimensionality |
| `cnpy` NPY output | Stores large runs safely without NPZ ZIP-size limits |
| Parallel animation rendering | Multiprocessing + ffmpeg concat avoids serial matplotlib bottleneck |
| Profile-guided optimization (PGO) | Compiler uses runtime data for branch prediction and inlining |
Project Structure
Project 8: Molicular Dynamics/
|-- main.cpp # Entry point: reads config path and launches simulation
|-- processing.h # MolucularSystem class: integration, forces, energy, I/O
|-- Makefile # Multi-target build: debug, release, unsafe, profile-guided
|-- simulation_1.cfg # Smaller exploratory config
|-- simulation_2.cfg # Primary runtime configuration file
|-- plotting.py # Python visualization: parallel animation renderer
|-- plot_energy.py # Python post-processing: energy diagnostics and summaries
|-- job.sh # SLURM batch script (default: 8 OpenMP CPUs)
|-- bin/ # Compiled executable
|-- slurm_out/ # SLURM output logs from HPC runs
`-- output/
|-- Test Out/ # Early test run
|-- out_0/ -- out_9/ # Simulation runs (auto-incrementing)
| |-- positions.npy # Sampled positions
| |-- temperatures.npy # Temperature time series
| |-- potentialEnergies.npy # Potential energy time series
| |-- kineticEnergies.npy # Kinetic energy time series
| |-- totalEnergies.npy # Total energy time series
| |-- metadata.npy # width, height, particle count, timeSteps, finalTime
| |-- energy_data.csv # Energy & temperature time series (CSV)
| |-- positions_data.csv # Sampled particle trajectories (CSV)
| |-- md_animation_N.mp4 # Particle trajectory animation
| |-- energy_analysis_N.png # Composite energy diagnostics figure
| |-- temperature_over_energy_N.png # Detailed T(E) trend plot
| `-- energy_summary_N.csv # Scalar diagnostics summary
`-- ...
Nels Buhrley — Computational Physics, 2026