"From Distributed Systems to Molecular Insight: Infrastructure, Algorithms, and Organocatalytic Discovery"
Quantum chemistry is poised to become a discovery engine at a previously unattainable scale, but realizing this potential requires software, algorithms, and scientific workflows redesigned for modern distributed computing. This dissertation develops that foundation and uses it to move from scalable computation to molecular insight. First, we develop the software and infrastructure primitives needed to adapt quantum chemistry algorithms to modern computer architectures and demonstrate linear scaling across more than 100 GPUs. Second, snapRMSD addresses the structure-analysis bottleneck created by large molecular datasets. It uses graph automorphism and group-theoretic symmetry decomposition to compute chemically valid, symmetry-aware root-mean-square deviation (RMSD) values with orders-of-magnitude speedups over existing methods, and it also provides a novel description of molecular symmetry, canonical atom ranks, and orbit partitions. Finally, these tools are applied to organocatalytic ring-opening polymerization, where hypothesis-free conformational exploration reveals that catalyst selectivity is encoded in how substrates occupy competing pre-reactive manifolds, and that non-reactivity is encoded in how a monomer accesses a distinct off-cycle deactivation pathway. Together, these studies trace a single arc: scalable infrastructure enables large molecular datasets, symmetry-aware algorithms make those datasets tractable, and ensemble-first workflows turn them into chemical discovery.