Accuracy and Performance
========================

The data reported in this section are intended for informational purposes only. They were obtained using QUICK-21.03. The code is continuously being improved: QUICK-23.08 is over three times faster than QUICK-20.03 and about 2.5 times as fast as QUICK-21.03. Readers should not use the data presented here for comparison with other quantum chemical codes. If you are interested in such a comparison, we highly encourage you to download the latest QUICK version, compile it, and perform your own benchmarks.

Accuracy of energies and gradients
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have compared energies and gradients computed by QUICK with values computed by other quantum chemical packages. For the test systems, HF energies and gradients agree to within 1.0E-6 Hartree and 1.0E-4 Hartree/Bohr or better, respectively (see https://github.com/merzlab/QUICK-tests for test cases). DFT energies and gradients have shown similar accuracies in most cases; however, we have observed larger deviations for some molecular systems. Such deviations usually arise from differences in the exchange-correlation quadrature grid.

Performance of QUICK CUDA single GPU and MPI parallel versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Benchmark data obtained with QUICK-21.03**. The code is continuously being improved. **QUICK-23.08 is about 2.5 times faster**.

The following graph gives an idea of the performance that can be expected with **QUICK-21.03** for a single point SCF + gradient calculation on a relatively large molecule with a reasonably sized basis set. We have used **conservative SCF convergence criteria and integral thresholds**. With these settings, **a B3LYP/6-31G\*\* SCF + gradient calculation of valinomycin (168 atoms) takes only about 8 minutes** on a modern A100 GPU. Real-world applications typically require less stringent accuracy and thus less time to solution.
Performance on gaming GPUs is also excellent given their price point.

.. image:: bench1.png
    :width: 650px
    :align: center
    :height: 460px
    :alt: bench1

Performance of QUICK MPI+CUDA version
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The distributed multi-GPU implementation of QUICK utilizes the Message Passing Interface (MPI). For larger calculations in particular, the code shows excellent scalability, so it makes sense to run on multiple GPUs when time to solution is important. **A B3LYP/6-31G\*\* single point SCF + gradient calculation for the entire Crambin protein (642 atoms) can be performed in under 10 minutes using QUICK-21.03** on 16 V100 GPUs.

.. image:: bench2.png
    :width: 1067px
    :align: center
    :height: 450px
    :alt: bench2

See the following paper for more benchmarks of the QUICK multi-GPU version:

Manathunga, M.; Jin, C.; Cruzeiro, V.W.D.; Miao, Y.; Mu, D.; Arumugam, K.; Keipert, K.; Aktulga, H.M.; Merz, K.M.; Götz, A.W. Harnessing the Power of Multi-GPU Acceleration into the Quantum Interaction Computational Kernel Program, J. Chem. Theory Comput. 2021, 17, 7, 3955–3966.

*Last updated by Andreas Goetz on 04/25/2024.*