Cubane (C8H8) VMC

Vladimir_Konjkov · Post by **Vladimir_Konjkov** » Thu Apr 16, 2015 1:21 pm

Hello Mike. I am a professional programmer, but in the last few years as a hobby I have been doing quantum-chemical calculations.
Until recently, in my calculations, I used only the classical quantum-mechanical methods, such as HF, MP2, CI, CC, DFT
and their multireference implementation.
However, these methods have a number of obvious limitations: the "gold standard" of quantum chemistry CCSD(T) and its multireference
extension proposed by Mukherjee scales as N^7, which makes them not suitable for the calculation of large molecules, DFT, in turn,
has no multireference extension that does not allow to compute the energy activation of bond breaking, furthermore exchange correlation in the DFT has incorrect
asymptotics behavior at infinity witch results of wrong energy of the excited state with charge transfer.
All this has led me to the idea to start using QMC calculation methods.

1-st step in this direction for me is to choose the most convenient-to-use QMC-package. Casino is one of contenders.
I decided for myself that the process of choosing the package should consist of passing a few check points (list only the first)..

1. I successfuly load files produced in any of external Gaussian basis set codes to CASINO and check if VMC energy without Jastow optimisation
(exception can be made only for cusp correction) is equal to Hartree-Fock one.

My favority molecule is cubane (C8H8), so I performed the calculation of this molecule in the experimental geometry l(C-C)=1.571, l(C-H)=1.098 and basis def2-SVP with ORCA program.
The output obtained I converted by script CASINO/utils/wfn_converters/orca/molden2qmc.py to gwfn.data.

To disable Jastrow correction to wfn I set use_jastrow : F, values of other parameters are as follows:

Code: Select all

#-------------------#
# CASINO input file #
#-------------------#

# Silane molecule (ground state)

# SYSTEM
neu               : 28             #*! Number of up electrons (Integer)
ned               : 28             #*! Number of down electrons (Integer)
periodic          : F              #*! Periodic boundary conditions (Boolean)
atom_basis_type   : gaussian       #*! Basis set type (text)

# RUN
runtype           : vmc            #*! Type of calculation (Text)
newrun            : T              #*! New run or continue old (Boolean)
testrun           : F              #*! Test run flag (Boolean)
block_time        : 0.0 s          #*! VMC/DMC block time (Physical)

# VMC
vmc_equil_nstep   : 500            #*! Number of equilibration steps (Integer)
vmc_nstep         : 10000000       #*! Number of steps (Integer)
vmc_nblock        : 10             #*! Number of checkpoints (Integer)
vmc_nconfig_write : 0              #*! Number of configs to write (Integer)
vmc_decorr_period : 0              #*! VMC decorrelation period (0 - auto)
psi_s             : slater         #*! Type of [anti]symmetrizing wfn (Text)
complex_wf        : F              #*! Wave function real or complex (Boolean)

# DMC
dmc_equil_nstep   : 2000           #*! Number of steps (Integer)
dmc_equil_nblock  : 1              #*! Number of checkpoints (Integer)
dmc_stats_nstep   : 10000          #*! Number of steps (Integer)
dmc_stats_nblock  : 1              #*! Number of checkpoints (Integer)
dmc_target_weight : 1000.0         #*! Total target weight in DMC (Real)
dtdmc             : 0.01           #*! DMC time step (Real)
use_tmove         : F              #*! Casula nl pp for DMC (Boolean)

# RMC

# OPTIMIZATION
opt_method        : madmin         #*! Opt method (varmin/madmin/emin/...)
opt_cycles        : 6              #*! Number of optimization cycles (Integer)
opt_jastrow       : F              #*! Optimize Jastrow factor (Boolean)
opt_det_coeff     : F              #*! Optimize determinant coeffs (Boolean)
opt_backflow      : F              #*! Optimize backflow parameters (Boolean)
opt_orbitals      : F              #*! Optimize orbital parameters (Boolean)

# GENERAL PARAMETERS
use_jastrow       : F              #*! Use a Jastrow function (Boolean)
backflow          : F              #*! Use backflow corrections (Boolean)
expot             : F              #*! Use external potential (Boolean)
timing_info       : F              #*! Activate subroutine timers (Boolean)
esupercell        : F              #*! Energy/supercell in output (Boolean)
neighprint        : 0              #*! Neighbour analysis (Integer)
mpc_cutoff        : 30.d0 hartree  #*! G vector cutoff for MPC (Physical)
forces            : F              #*! Evaluate forces on atoms (Boolean)
checkpoint        : 1              #*! Checkpoint level (Integer)

# EXPECTATION VALUES
density           : F              #*! Accumulate density (Boolean)
spin_density      : F              #*! Accumulate spin densities (Boolean)
pair_corr         : F              #*! Accumulate rec. space PCF (Boolean)
pair_corr_sph     : F              #*! Accumulate sph. real space PCF (Boolean)
loc_tensor        : F              #*! Accumulate localization tensor (Boolean)
structure_factor  : F              #*! Accumulate structure factor (Boolean)
struc_factor_sph  : F              #*! Accumulate sph. struc. factor (Boolean)
onep_density_mat  : F              #*! Accumulate 1p density matrix (Boolean)
twop_density_mat  : F              #*! Accumulate 2p density matrix (Boolean)
cond_fraction     : F              #*! Accumulate cond fraction (Boolean)
dipole_moment     : F              #*! Accumulate elec. dipole moment (Boolean)
expval_cutoff     : 30.d0 hartree  #*! G vector cutoff for expval (Physical)
permit_den_symm   : F              #*! Symmetrize QMC charge data (Boolean)
qmc_density_mpc   : F              #*! Use QMC density in MPC int (Boolean)

# BLOCK INPUT

My final results is:

ORCA HF-energy (def2-SVP spherical harmonic basis) is 307.165180575428 au (from ORCA output).

VMC energy (au) Standard error Correction for serial correlation

-307.255906030836 +/- 0.002973873096 No correction
-307.255906030836 +/- 0.005879359091 Correlation time method
-307.255906030822 +/- 0.006067861278 On-the-fly reblocking method

Sample variance of E_L (au^2/sim.cell) : 88.374893578007 +- 1.537284407294

My interpretation is that VMC energy differ more than 3 standart errors from ORCA HF-energy, so check fails.

So my assumption is:
1. ORCA gave wrong energy result. This very, very, very unlikely because same energy I get in the NWCHEM and PSI4 and I using ORCA more than 3 years.
2. ORCA cubane.molden has not quite compatible file format and was not correctly interpreted/converted to gwfn.data. Nuances of interpretation ORCA MOLDEN file format widely discussed http://sourceforge.net/p/janpa/wiki/OrcaExamples/
3. I set the wrong parameters for CASINO calculation.
4. I misread the CASINO calculation result.
5. CASINO gave wrong energy/std. error result.

Can somebody check p.2-5 which one is incorrect?

thanks in advance, Vladimir.

Post by **Mike Towler** » Thu Apr 16, 2015 8:01 pm

Dear Vladimir,

Thanks for your detailed and well-expressed query.

The answer is none of 2-5, but in fact (6) 'There is a bug in the external MOLDEN converter when the basis set contains f functions'. The author Mike Deible has been made aware of this and has promised to fix it; however he's in the middle of defending his thesis and is a little short of time right now so I'm not sure when it will be fixed. Let's say 'soon', whatever that might mean.

So - try again with a smaller basis set containing only s, p, d functions and it will probably pass your tests. See the files in examples/generic/gauss_dfg in the CASINO distribution for further discussion - in particular the files README and RESULTS.

That said, according to Mike D's statement in the RESULTS file, the current status of the ORCA interface (implemented through the MOLDEN file) is:

Code: Select all

"The HF d-ane energy is reproduced in VMC from a MOLDEN file that was
generated by a user. However, when a locally generated d-ane file was used,
the HF energy is not reproducible in VMC.  My leading suspicion is that
this is due to a version mismatch, but it requires further investigation.
f-ane energies were not reproducible at all. A VMC calculation on methane
using a DEF2-TZVPP calculation, both sent from a user and produced
locally, reproduces the HF energy, as well as for cc-pvdz and cc-pvtz
basis sets.  One other issue I've noticed is that aug-cc-pvdz or
aug-cc-pvtz do not allow the VMC to reproduce the HF energy. This is
because ORCA does this strange thing where it groups the diffuse functions
together, instead of ordering them by angular momentum.  I don't think
this is a problem with ORCA, because it looks like the AOs are also
ordered in the same manner, but I do think it's a bug in the converter
that I need to figure out.  I am not confident about the conversions
produced with this code just yet, but as always, any user that tries to
use this should run a VMC calculation with no Jastrow factor to ensure
that the VMC energy and HF energy agree to confirm that the conversion was
successful."

It may be that there are issues with the "Nuances of interpretation ORCA MOLDEN file format" to which you refer. If you can help us and Mike D. with this we would be very grateful. Maintaining interfaces with large numbers of external codes that we know very little about is one of the biggest pains about being a QMC developer, as I'm sure you can appreciate.

Of course the need for high angular momentum functions and sophisticated basis sets in general in DMC is much less important than in regular quantum chemistry, as the non-zero bits of the wave function aren't represented in terms of a basis set, but even so, they still need to work!

The reason I know the answer to your problem is because I'm in the middle of a review of the various interfaces to external Gaussian codes and am also making sure that the documentation and so on is improved and up-to-date in time for the forthcoming new release of CASINO.

Cheers,
Mike

PS: you won't be informed of this post by email (as nobody else was notified of your post earlier) since somebody at our hosting company has again tried to block us because the site contains the word 'CASINO' and sends out lots of emails (or something like that).. I've asked them to fix it, and hopefully this will be sorted shortly. I'll add another email to this thread when it's working again so you're notified.

Vladimir_Konjkov · Post by **Vladimir_Konjkov** » Fri Apr 17, 2015 2:08 am

Mike Towler wrote: So - try again with a smaller basis set containing only s, p, d functions and it will probably pass your tests. See the files in examples/generic/gauss_dfg in the CASINO distribution for further discussion - in particular the files README and RESULTS.

Dear Mike, def2-SVP have no f-function in a 'H' and 'C' basis set realy. I just downloaded it for you from https://bse.pnl.gov/bse/portal.

Code: Select all

!  Def2-SVP  EMSL  Basis Set Exchange Library   4/16/15 6:53 PM
! Elements                             References
! --------                             ----------
! H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn : F. Weigend and R. Ahlrichs, Phys. Chem. Chem. Phys., Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence
! quality for H to Rn: Design and assessment of accuracy 7, 3297 (2005).
! 



****
H     0 
S   3   1.00
     13.0107010              0.19682158E-01   
      1.9622572              0.13796524       
      0.44453796             0.47831935       
S   1   1.00
      0.12194962             1.0000000        
P   1   1.00
      0.8000000              1.0000000        
****
C     0 
S   5   1.00
   1238.4016938              0.54568832082E-02      
    186.29004992             0.40638409211E-01      
     42.251176346            0.18025593888    
     11.676557932            0.46315121755    
      3.5930506482           0.44087173314    
S   1   1.00
      0.40245147363          1.0000000        
S   1   1.00
      0.13090182668          1.0000000        
P   3   1.00
      9.4680970621           0.38387871728E-01      
      2.0103545142           0.21117025112    
      0.54771004707          0.51328172114    
P   1   1.00
      0.15268613795          1.0000000        
D   1   1.00
      0.8000000              1.0000000        
****
! Elements                             References
! --------                             ----------
! Y-Cd(ecp-28), Hf-Hg(ecp-46): D. Andrae,U. Haeussermann, M.Dolg, H.Stoll, H.Preuss, Theor.Chim.Acta, 1990, 77, 123?141.
! In-Sb(ecp-28), Tl-Bi(ecp-46): B. Metz, H. Stoll, M. Dolg, J. Chem. Phys., 2000, 113, 2563?2569.
! Te-Xe(ecp-28), Po-Rn(ecp-46): K. A. Peterson, D. Figgen, E. Goll, H. Stoll, M. Dolg, J. Chem. Phys., 2003, 119, 11113?11123.
! Rb(ecp-28), Cs(ecp-46): T.Leininger, A.Nicklass, W.Kuechle, H.Stoll, M.Dolg, A.Bergner, Chem. Phys. Lett., 1996, 255, 274?280.
! Sr(ecp-28), Ba(ecp-46): M. Kaupp, P. V. Schleyer, H. Stoll and H. Preuss, J. Chem. Phys., 1991, 94, 1360?1366.
!

I've try also STO-3G basis, the difference in ORCA HF-energy and VMC one is very large.

Vladimir_Konjkov · Post by **Vladimir_Konjkov** » Fri Apr 17, 2015 4:21 am

Mike Towler wrote: If you can help us and Mike D. with this we would be very grateful. Maintaining interfaces with large numbers of external codes that we know very little about is one of the biggest pains about being a QMC developer, as I'm sure you can appreciate.

If the interface is well defined it usually does not cause problems. As SIMPLIFIED INTRODUCTION TO AB INITIO BASIS SETS. TERMS AND NOTATION is available http://www.ccl.net/cca/documents/basis-sets/basis.html and definition of normalization constant is well described in http://sourceforge.net/projects/libint/ ... rogman.pdf formulas (2)-(5). I'll try to help you, if you give me a description of the gwfn file format, especially the part of the basis set description.

Thank you for your interest in my questions, Vladimir.

P.S As external Gaussian basis set code I much prefer NWCHEM.

Post by **Mike Towler** » Fri Apr 17, 2015 6:40 am

Hi Vladimir,

OK - that's interesting. For some reason I thought that basis had f functions for C. I should have checked - quantum chemists have far too many acronyms!
I'll have another look at this today..

If the interface is well defined it usually does not cause problems.

One would think so, I agree. In practice..

I'll try to help you, if you give me a description of the gwfn file format, especially the part of the basis set description.

The gwfn.data format is defined in the CASINO manual, with more details of normalization etc being relegated to the examples/generic/gauss_dfg/README file. One of the things I'm trying to do this week is to unify all this information into the manual.

Mike

PS: once you really get into making sure the VMC-HF energy equals the HF energy of the external code, you will run into the cusp issue. As you know, the true wave function must have a gradient discontinuity, or cusp, at each nuclear position. As Gaussians have zero gradient at r=0, they therefore cannot - no matter how good the basis - represent the wave function correctly. This is a particular issue for a method like QMC which samples local properties since, e.g. the local energy H Psi / Psi will diverge if one of the electrons lands near a nucleus. If you have the keyword cusp_correction = T in the input file (this is the default) then CASINO will 'cut out' a small region of each orbital function near the nuclei, and replace it with an appropriate polynomial which both represents the cusp correctly and matches gradients etc. at the junction with the Gaussian-expanded function. The practical upshot of this is that you're not quite integrating the function you think you are, and although this is normally expected to have a relatively small effect on integrated properties - it may well become significant particularly if, as here, you're using a relatively poor basis set which doesn't even attempt to use high-exponent functions to simulate the cusp more accurately. Try turning off cusp_correction, and see what happens (the variance will likely increase significantly and you will have to run the simulation for a lot longer to get the error bars down).. Your energy difference appears too large for this to be the main issue, though. We'll see.

PPS: Email notifications should now be working again. Sorry about that.

Mdeible · Post by **Mdeible** » Sun Apr 19, 2015 12:02 am

Hi,

I have had issues with the Molden files produced by Orca in the past, like I said in the README file. Despite that, I think it's actually performing the way it should in this case. I ran the same calculation in Gaussian09B and Orca v3.0.1. I took the geometry from http://cccbdb.nist.gov/, because it was faster, but the bond lengths are similar. In both codes, I get a Hartree-Fock value of -307.1652992 au. (This is slightly different from your value of -307.165181, but it's probably due to slight deviations in the geometry. My CC= 1.5708, CH=1.0971). Regardless, both Orca and Gaussian agree on the HF energy, so I think the trial wave function is OK.

Using the Gaussiantoqmc converter included in the CASINO distribution (v2.13.376, though I don't think any changes have been made to the spd functions since then) to convert the Gaussian HF wave function to a gwfn.data file and using the input that you posted to run a VMC calculation, I get:

-307.255796938033 +/- 0.003060325207 No correction
-307.255796938033 +/- 0.005109472881 Correlation time method
-307.255796938037 +/- 0.004864767312 On-the-fly reblocking method

Sample variance of E_L (au^2/sim.cell) : 93.337777036916 +- 6.734796535722

So, I get basically the same thing that you get with Orca.

I also thought this may be due to the cusp, so I ran the calculation again with the general purpose cusp correction (use_gpcc : T) and with no cusp correction. The results are:

with gpcc:
-307.253347667940 +/- 0.002954830342 No correction
-307.253347667940 +/- 0.005280950172 Correlation time method
-307.253347667939 +/- 0.005275918994 On-the-fly reblocking method
Sample variance of E_L (au^2/sim.cell) : 86.856793959864 +- 0.362246593576

with no cusp correction:
-307.188082360576 +/- 0.004244264221 No correction
-307.188082360576 +/- 0.012790222238 Correlation time method
-307.188082360595 +/- 0.013355725829 On-the-fly reblocking method
Sample variance of E_L (au^2/sim.cell) : 1809.884394498210 +- 115.619159443598

So, it does look like it's probably just due to the cusp condition. That being said, I'm a bit surprised that the cusp corrected energy is so much lower than the HF energy. It's certainly not something I've seen before. Admittedly, I usually use a pseudopotential.

Mike D.

Vladimir_Konjkov · Post by **Vladimir_Konjkov** » Sun Apr 19, 2015 6:51 am

Thanks for the help , Mike.

Now all became clear. As I am doing QMC calculations only a few weeks, I wanted to make sure that the QMC programs work, as I guess.
I assumed that the cusp condition should not change the energy, it turned out that this is not true.

I want to add a few additional comments on the topic:
1. def2-SVP basis is really very small even for HF-calculation, since the energy it gives to C8H8 (-307.165181 au) is significantly differ from the HF-energy extrapolated to the complete basis set limit (-307.51 au).
Not surprisingly, that the cusp condition improve the result.
2. I had similar calculations (with def2-SVP basis) in the program QWALK and found no difference in energy between HF-energy from a third-party program (NWCHEM) and VMC energy with cusp condition in QWALK. The calculation in program QWALK had two significant differences: def2-SVP basis used in the Cartesian representation and cusp condition was implemented using splines http://www.qwalk.org/docs/docs/Cubic_splineDoc.html.
3. Using a more complete basis sets: DEF2-TZVP, DEF2-TZVPP, DEF2-QZVPP (in QWALK) by itself reduces the "sample variance of E" even without enabling cusp condition. Furthermore enabling of cusp condition for DEF2-QZVPP not give further improvements of "sample variance of E".

For myself, I made the following conclusions:
1. STO better initial approximation than the GTO, as explicitly take into account the cusp condition.
2. From available GTO basis sets, you need to choose one that best approximates cusp condition, it became clear that def2-SVP is not the best choice.

There was only one question related to this topic that I would like to get an answer:
Why QMC programs do not consider the symmetry of the molecule?
If there is no symmetry breaking of the wave function, as for C8H8, using the symmetry of wave function can significantly reduce amount of calculations for HF, MP2, CC.
As I understand, "symmetrically random walk" must also give smaller "Sample variance of E" and less energy for the same amount of walkers, or am I wrong?

Thanks in advance for your reply, Vladimir.

Post by **Mike Towler** » Sun Apr 19, 2015 9:05 pm

Hi Vladimir,

Try plotting some of the orbitals along vectors through the nuclei with cusp_correction set to T and F (using the qmc_plot block in input). That will give you a nice visual feel for the difference made by the correction.

2. I had similar calculations (with def2-SVP basis) in the program QWALK and found no difference in energy between HF-energy from a third-party program (NWCHEM) and VMC energy with cusp condition in QWALK. The calculation in program QWALK had two significant differences: def2-SVP basis used in the Cartesian representation and cusp condition was implemented using splines

That's a little surprising - you would think using (effectively) the same basis and the same cusp condition you would end up with similar energy differences with CASINO and QWALK. It would be interesting to understand more directly the origin of the difference.

That lazy layabout Wagner has probably screwed something up - these Americans - you just can't trust 'em (except Mike D. of course).

STO better initial approximation than the GTO, as explicitly take into account the cusp condition.

Well.. maybe. Slater basis functions certainly have a gradient discontinuity at the nucleus, but molecular orbitals coming out of a calculation with a code like ADF (the only one we support, if indeed there are any others) don't necessarily obey the Kato cusp condition. To do that you need one linear constraint per nucleus on the coefficients for each orbital, and the full set of these constraints can be expressed as a matrix equation, but this is not generally imposed in the Hartree-Fock calculation. You can enforce it on an ADF HF wave function using a linear projection, and this is what CASINO does, but there is no significant advantage in doing the cusp that way rather than the way it's done in Gaussian calculations. What is true is that you can express the molecular orbitals in fewer Slater functions than Gaussians, which gives a slight speed advantage in QMC.. but it does mean you have know how to use ADF, which -- to a first approximation -- nobody does. (Also - with anything heavy in QMC you probably want to use pseudopotentials - but ADF doesn't support them..)

From available GTO basis sets, you need to choose one that best approximates cusp condition, it became clear that def2-SVP is not the best choice.

With all-electron calculations it's good to use basis sets that have some narrow Gaussians with very high exponents - in general this reduces the 'cusp radius' inside which you have to correct the orbital.

There was only one question related to this topic that I would like to get an answer:
Why QMC programs do not consider the symmetry of the molecule?
If there is no symmetry breaking of the wave function, as for C8H8, using the symmetry of wave function can significantly reduce amount of calculations for HF, MP2, CC.

In general HF, MP2, CC etc. calculations deal with continuous functions (basis sets, orbitals, charge densities..) which can reflect the symmetry of the nuclear framework. The symmetry can mean, for example, that multiple bielectronic integrals are constrained to have the same value so you only need to compute one of them - and this gives a significant time saving over the case where symmetry is not used.

With QMC, there are no analytic integrations and the bulk of the time is taken in evaluating the many-body wave function (orbitals, Jastrow, etc..) and its derivatives at arbitrary positions in the configuration space i.e. you are dealing with instantaneous 'positions' of all the electrons ('configurations') rather than continuous distributions. Each 'snapshot' of the electrons on a random walk has - in general - no particular symmetry.

My daughter Saska - whose 10th birthday is mentioned on the front page of this site today because she has an account and typed in her date of birth years ago when I set the thing up - is sitting next to me now, and is insisting I use some of the smiley faces on the right of the editing page, so here goes:

Cheers,
Mike

The CASINO forum

Cubane (C8H8) VMC

Cubane (C8H8) VMC

Re: Cubane (C8H8) VMC

Re: Cubane (C8H8) VMC

Re: Cubane (C8H8) VMC

Re: Cubane (C8H8) VMC

Re: Cubane (C8H8) VMC

Re: Cubane (C8H8) VMC

Re: Cubane (C8H8) VMC