Constrained derivatives.

Post by **Vladimir_Konjkov** » Tue Mar 07, 2023 8:34 am

Hello QMC folks.
As described in "Optimization of quantum Monte Carlo wave functions by energy minimization" by Julien Toulouse:

"The most straightforward way to energy-optimize linear parameters in wave functions is to diagonalize the Hamiltonian
in the variational space that they define, leading to a generalized eigenvalue equation." The energy computed with a wave function that depends on the parameters p is:
E(p) = <ψ(p)|Ĥ|ψ(p)>/<ψ(p)|ψ(p)>
which is the Rayleigh quotient. To find the stationary points of E(p), i.e. to solve ∇E(p) = 0, we have to solve the following generalized eigenvalue problem, with ψ(p) expanded to first order in the parameters p:
H · Δp = E(p) * S · Δp
where the elements of the matrices S and H approach the standard quantum mechanical overlap integrals and Hamiltonian matrix elements in
the limit of an infinite Monte Carlo sample or an exact ψ(p), hence their names. Thus, the extremum points p* (with extremum values E(p*))
of the Rayleigh quotient are obtained as the eigenvectors e (with eigenvalues λ(e)) of the corresponding generalized eigenproblem. If the second-order term in the expansion of ψ(p) is not small, convergence in one step is not guaranteed, and a uniform rescaling of Δp may be needed to stabilise the iterative process.
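As a rough illustration of the generalized eigenvalue step, here is a minimal NumPy sketch. The matrices `H` and `S` are synthetic stand-ins (not sampled from any wave function), `H` is symmetrised for simplicity, and the convention that the first basis function is the current wave function (so that the eigenvector is normalised to have first component 1) is an assumption made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                    # number of parameters (hypothetical)

# Synthetic stand-ins for the matrices in the basis {psi, dpsi/dp_1, ..., dpsi/dp_n}.
A = rng.normal(size=(n + 1, n + 1))
H = 0.5 * (A + A.T)                      # symmetrised "Hamiltonian" matrix
B = rng.normal(size=(n + 1, n + 1))
S = B @ B.T + (n + 1) * np.eye(n + 1)    # symmetric positive-definite "overlap" matrix

# Reduce H v = lam S v to a standard symmetric problem using S = L L^T.
L = np.linalg.cholesky(S)
Linv = np.linalg.inv(L)
lam, y = np.linalg.eigh(Linv @ H @ Linv.T)
v = Linv.T @ y[:, 0]                     # eigenvector belonging to the lowest eigenvalue

# Assumed convention: component 0 multiplies the current wave function.
delta_p = v[1:] / v[0]

def rayleigh(x):
    """Rayleigh quotient x.H.x / x.S.x for a coefficient vector x."""
    return (x @ H @ x) / (x @ S @ x)
```

The lowest eigenvalue `lam[0]` equals the Rayleigh quotient at `v` and bounds it from below for any other coefficient vector. With noisy sampled matrices, `H` is generally not symmetric, so a non-symmetric eigensolver would be needed instead.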

I would like to clarify what to do if the parameters are constrained, e.g. if p is the full set of parameters constrained by a @ p = b (numpy notation). Then we can compute the unconstrained derivatives df(p)/dp and find the corresponding derivatives of f subject to the constraint by projection, as described in "Constrained Differentiation" by G. Schay, with the projector P = I - a.T @ (a @ a.T)**-1 @ a (here **-1 denotes the matrix inverse), so that projected_derivatives is df(p)/dp @ P. But projected_derivatives is a vector with the same dimension as p, whereas we need a derivatives vector whose dimension equals the number of independent parameters; let's call it independent_derivatives. What should we do? One can see that independent_derivatives, multiplied by the subset of P that corresponds to the independent parameters, should give the same subset of df(p)/dp @ P:
independent_derivatives @ P_subset = (df(p)/dp @ P)_subset
which gives us the answer (since the matrix P_subset is not singular):
independent_derivatives = (df(p)/dp @ P)_subset @ P_subset**-1
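The relation above can be checked numerically. The following sketch uses a random constraint matrix `a` and a random vector `grad` as a stand-in for df(p)/dp. The projector is written `P` in the code, and the choice of the first n - m parameters as the independent ones is an assumption made for the example. The result is compared with direct elimination of the dependent parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2                        # total parameters, number of constraints
a = rng.normal(size=(m, n))        # constraint matrix in a @ p = b
grad = rng.normal(size=n)          # stand-in for the unconstrained df(p)/dp

# Orthogonal projector onto the null space of a.
P = np.eye(n) - a.T @ np.linalg.solve(a @ a.T, a)
projected_derivatives = grad @ P

ind = np.arange(n - m)             # indices treated as independent (assumed choice)
dep = np.arange(n - m, n)          # remaining, dependent indices

# Solve x @ P_subset = projected_derivatives[ind]; P_subset is symmetric.
P_subset = P[np.ix_(ind, ind)]
independent_derivatives = np.linalg.solve(P_subset, projected_derivatives[ind])

# Reference: eliminate the dependent parameters and apply the chain rule.
dp_dep = -np.linalg.solve(a[:, dep], a[:, ind])    # d p_dep / d p_ind
reference = grad[ind] + grad[dep] @ dp_dep
```

In this test the two vectors agree to rounding error, which is consistent with the projection formula and the elimination approach giving the same derivatives with respect to the independent parameters.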
I would like to know whether this method or a similar one is used in CASINO or other codes.

Best Vladimir.

Post by **Neil Drummond** » Tue Mar 07, 2023 10:03 am

In CASINO we have (so far) dealt with constraints by using them to eliminate parameters, leaving us with an independent set of parameters. For simple things like the u(r) term in the Jastrow factor it is easy enough to express the coefficient of the first-order term in terms of the zeroth-order term to satisfy the Kato cusp conditions. For things such as the f term we write out linear constraint equations and use Gauss-Jordan elimination to express parameters corresponding to pivot columns in terms of the remaining free parameters.
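The elimination idea can be sketched in a few lines of NumPy. This is not CASINO's Fortran implementation. The function `rref` and the constraint matrix `a` are hypothetical, chosen only to show how pivot-column parameters follow from the free ones for homogeneous constraints a @ p = 0.

```python
import numpy as np

def rref(a, tol=1e-12):
    """Reduced row echelon form by Gauss-Jordan elimination with partial pivoting."""
    r = np.array(a, dtype=float)
    rows, cols = r.shape
    pivots = []
    row = 0
    for col in range(cols):
        if row == rows:
            break
        best = row + np.argmax(np.abs(r[row:, col]))
        if abs(r[best, col]) < tol:
            continue                       # no pivot here: this parameter stays free
        r[[row, best]] = r[[best, row]]    # swap the pivot row into place
        r[row] /= r[row, col]              # scale the pivot to 1
        for other in range(rows):
            if other != row:
                r[other] -= r[other, col] * r[row]
        pivots.append(col)
        row += 1
    return r[:row], pivots

# Hypothetical homogeneous constraints a @ p = 0 on five parameters.
a = np.array([[1.0, 2.0, 0.0, 1.0, 0.0],
              [2.0, 4.0, 1.0, 0.0, 1.0]])
r, pivots = rref(a)
free = [j for j in range(a.shape[1]) if j not in pivots]

p_free = np.array([0.3, -1.2, 0.7])        # arbitrary values of the free parameters
p = np.zeros(a.shape[1])
p[free] = p_free
p[pivots] = -r[:, free] @ p_free           # pivot parameters fixed by the constraints
```

Each row of the reduced matrix reads "pivot parameter plus a combination of free parameters equals zero", which is why the pivot parameters are minus the free-column block times `p_free`.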

Best wishes,

Neil.

Post by **Vladimir_Konjkov** » Wed Mar 08, 2023 4:55 am

Hello Neil.

I wanted to talk about the partial derivatives of the wave function and local energy with respect to the Jastrow and backflow parameters, which are needed for optimization. These partial derivatives could easily be obtained analytically if the parameters were not linearly dependent. I noticed that such partial derivatives are calculated numerically in CASINO, at least for the "old style" Jastrow and backflow, although they could be calculated analytically and then projected onto the subspace of independent parameters. For each vector p this is a linear transformation that depends on p, more precisely on the nonlinear parameters.

Best Vladimir.

Post by **Neil Drummond** » Wed Mar 08, 2023 9:53 am

For the "old" Jastrow factor in pjastrow.f90, the subroutine "get_linear_basis" returns analytical derivatives w.r.t. the independent subset of linear parameters. At present this is only used in the "varmin-linjas" optimisation method, however.

Numerical differentiation w.r.t. Jastrow parameters shouldn't be too problematic, because the dependence on those parameters is very simple: the Jastrow is linear in everything apart from cutoff lengths, so the local energy is a quadratic function of the parameters. Numerical differentiation may not be the most efficient approach, but I would have thought it would be fairly safe.
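To illustrate why numerical differentiation is safe for a quadratic dependence, the sketch below applies a central difference to a synthetic quadratic function `f`. The coefficients `c`, `g` and `A` are random placeholders and not a real local energy. For a quadratic, the central-difference truncation error vanishes, so only rounding error remains.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n))
A = 0.5 * (A + A.T)                 # symmetric quadratic coefficients
g = rng.normal(size=n)              # linear coefficients
c = 0.4                             # constant term

def f(p):
    """Synthetic quadratic stand-in for the parameter dependence of the local energy."""
    return c + g @ p + 0.5 * p @ A @ p

def central_difference(func, p, h=1e-2):
    """Gradient of func at p by central differences with step h."""
    grad = np.empty_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (func(p + step) - func(p - step)) / (2.0 * h)
    return grad

p0 = rng.normal(size=n)
numerical = central_difference(f, p0)
analytical = g + A @ p0             # exact gradient of the quadratic
```

Even with the fairly large step `h = 1e-2`, `numerical` and `analytical` agree to rounding error. This would not hold for a non-quadratic dependence such as that on cutoff lengths.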

Analytical derivatives might be more important for cutoff lengths and for backflow parameters.

Best wishes,

Neil.

Post by **Vladimir_Konjkov** » Wed Mar 08, 2023 1:44 pm

We need the linear dependence of the parameters to be preserved in the vicinity of the point at which we calculate the partial derivatives w.r.t. the parameters.
Therefore we need a Jacobian matrix like this:

[Image: Screenshot_20230308_201830.png]

The constrained partial derivatives lie in the null space of ∇g, which may or may not be a function of the parameters (and only of the parameters). It is easy to project a vector onto this null space.

Post by **Neil Drummond** » Thu Mar 09, 2023 11:43 am

An alternative approach for dealing with the homogeneous linear constraints would be to use SVD to find a basis spanning the null space (the solution space); the parameters in correlation.data would then be the coefficients of those basis vectors, again giving an independent set of parameters.
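A minimal sketch of the SVD alternative, using NumPy and a hypothetical constraint matrix `a`: the right singular vectors beyond the numerical rank form an orthonormal basis of the solution space of a @ p = 0, and any coefficient vector `coeffs` then yields parameters that satisfy the constraints.

```python
import numpy as np

# Hypothetical homogeneous constraints a @ p = 0 on five parameters.
a = np.array([[1.0, 2.0, 0.0, 1.0, 0.0],
              [2.0, 4.0, 1.0, 0.0, 1.0]])

u, s, vt = np.linalg.svd(a)                       # full SVD: vt is 5 x 5
tol = max(a.shape) * np.finfo(float).eps * s[0]   # threshold for "zero" singular values
rank = int(np.sum(s > tol))
basis = vt[rank:].T                               # columns span the null space of a

coeffs = np.array([0.5, -0.1, 2.0])               # the abstract independent parameters
p = basis @ coeffs                                # full parameter vector
```

The coefficients have no direct physical label, which is the loss of transparency mentioned in the post.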

It's possible that this might be better from a numerical point of view (although even then I am not sure, because the pivoted Gauss-Jordan elimination approach in CASINO should be robust). It would make the parameters in correlation.data more abstract and hence make it less easy for the user to know what each parameter actually is.

Best wishes,

Neil.

Post by **Vladimir_Konjkov** » Thu Mar 09, 2023 4:30 pm

I tried to explain above that my idea is different, but the independent set of parameters is the same as yours. Since we calculate the gradient in the space of
all parameters, we know how to project the gradient onto the null space of ∇g (see above); let me remind you that g(p) = c are the parameter constraints.
At the point p0 we have g(p0 + dp) = c + ∇g * dp + o(dp). If dp lies in the null space of ∇g, which is the tangent space of the constraint surface at p0, then ∇g * dp = 0 and g(p0 + dp) = c + o(dp), that is, the constraint is satisfied at p0 + dp to first order. But if dp is not in the null space of ∇g then we can project it there. Moreover, the corresponding differential of some function F(p) subject to the constraint is ∇F(p) * M(p) * dp, where ∇F(p) is the unconstrained gradient (easy to calculate) and M(p) is the projector onto the null space of ∇g(p), i.e. the annihilator matrix. Next one needs to go over to an independent set of parameters in ∇F(p) * M(p); for this one needs to solve the following equation for ∇F_ind, and this is the most obscure part:

∇F * M * d[ p_ind | 0 ] = [ ∇F_ind | ∇F_dep ] * M * d[ p_ind | 0 ]
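The tangent-space argument can be checked on a toy problem. In the sketch below the constraint g(p) = p·p = 1 (a unit sphere) is a hypothetical example chosen for simplicity, not a constraint from any Jastrow or backflow term. A step projected with M(p0) violates the constraint only at second order, so halving the step length should reduce the violation by a factor of about four.

```python
import numpy as np

def g(p):
    """Hypothetical nonlinear constraint function; the constraint is g(p) = 1."""
    return p @ p

def grad_g(p):
    return 2.0 * p

def tangent_projector(p):
    """Projector onto the null space of the constraint Jacobian at p."""
    jac = grad_g(p)[None, :]                       # 1 x N Jacobian
    return np.eye(p.size) - jac.T @ np.linalg.solve(jac @ jac.T, jac)

p0 = np.array([0.6, 0.0, 0.8])                     # satisfies g(p0) = 1
M = tangent_projector(p0)
direction = M @ np.array([1.0, 2.0, -1.0])         # arbitrary step, projected

# Constraint violation along the projected direction for two step lengths.
violations = [abs(g(p0 + t * direction) - 1.0) for t in (1e-2, 5e-3)]
ratio = violations[0] / violations[1]
```

Here `ratio` comes out close to 4, consistent with a violation of order |dp|^2. For linear constraints the violation is exactly zero and M does not depend on p.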

UPD: I'm writing Python code that illustrates this approach, but it will take a couple of weeks. Jastrow varmin optimization is already running with this approach at a speed comparable to that of the Fortran code in CASINO.

Best Vladimir.

Post by **Vladimir_Konjkov** » Sat Mar 11, 2023 5:05 am

The main problem in calculating the analytical energy gradient w.r.t. the backflow parameters is that the equation for Ti contains the Laplacian of the backflowed Slater determinant:

[Image: Screenshot_20230311_115913.png]

Differentiating this Laplacian further w.r.t. the backflow parameters requires the third partial derivatives of the Slater determinant w.r.t. the electronic coordinates.

I think I will keep numerical differentiation for this part.

Post by **Neil Drummond** » Thu Mar 16, 2023 11:36 am

Sorry it has taken me ages to reply.

Analytical derivatives w.r.t. backflow parameters will certainly get messy.

Obviously I agree that you can evaluate linearly constrained derivatives for parameters by evaluating unconstrained derivatives and using linear algebra. This could be done using the projection operations that you suggest, or could be done using the Gauss-Jordan methods already in CASINO to express the full set of parameters in terms of the independent parameters (with the dependent parameters being expressed as a matrix multiplied into the independent parameters). One can then use the chain rule to express the gradient w.r.t. the independent parameters in terms of the gradient w.r.t. all the parameters.
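The chain-rule route can be sketched as follows. The matrix `C` (dependent parameters expressed as `C @ p_ind`) and the test function `F` are hypothetical. In practice `C` would come from the elimination step and `grad_F` from the analytical unconstrained derivatives. A central finite difference along the constrained parameters serves as an independent check.

```python
import numpy as np

# Hypothetical elimination result: p_dep = C @ p_ind.
C = np.array([[-2.0, -1.0, 0.0],
              [0.0, 2.0, -1.0]])

def full_parameters(p_ind):
    """Full parameter vector [p_ind, p_dep] built from the independent ones."""
    return np.concatenate([p_ind, C @ p_ind])

def F(p):
    """Arbitrary smooth test function of all parameters."""
    return np.sum(np.sin(p)) + 0.5 * p @ p

def grad_F(p):
    """Unconstrained analytical gradient of F."""
    return np.cos(p) + p

p_ind = np.array([0.3, -1.2, 0.7])
n_ind = p_ind.size
grad = grad_F(full_parameters(p_ind))

# Chain rule: dF/dp_ind = dF/dp_ind (direct) + dF/dp_dep @ dp_dep/dp_ind.
grad_ind = grad[:n_ind] + grad[n_ind:] @ C

# Finite-difference check that moves p_ind and lets p_dep follow the constraints.
h = 1e-6
fd = np.array([
    (F(full_parameters(p_ind + h * e)) - F(full_parameters(p_ind - h * e))) / (2.0 * h)
    for e in np.eye(n_ind)
])
```

The only extra cost over the unconstrained gradient is one multiplication by `C` per gradient evaluation.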

There's nothing stopping the use of analytic derivatives for pjastrow.f90; I just haven't got round to it!

Best wishes,

Neil.

Post by **Vladimir_Konjkov** » Fri Mar 17, 2023 7:03 am

I have already written all the code and am now testing it; the cutoffs are still outside of my algorithm. I am not sure whether they need to be optimized at all.
Unfortunately, third partial derivatives of the wave function w.r.t. the electron coordinates are necessary.
I have hardly read the CASINO code, except for two procedures, construct_C and construct_A (which have now become construct_A_sym and construct_A_asym),
which were needed for the correct interpretation of the CASINO input files. Your scientific articles, on the other hand, were very useful, for example:
N. D. Drummond and R. J. Needs, "Variance-minimization scheme for optimizing Jastrow factors", Physical Review B 72, 085124 (2005),
but this method seems too complicated to me.

Best wishes,

Vladimir.

The CASINO forum
