Hello QMC people.
As Goodhart's law says, "When a measure becomes a target, it ceases to be a good measure".
You may think that this is just an adage, but in some situations it can lead to catastrophic consequences.
https://www.lesswrong.com/posts/fuSaKr6 ... tastrophic.
Let's say we want to optimize a WFN to obtain the minimum DMC energy. As a measure of the DMC energy we choose the VMC energy, which is easier to optimize and seems to correlate with the DMC energy. At this point Goodhart's law comes into play.
How can we determine whether Goodhart's law is catastrophic in our situation?
As stated in D. M. Ceperley, "The Statistical Error of Green's Function Monte Carlo", Journal of Statistical Physics, Vol. 43, Nos. 5/6, 1986, optimization of the VMC energy leads to a decrease of the DMC variance. This may not be entirely true, but it hints that the VMC energy is a poor measure of the DMC energy.
One possible check would be to look at the joint distribution of the DMC energy and the VMC energy over random WFN parameter sets distributed around the set with minimal VMC energy. The higher the correlation coefficient, the milder the Goodhart effect.
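A minimal sketch of that check, assuming paired VMC and DMC energies have already been computed for each random parameter set. The function name `proxy_correlation` and the synthetic numbers are hypothetical stand-ins, not output from any QMC code:

```python
import numpy as np

def proxy_correlation(e_vmc, e_dmc):
    """Pearson correlation between paired VMC and DMC energies."""
    e_vmc = np.asarray(e_vmc, dtype=float)
    e_dmc = np.asarray(e_dmc, dtype=float)
    if e_vmc.ndim != 1 or e_vmc.shape != e_dmc.shape or e_vmc.size < 3:
        raise ValueError("need two 1-D arrays of equal length (at least 3 points)")
    return float(np.corrcoef(e_vmc, e_dmc)[0, 1])

# Synthetic stand-in data (hypothetical): one energy pair per parameter set.
rng = np.random.default_rng(0)
n_sets = 200
e_vmc = -1.0 + 0.01 * rng.random(n_sets)  # VMC energies above the minimum
# Assumed linear trend plus noise standing in for the DMC statistical error.
e_dmc = -1.02 + 0.3 * (e_vmc + 1.0) + 0.001 * rng.standard_normal(n_sets)

r = proxy_correlation(e_vmc, e_dmc)
print(f"Pearson r between VMC and DMC energies: {r:.3f}")
```

With real data the DMC statistical error bars add noise that pulls the measured coefficient towards zero, so a low value does not by itself prove that the VMC energy is a bad measure.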
Best Vladimir.
Goodhart's law and VMC optimization.
In Soviet Russia Casino plays you.
Re: Goodhart's law and VMC optimization.
Dear Vladimir,
Is your point that, for example, VMC variance isn't a perfect measure for comparing two different types of wave function when they are optimised by minimising the variance? Or is your point that we optimise wave functions by some criterion that stands in for the one we are ultimately interested in, which is to maximise the accuracy of our final result? Both seem to be related to the QMC version of Goodhart's law, although they are not exactly the same issue.
To some extent the former point can be addressed by keeping an eye on VMC energy when variance is optimised, and vice versa.
You're surely right about the latter point, but what can one do about it?
In a few cases, parameters have been optimised by minimising the DMC energy; e.g., I have optimised Gaussian exponents for Wigner crystal orbitals using this approach, where it gives results that are clearly different from VMC energy minimisation.
Looking at the correlation of VMC energy and DMC energy is interesting, but will of course depend on the wave function form and the parameters that you vary (Jastrow parameters being an extreme case that do not affect the DMC energy, although they affect the accuracy of other expectation values and the noise and bias in the DMC results).
Best wishes,
Neil.
Re: Goodhart's law and VMC optimization.
Hello Neil.
I meant that it is very difficult to optimize the DMC energy for a single-determinant Slater-Jastrow-Backflow (SD-SJB) WFN, whereas it is easy to optimize the VMC energy instead. I plan to look at the correlation between the VMC energy and the DMC energy next year, but for now I'm just thinking about it, because I need to write a small program that automates those calculations.
The naive model is that U is the VMC energy and V is the DMC energy. The large ellipse bounds the region that we can reach by varying the Backflow parameters. After VMC energy optimization, we are at the point where the blue line intersects the ellipse. At this point the tangent to the ellipse is vertical, because no change in the parameters can lower the VMC energy any further. The green arrow shows optimization of the parameters "perpendicular" to the VMC energy optimization, which increases the DMC energy. Mike Towler wrote somewhere that Backflow optimization consists of two parts; this figure clearly illustrates why this can be the case.
An interesting point is how to scale the random Backflow parameter variations, since the parameters have different dimensions in units of length and therefore need different scales. Considering that the S-matrix in VMC energy optimization (except for the 0th row and column) is a covariance matrix, rescaling the Backflow parameters converts it into a Pearson correlation coefficient matrix, in which all the diagonal elements are equal to 1 and each off-diagonal element is between −1 and +1 inclusive. I think this would be the ideal rescaling, but I don't know where to get the S-matrix for DMC; otherwise one could use the linear method for DMC energy optimization, and I think one linear step would be enough.
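The rescaling described above can be sketched as follows, assuming the parameter block of the S-matrix (0th row and column removed) is available as a covariance matrix with positive diagonal. The function name `correlation_rescaling` and the 3x3 toy matrix are hypothetical, chosen only for illustration:

```python
import numpy as np

def correlation_rescaling(S):
    """Return (R, scale): R[i, j] = S[i, j] / sqrt(S[i, i] * S[j, j]),
    and scale[i] = 1 / sqrt(S[i, i]) is the per-parameter step scale."""
    S = np.asarray(S, dtype=float)
    if S.ndim != 2 or S.shape[0] != S.shape[1]:
        raise ValueError("S must be a square matrix")
    d = np.diag(S)
    if np.any(d <= 0.0):
        raise ValueError("diagonal of S must be strictly positive")
    scale = 1.0 / np.sqrt(d)
    R = S * np.outer(scale, scale)  # unit diagonal by construction
    return R, scale

# Toy covariance block (hypothetical values, not taken from a real run).
S = np.array([[4.0, 1.2, 0.0],
              [1.2, 1.0, -0.3],
              [0.0, -0.3, 0.25]])
R, scale = correlation_rescaling(S)

# Draw a random step in the rescaled parameters q, then map it back to the
# original parameters p: p_i = scale_i * q_i, so dp_i = scale_i * dq_i.
rng = np.random.default_rng(1)
dq = 0.1 * rng.standard_normal(S.shape[0])
dp = scale * dq
print(R)
print(dp)
```

In the rescaled parameters every direction has unit variance, so a single step size can be used for all of them, and the mapping back to the original parameters gives each one its own length scale.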
On the other hand, I think VMC varmin is an initial, very fast and robust step towards VMC emin, just as VMC emin should be for DMC emin.
Therefore, the reasoning can be similar in both cases.
Best Vladimir.