Towards fast weak adversarial training to solve high dimensional parabolic partial differential equations using XNODE-WAN

Due to the curse of dimensionality, solving high dimensional parabolic partial differential equations (PDEs) has been a challenging problem for decades. Recently, a weak adversarial network (WAN) proposed in (Y. Zang et al., 2020) offered a flexible and computationally efficient approach to tackle this problem on arbitrary domains by leveraging the weak solution. WAN reformulates the PDE problem as a generative adversarial network, where the weak solution (primal network) and the test function (adversarial network) are parameterized by multi-layer deep neural networks (DNNs). However, it is not yet clear whether DNNs are the most effective model for parabolic PDE solutions, as they do not take into account the fundamentally different roles played by the time and spatial variables in the solution. To reinforce this difference, we design a novel so-called XNODE model for the primal network, which improves the performance and efficiency of training. Numerical results show that our method can reduce the training time to a fraction of that of the WAN model.

empirical results [1,5,6,7,8], has been a growing area of research. Relevant numerical analysis that takes into account approximation rates, generality and optimization theory can also be found in many works; see [9,10,2,11].
on Ω × {0}. Our method applies to a general class of spatio-temporal domains, whose spatial domain can be irregular.¹

¹ The uniformly parabolic property means that there exists α > 0 such that ξ^T a(t, x) ξ ≥ α|ξ|² for all (t, x) ∈ D and all ξ ∈ ℝ^d.

We define the following bilinear form
\[
A[u, v; t] = \int_{\Omega} \Big( \big(a(t, x)\nabla u\big)\cdot\nabla v + \big(b(t, x)\cdot\nabla u\big)\, v + c(t, x)\, u\, v \Big)\, dx,
\]
and the associated linear operator induced by the weak formulation of (1). For ease of notation, we define H^1_0(D), L^2(D) and L^2(∂D) as the spaces of functions f(t, x) on the spatio-temporal domain whose time slices f(t, ·) lie in H^1_0(Ω), L^2(Ω) and L^2(∂Ω) respectively, where ∇f(t, ·) is the weak derivative of f(t, ·) with respect to the spatial variable.
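To make these definitions concrete, a plausible reconstruction of the associated norms in standard notation (the exact conventions of the original may differ slightly) is:
\[
\|f\|_{L^2(D)}^2 = \int_0^T \!\! \int_{\Omega} |f(t, x)|^2 \, dx \, dt,
\qquad
\|f\|_{H^1(D)}^2 = \int_0^T \!\! \int_{\Omega} \big( |f(t, x)|^2 + |\nabla f(t, x)|^2 \big) \, dx \, dt,
\]
with H^1_0(D) consisting of those f ∈ H^1(D) whose spatial trace vanishes on ∂Ω, and \|f\|_{L^2(∂D)}^2 obtained by integrating |f|^2 over ∂Ω × (0, T).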

From integration by parts, it is easy to check that the weak solution u to (1) satisfies the following variational problem: ⟨A[u], v⟩ = ⟨f, v⟩ for all admissible test functions v. Therefore, it leads to a new interpretation of u as a solution to the following minimization problem:
\[
u = \operatorname*{arg\,min}_{w} \; \sup_{v \neq 0} \frac{\big|\langle A[w], v\rangle - \langle f, v\rangle\big|^2}{\|v\|_{L^2(D)}^2},
\]
where \|v\|_{L^2(D)} is the L^2 norm of v [17]. Note that for a given w, the optimal test function v is the one that attains the supremum in (3). In our method, the primal network is given by the proposed XNODE model, which will be discussed in Section 3.
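As a sketch of the integration-by-parts step (written here for the divergence-form operator ∂_t u − div(a∇u) + b·∇u + cu = f; this is the standard computation, and the exact form in the original may differ in detail):
\[
\int_0^T\!\!\int_{\Omega} \Big( \partial_t u \, v + (a\nabla u)\cdot\nabla v + (b\cdot\nabla u)\, v + c\, u\, v - f\, v \Big)\, dx\, dt = 0
\quad \text{for all } v \in H_0^1(D),
\]
since the boundary term \(\int_{\partial\Omega} (a\nabla u \cdot n)\, v \, dS\) vanishes for test functions with zero trace on ∂Ω.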

The loss function of [17], which is also used in our method, consists of the interior loss, the boundary loss, and the initial loss as follows:
\[
L(\theta, \eta) = L_{\mathrm{int}}(\theta, \eta) + \alpha\, L_{\mathrm{bdry}}(\theta) + \gamma\, L_{\mathrm{init}}(\theta),
\]
where α, γ are hyperparameters serving as penalty weights. Note that the interior loss L_int is based on the weak formulation of the solution. In practice, we use Monte Carlo estimators of L_int(θ, η), L_bdry(θ) and L_init(θ) based on the sampled points respectively, and define the empirical loss function \hat{L}(θ, η) as their weighted sum. This loss will be modified in our algorithm in Section 3. A list of notation explanations can be found in Table B.2.
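As a hedged illustration of such Monte Carlo estimators, here is a minimal PyTorch sketch for the model problem u_t − Δu = f with Dirichlet data g and initial condition h; the function and variable names are ours, not the paper's, and the true interior loss uses the full weak residual of (1).

```python
import torch

def batch_grad(outputs, inputs):
    """Per-sample gradient of a scalar-per-sample output w.r.t. a batched input."""
    return torch.autograd.grad(outputs, inputs,
                               grad_outputs=torch.ones_like(outputs),
                               create_graph=True)[0]

def empirical_loss(u, phi, f, g, h, tx_int, tx_bdry, x0, alpha, gamma):
    """Monte Carlo estimate of L = L_int + alpha * L_bdry + gamma * L_init
    for u_t - Laplace(u) = f. The tx_* arguments are (t, x) batches of shape
    (N, 1 + d) with requires_grad=True; x0 has shape (N0, d)."""
    u_int, phi_int = u(tx_int), phi(tx_int)
    du, dphi = batch_grad(u_int, tx_int), batch_grad(phi_int, tx_int)
    u_t, u_x, phi_x = du[:, :1], du[:, 1:], dphi[:, 1:]
    # Interior loss: squared weak residual <A[u], phi> normalized by ||phi||^2,
    # with the Laplacian integrated by parts onto the test function phi.
    residual = (u_t * phi_int + (u_x * phi_x).sum(1, keepdim=True)
                - f(tx_int) * phi_int).mean()
    l_int = residual ** 2 / phi_int.pow(2).mean()
    # Boundary loss: mismatch with the Dirichlet data g on the lateral boundary.
    l_bdry = (u(tx_bdry) - g(tx_bdry)).pow(2).mean()
    # Initial loss: mismatch with h at t = 0 (prepend a zero time coordinate).
    tx0 = torch.cat([torch.zeros(x0.shape[0], 1), x0], dim=1)
    l_init = (u(tx0) - h(x0)).pow(2).mean()
    return l_int + alpha * l_bdry + gamma * l_init
```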

Algorithm 1 WAN Algorithm
Input: domain D, tolerance ε > 0, N_D/N_∂D/N_0: number of sampled points on the domain/boundary/initial condition, K_u/K_φ: number of solution/adversarial network parameter updates per iteration, α/γ ∈ ℝ₊: the weight of the boundary/initial loss, and the hyperparameters for all DNNs involved
1: Initialise: the DNN network architectures u_θ, φ_η : D → ℝ
2: generate point sets X_D, X_∂D and X_0        ▷ random point sampling step
3: while \hat{L}(θ, η) > ε do
4:     for k = 1, …, K_u do
5:         compute ∇_θ \hat{L}(θ, η)            ▷ gradient calculation step
6:         update θ by a gradient descent step   ▷ primal network update
7:     for k = 1, …, K_φ do
8:         compute ∇_η \hat{L}(θ, η) and update η by a gradient ascent step   ▷ adversarial network update

Here, the vector field N^vec is assumed to be bounded. When given a fixed spatial variable x ∈ Ω, the map ũ : t ↦ u(t, x) can be viewed as a solution to the following ODE problem derived from PDE (1):
\[
\frac{d\tilde{u}}{dt}(t) = F(\tilde{u}, t; x), \qquad \tilde{u}(0) = h(x),
\]
where F is a function depending on (ũ, t; x) determined by (1). Note that although F is not observable due to the unknown partial derivatives of u, it motivates modelling the solution through a hidden ODE:
\[
\frac{dh}{dt}(t) = N^{\mathrm{vec}}_{\theta_1}\big(h(t), t, x\big), \qquad h(0) = N^{\mathrm{init}}_{\theta_2}(x), \qquad u(t, x) = L_{\bar{\theta}}\big(h(t)\big),
\]
where N^vec_{θ₁} and N^init_{θ₂} are neural networks fully parameterized by θ₁ and θ₂ for the vector field and the initial condition of the hidden state h respectively, and L_{\bar{θ}} is a trainable linear layer with weights \bar{θ}. We denote by Θ = (θ₁, θ₂, \bar{θ}) the collection of all trainable parameters.

To effectively use the initial condition of the PDE u(0, x) = h(x), the hidden state at the initial time h(0) is given by a neural network N^init applied to x, without compromising any discriminative capacity of the model. As long as L_{\bar{θ}} ∘ N^init can approximate the identity map, which is guaranteed by the universality of the neural network, the model is able to achieve satisfactory fitting performance.
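To make the XNODE construction concrete, here is a minimal PyTorch sketch, assuming an explicit-Euler ODE solver and hypothetical sizes (hidden dimension 20, 10 time steps); the paper's actual architecture and solver may differ.

```python
import torch
import torch.nn as nn

class XNODE(nn.Module):
    """Minimal XNODE sketch: a hidden state h(t) driven by a neural vector
    field with explicit spatial dependency on x, an initial network for h(0),
    and a linear readout L_theta. Sizes and the Euler solver are assumptions."""
    def __init__(self, dim_x, dim_h=20, n_steps=10):
        super().__init__()
        self.n_steps = n_steps
        self.init_net = nn.Sequential(nn.Linear(dim_x, dim_h), nn.Tanh(),
                                      nn.Linear(dim_h, dim_h))            # N^init
        self.vec_field = nn.Sequential(nn.Linear(dim_h + dim_x + 1, dim_h),
                                       nn.Tanh(), nn.Linear(dim_h, dim_h))  # N^vec
        self.readout = nn.Linear(dim_h, 1)                                 # L_theta

    def forward(self, t, x):
        """t: (N, 1) query times, x: (N, d) spatial points; returns u(t, x)."""
        # h(0) = N^init(x) encodes the PDE initial condition u(0, x) = h(x).
        h = self.init_net(x)
        dt = t / self.n_steps                    # per-sample Euler step size
        s = torch.zeros_like(t)                  # running time variable
        for _ in range(self.n_steps):            # explicit Euler integration
            h = h + dt * self.vec_field(torch.cat([h, s, x], dim=-1))
            s = s + dt
        return self.readout(h)                   # u(t, x) = L_theta(h(t))
```

This per-point formulation is chosen for readability; in practice one would evaluate the hidden ODE once per spatial sample along a shared time grid to amortize the solver cost.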

The full XNODE-WAN algorithm is outlined in Algorithm 5 in the Appendix.
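As a rough illustration of the alternating min-max updates, here is a minimal training-loop sketch; the optimizers, `loss_fn` (the empirical loss from the earlier sketch) and `sample` (the random point-sampling step) are our stand-ins, not the exact Algorithm 5.

```python
import torch

def wan_train(u_net, phi_net, loss_fn, sample, eps=1e-3, K_u=2, K_phi=1,
              tau_u=1e-3, tau_phi=1e-3, max_iter=10000):
    """Alternating min-max training in the spirit of Algorithms 1 and 5.
    `loss_fn(u, phi, batch)` and `sample()` are hypothetical helpers for the
    empirical loss and the collocation-point sampling, respectively."""
    opt_u = torch.optim.Adam(u_net.parameters(), lr=tau_u)
    opt_phi = torch.optim.Adam(phi_net.parameters(), lr=tau_phi)
    for _ in range(max_iter):
        batch = sample()                          # resample collocation points
        if loss_fn(u_net, phi_net, batch).item() < eps:
            break                                 # stopping criterion L < eps
        for _ in range(K_u):                      # descend: update weak solution
            opt_u.zero_grad()
            loss_fn(u_net, phi_net, batch).backward()
            opt_u.step()
        for _ in range(K_phi):                    # ascend: update test function
            opt_phi.zero_grad()
            (-loss_fn(u_net, phi_net, batch)).backward()
            opt_phi.step()
    return u_net
```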

To circumvent the problem, we shall divide each constant path X ≡ x into maximal sub-intervals on which (t, x) stays inside the domain, with the entry and exit times defined as infima, where inf ∅ = +∞ by convention. Let N_x denote the total number of finite entry times of the path (see Fig. 1b).
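A hedged sketch of this path-splitting step, assuming a hypothetical membership oracle `in_domain(t, x)` evaluated on a discrete time grid (the paper defines the entry/exit times continuously via infima):

```python
import numpy as np

def time_subintervals(in_domain, x, t_grid):
    """Split the constant path X = x into maximal sub-intervals of t_grid on
    which (t, x) stays inside the domain. `in_domain` is a hypothetical
    membership oracle; returns a list of (t_entry, t_exit) pairs."""
    inside = np.array([in_domain(t, x) for t in t_grid])
    intervals, start = [], None
    for i, flag in enumerate(inside):
        if flag and start is None:
            start = t_grid[i]                    # discrete analogue of an entry time
        elif not flag and start is not None:
            intervals.append((start, t_grid[i - 1]))   # exit time
            start = None
    if start is not None:                        # path still inside at t = T
        intervals.append((start, t_grid[-1]))
    return intervals
```

The number of pairs returned plays the role of N_x; the hidden ODE is then solved separately on each sub-interval.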
where f, g and h may vary from example to example. In Section 4.2, we consider a problem (hyperparameter settings are listed in Table B) where Ω = [0, 1]^5. The exact solution is given in closed form. The stopping criterion for both algorithms is set to achieve a relative error tolerance level ε = 10^{-3}. We also run 50 trials on randomly selected data to measure the relative L^2 error of the obtained models. Figure 3a gives the evolution of relative errors over time for both models.

Clearly, the proposed model converges much faster than the WAN model.

We again implement both models, and the results are shown in Figure 4 with the same stopping criterion, i.e., a relative L^2 error of ε = 10^{-3}. It is easy to verify the closed-form solution to (11), which is visualized in Figure 5. Figure 5a shows that even for a high dimensional problem the model produces an accurate solution. Furthermore, we consider different values of the learning rate for both our method and WAN, and the corresponding performance. Fig. 6b shows that, independently of the depth of the hidden field, a learning rate on the order of 10^{-5} is needed to achieve a reasonable loss for WAN, while XNODE-WAN allows for a much wider range of learning rates, i.e., from 10^{-5} to 10^{-2}.

In practice, we conduct a grid search to select the optimal learning rate for each method, i.e., the learning rate that achieves satisfactory accuracy. For the next example we consider a non-trivial domain (see Figure 7a for its shape). Note that, to illustrate the idea, in this example we only consider d = 1.
where N^vec_{θ₁} and N^init_{θ₂} are neural networks fully parameterized by θ₁ and θ₂ for the vector field and the initial condition of the hidden state h respectively, and L_{\bar{θ}} is a trainable linear layer with weights \bar{θ}. We will establish the universal approximation theorem of the proposed model. Thanks to the universality of neural networks and the above theorem, we now establish the universality of the XNODE model in the following theorem.

Proof. In this proof, we constrain the XNODE models to those with hidden dimension 1 and the identity map as their output layer.

Table B.2: Notation.

  N_∂Ω          Number of sampled collocation points on the spatial domain boundary
  n_T           Number of sampled time partition points
  K_u           Inner iterations to update the weak solution u_θ or u_Θ
  K_φ           Inner iterations to update the test function φ_η
  α             Weight parameter of the boundary loss \hat{L}_bdry
  γ             Weight parameter of the initial loss \hat{L}_init
  ε             Error tolerance
  τ_θ or τ_Θ    Learning rate for the primal network
  τ_η           Learning rate for the parameter η of the test function φ_η