Radiative transfer, described by the radiative transfer equation (RTE), is one of the dominant energy
exchange processes in inertial confinement fusion (ICF) experiments. The Marshak wave problem is an
important benchmark for time-dependent RTE. In this work, we present a neural network architecture termed
RNN-attention deep learning (RADL) as a surrogate model to solve the inverse boundary problem of the
nonlinear Marshak wave in a data-driven fashion. We train the surrogate model on numerical simulation
data of the forward problem, and then solve the inverse problem by minimizing, with respect to the boundary
condition, the distance between the target solution and the solution predicted by the surrogate. This
minimization is efficient for two reasons: the surrogate model bypasses the expensive numerical solver, and,
because the model is differentiable, gradient-based optimization algorithms can be used. The effectiveness of our approach
is demonstrated by solving the inverse boundary problems of the Marshak wave benchmark in two case
studies: one in which the transport process is modeled by the RTE, and one in which it is modeled by its
nonlinear diffusion approximation (DA). Finally, we illustrate the importance of using both the RNN and the
factor-attention blocks in the RADL model, and we investigate the data efficiency of our model.
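The inverse step described above amounts to gradient descent on the boundary parameters through a frozen, differentiable surrogate. The sketch below illustrates only that loop and is not the RADL model: it assumes the boundary condition is reduced to a small parameter vector `theta`, replaces the trained network with a hypothetical fixed linear map `u = W @ theta + b`, and assumes the "distance" is the squared L2 norm. With a real neural surrogate, the hand-written gradient would instead come from automatic differentiation.

```python
# Toy sketch of surrogate-based inversion (NOT the RADL architecture).
# Assumptions: boundary condition parameterized by a vector theta; the trained
# surrogate is replaced by a hypothetical linear map u = W @ theta + b; the
# distance to the target is the squared L2 norm.

def surrogate(theta, W, b):
    """Hypothetical differentiable surrogate: predicted solution u = W @ theta + b."""
    return [sum(w_ij * t_j for w_ij, t_j in zip(row, theta)) + b_i
            for row, b_i in zip(W, b)]

def loss_and_grad(theta, W, b, target):
    """Squared L2 distance to the target solution and its gradient w.r.t. theta."""
    u = surrogate(theta, W, b)
    r = [u_i - y_i for u_i, y_i in zip(u, target)]  # residual u - target
    loss = sum(r_i * r_i for r_i in r)
    # d(loss)/d(theta_j) = 2 * sum_i r_i * W[i][j]
    grad = [2.0 * sum(r[i] * W[i][j] for i in range(len(r)))
            for j in range(len(theta))]
    return loss, grad

def invert_boundary(W, b, target, theta0, lr=0.05, steps=2000):
    """Plain gradient descent on theta; the surrogate weights stay frozen."""
    theta = list(theta0)
    for _ in range(steps):
        _, g = loss_and_grad(theta, W, b, target)
        theta = [t - lr * g_j for t, g_j in zip(theta, g)]
    return theta

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
    b = [0.1, 0.0, -0.2]
    true_theta = [1.5, 0.7]                  # "unknown" boundary parameters
    target = surrogate(true_theta, W, b)     # synthetic target solution
    theta = invert_boundary(W, b, target, [0.0, 0.0])
    print(theta)  # approximately [1.5, 0.7]
```

Each optimization step costs only one surrogate evaluation and one gradient, rather than a full numerical solve of the RTE or DA. This cost difference is the efficiency argument the abstract makes.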