With the rapid integration of autonomous systems into various domains, ensuring their safety has become increasingly critical. A significant focus in modern robotics is the utilization of visuomotor feedback for control in complex environments. Supervised learning techniques, such as imitation learning, rely on ground truth safety labels for the entire dataset and do not provide any formal safety guarantees. On the other hand, traditional control methods, including Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability, offer formal safety guarantees but depend on precise knowledge of system dynamics, which is often unavailable when dealing with high-dimensional observations like visuomotor data. To address these limitations, we propose a novel approach to synthesize a semi-supervised safe visuomotor policy using barrier certificates, combining the strengths of model-free supervised learning and model-based control methods. This framework synthesizes a provably safe controller without requiring safety labels for the entire dataset and ensures completeness guarantees for both the barrier certificate and the policy. We validate our approach through two case studies: an inverted pendulum system and obstacle avoidance by an autonomous mobile robot.
We present the architecture of the Safety-driven Latent Dynamics (SaLaD) model. The visuomotor observation $\mathcal{O}_i$ is encoded by $E_{\theta}$ into a latent representation $z_i$. SaLaD then predicts the next latent state $z_{i+1}$ as well as the Barrier Certificate over the safe and unsafe datasets for each latent state, and we optimize SaLaD using Equation 7 in the paper. The next visual observation $\mathcal{O}_{i+1}$ is encoded by the target network $E_{\theta^-}$ ($\theta^-$: slow-moving average of $\theta$) and is used as a latent target only during training.
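The forward pass described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the linear encoder, latent dynamics, barrier head, network sizes, and the EMA rate are all stand-ins, and the actual training objective (Equation 7) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the SaLaD components (hypothetical shapes/architectures).
OBS_DIM, LATENT_DIM, ACT_DIM = 16, 4, 2

W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1        # encoder E_theta
W_enc_target = W_enc.copy()                                 # target encoder E_{theta^-}
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACT_DIM)) * 0.1  # latent dynamics d_theta
w_bar = rng.normal(size=LATENT_DIM) * 0.1                   # barrier head B_theta

def encode(W, obs):
    return np.tanh(W @ obs)

def dynamics(z, u):
    return np.tanh(W_dyn @ np.concatenate([z, u]))

def barrier(z):
    return float(w_bar @ z)

def ema_update(target, online, tau=0.99):
    # theta^- is a slow-moving average of theta.
    return tau * target + (1.0 - tau) * online

# One training-style step on a dummy transition (O_i, u, O_{i+1}).
obs_i, obs_next = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
u = rng.normal(size=ACT_DIM)

z_i = encode(W_enc, obs_i)                  # latent state z_i
z_pred = dynamics(z_i, u)                   # predicted next latent z_{i+1}
z_target = encode(W_enc_target, obs_next)   # latent target (training only)

consistency_err = np.linalg.norm(z_pred - z_target)
b_val = barrier(z_i)
W_enc_target = ema_update(W_enc_target, W_enc)
print("consistency error:", consistency_err, "B(z_i):", b_val)
```

In a real implementation the target encoder is updated after each gradient step, and the consistency error and barrier losses are combined into the single objective of Equation 7.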
Theorem 1: Consider a discrete-time robotic system with initial safe and unsafe sets $X_{s}, X_{u} \subseteq X$, respectively. Given $\bar{\epsilon}$, suppose we sample $N$ data points $x_{i} \in X$, $i \in\{1, \ldots, N\}$, such that for every $x \in X$ there exists an $x_i$ with $\left\|x-x_{i}\right\| \leq \bar{\epsilon}$. Let $B_{\theta}$ be the neural network-based CBC with trainable parameters $\theta$, and let $\psi^*$ be the margin by which the samples satisfy the CBC conditions, i.e., $q_k(x_i) \leq -\psi^*$ for $k \in \{1,2,3\}$. Then $B_{\theta}$ is a valid CBC over the entire state space $X$ if the following condition holds: \begin{equation} \psi^{*} \geq L_{B} \bar{\epsilon}, \end{equation} where $L_{B}$ is the Lipschitz constant of $B_{\theta}$.
Proof: Fix any $x \in X$ and any $k\in\{1,2,3\}$, and pick a sample $x_i$ with $\|x - x_i\| \leq \bar{\epsilon}$. Using the Lipschitz continuity of $q_k$ (inherited from $B_{\theta}$) and the sample margin $q_k(x_i) \leq -\psi^*$, we obtain: \begin{equation*} \begin{aligned} q_k(x) & = q_k(x) - q_k(x_i) + q_k(x_i)\\ & \leq L_{B}\left\|x - x_i \right\| - \psi^*\\ & \leq L_{B}\bar{\epsilon} - \psi^* \leq 0. \end{aligned} \end{equation*} Hence $q_{k}(x) \leq 0$ for all $x \in X$ and $k \in\{1,2,3\}$, so $B_{\theta}$ is a valid CBC, satisfying condition (4). $\square$
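The validity condition $\psi^{*} \geq L_{B} \bar{\epsilon}$ can be checked numerically. The sketch below is illustrative only: the two-layer ReLU barrier network, its weights, the grid resolution, and the stand-in constraint functions are all assumptions, and the Lipschitz constant is upper-bounded by the product of layer spectral norms, a standard but generally conservative estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-layer ReLU barrier network B_theta.
W1 = rng.normal(size=(8, 2)) * 0.3
W2 = rng.normal(size=(1, 8)) * 0.3

def B(x):
    return float((W2 @ np.maximum(W1 @ x, 0.0))[0])

# Conservative Lipschitz upper bound: product of layer spectral norms
# (ReLU is 1-Lipschitz, so it does not enlarge the bound).
L_B = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

# Samples covering X with resolution eps_bar, and the empirical margin
# psi_star = -max_{i,k} q_k(x_i); a single stand-in q is used here.
eps_bar = 0.01
samples = rng.uniform(-1.0, 1.0, size=(100, 2))
q_vals = np.array([B(x) - 1.0 for x in samples])  # stand-in for q_k(x_i)
psi_star = -q_vals.max()

valid = psi_star >= L_B * eps_bar
print("L_B upper bound:", L_B, "psi*:", psi_star, "valid:", valid)
```

Since the spectral-norm product overestimates $L_B$, a certificate that passes this check also passes with the true Lipschitz constant; a tighter bound (e.g., via LipSDP-style solvers) only makes the condition easier to satisfy.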
Theorem 2: Consider a discrete-time robotic system and the (latent) dynamics learned using the SaLaD model. Let $\delta$ denote the maximum latent state consistency error. Then condition (4) remains valid if the following holds: \begin{equation} \eta^{*} \geq L_{B} \delta, \end{equation} where $L_{B}$ is the Lipschitz constant of $B_{\theta}$ and \begin{equation} \delta = \max_{i \in \{1, \dots, N \} } \|d_{\theta}(z_{i}, u) - E_{\theta}(\mathcal{O}_{i+1}) \|_{2}. \end{equation}
Proof: Since $z_{i+1} = E_{\theta}(\mathcal{O}_{i+1})$, the Lipschitz continuity of $B$ gives \begin{equation*} \begin{aligned} B(z_{i+1}) - B (d_{\theta}(z_{i}, u)) & \leq | B(z_{i+1}) - B (d_{\theta}(z_{i}, u)) | \\ & \leq L_B \|z_{i+1} - d_{\theta}(z_{i}, u)\|_2\\ & \leq L_B \delta. \end{aligned} \end{equation*} Now, if $\eta^*$ satisfies $L_{B} \delta \leq \eta^{*}$, then \begin{equation*} \begin{aligned} B(z_{i+1}) &\leq B (d_{\theta}(z_{i}, u)) + \eta^* \\ B(z_{i+1}) - B(z_i) &\leq B (d_{\theta}(z_{i}, u)) + \eta^* - B(z_i)\\ \max\bigl(0, B(z_{i+1}) - B(z_i)\bigr) &\leq \max\bigl(0, B (d_{\theta}(z_{i}, u)) + \eta^* - B(z_i)\bigr). \end{aligned} \end{equation*} Therefore, choosing $\eta \geq \eta^*$ ensures the satisfaction of condition (4), despite the error in the learned (latent) dynamics. $\square$
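The quantities in Theorem 2 are directly computable from a trained model and a transition dataset. The sketch below assumes toy linear-plus-tanh networks and random data purely for illustration; the assumed value of $L_B$ is a placeholder for a bound obtained as in Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(2)

LATENT_DIM, OBS_DIM, ACT_DIM, N = 4, 16, 2, 50

# Stand-in encoder E_theta and latent dynamics d_theta.
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACT_DIM)) * 0.1

def encode(obs):
    return np.tanh(W_enc @ obs)

def dynamics(z, u):
    return np.tanh(W_dyn @ np.concatenate([z, u]))

# Dummy trajectory of observations O_1..O_{N+1} and actions u_1..u_N.
obs = rng.normal(size=(N + 1, OBS_DIM))
acts = rng.normal(size=(N, ACT_DIM))

# delta = max_i || d_theta(z_i, u_i) - E_theta(O_{i+1}) ||_2
errors = []
for i in range(N):
    z_i = encode(obs[i])
    errors.append(np.linalg.norm(dynamics(z_i, acts[i]) - encode(obs[i + 1])))
delta = max(errors)

L_B = 1.5                   # assumed Lipschitz bound on B_theta (placeholder)
eta_star = L_B * delta      # smallest slack satisfying Theorem 2
print("delta:", delta, "eta*:", eta_star)
```

In practice, $\delta$ shrinks as the SaLaD consistency loss converges, so the required slack $\eta^*$ tightens over training.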
The table above compares various visuomotor control techniques across several criteria: the requirement for a system model, the ability to handle general dynamics, the need for expert demonstrations, computational complexity, the provision of safety guarantees, and whether those guarantees hold over the entire state space (completeness guarantees).
These results demonstrate the efficacy of our proposed framework.