By combining autoregressive Bellman updates, conservative regularization, and Monte Carlo and n-step returns, we are able to use both human demonstrations and autonomously collected data to learn multi-task, language-conditioned policies from successful and failed examples alike.
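As a rough illustration of the two return types named above, the following minimal sketch computes Monte Carlo returns-to-go and an n-step bootstrapped target for a single trajectory. It is not the authors' implementation: the function names, the `bootstrap_values` argument (assumed to hold some estimate of the maximum Q-value at each state), and the toy sparse-reward trajectory are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted return-to-go G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the final step backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def n_step_target(
    rewards: Sequence[float],
    bootstrap_values: Sequence[float],  # hypothetical: value estimate per state
    t: int,
    n: int,
    gamma: float,
) -> float:
    """Sum of up to n discounted rewards from step t, plus a discounted
    bootstrap value if the episode has not ended within those n steps."""
    horizon = min(n, len(rewards) - t)
    target = sum(gamma**k * rewards[t + k] for k in range(horizon))
    end = t + horizon
    if end < len(rewards):
        # Episode continues past the horizon: bootstrap from the estimate.
        target += gamma**horizon * bootstrap_values[end]
    return target


# Toy episode with a sparse reward of 1 at the final step (a "success").
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.2, 0.4, 0.6, 0.8]  # made-up value estimates, for illustration only
mc = monte_carlo_returns(rewards, gamma=0.5)
two_step = n_step_target(rewards, values, t=0, n=2, gamma=0.5)
```

In this sketch, a failed episode would simply carry a final reward of 0, so every Monte Carlo return is 0 while the n-step target still depends on the bootstrap estimate. When the horizon reaches the end of the episode, the n-step target coincides with the Monte Carlo return.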