Unlocking New Potential in Continual Learning with the Infinite dSprites Framework

27 Aug 2024

Authors:

(1) Sebastian Dziadzio, University of Tübingen (sebastian.dziadzio@uni-tuebingen.de);

(2) Çagatay Yıldız, University of Tübingen;

(3) Gido M. van de Ven, KU Leuven;

(4) Tomasz Trzcinski, IDEAS NCBR, Warsaw University of Technology, Tooploox;

(5) Tinne Tuytelaars, KU Leuven;

(6) Matthias Bethge, University of Tübingen.

Abstract and 1. Introduction

2. Two problems with the current approach to class-incremental continual learning

3. Methods and 3.1. Infinite dSprites

3.2. Disentangled learning

4. Related work

4.1. Continual learning and 4.2. Benchmarking continual learning

5. Experiments

5.1. Regularization methods and 5.2. Replay-based methods

5.3. Do we need equivariance?

5.4. One-shot generalization and 5.5. Open-set classification

5.6. Online vs. offline

Conclusion, Acknowledgments and References

Supplementary Material

3. Methods

In this section, we describe two important contributions of this work: a software package for generating arbitrarily long continual learning benchmarks and a conceptual disentangled learning framework accompanied by an example implementation. We would like to emphasize that this work aims to provide a new perspective on knowledge transfer in continual learning, and to propose new benchmarks for evaluating continual learning methods. Our implementation serves as a proof of concept, spotlighting the potential of equivariance learning, and is not intended as a practical method for general use.

3.1. Infinite dSprites

We introduce idSprites, a novel framework inspired by dSprites [23], designed for easy creation of arbitrarily long continual learning benchmarks. A single idSprites benchmark consists of T tasks, where each task is an n-fold classification of procedurally generated shapes. As in dSprites, each shape is observed in all possible combinations of the following factors of variation (FoVs): color, scale, orientation, horizontal position, and vertical position. Figure 2 shows an example batch of images with four FoVs and two values per factor (in general, our implementation allows for arbitrary granularity). The canonical form corresponds to a scale of 1, an orientation of 0, and horizontal and vertical positions of 0.5. For simplicity and to save computation, we use only a single color in our experiments.
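To make the combinatorial structure concrete, the following is a minimal sketch (the grid values and variable names are our illustrative assumptions, not the released package API) of enumerating every FoV combination for a grid with two values per factor, as in the Figure 2 example:

```python
from itertools import product

# Hypothetical FoV grid with two values per factor; the actual
# framework allows arbitrary granularity and ranges per factor.
fov_grid = {
    "scale": [0.6, 1.0],
    "orientation": [0.0, 1.57],  # radians
    "pos_x": [0.25, 0.75],
    "pos_y": [0.25, 0.75],
}

# Each shape is rendered under every combination of factor values:
# with four factors and two values each, that is 2**4 = 16 images.
combinations = [
    dict(zip(fov_grid, values)) for values in product(*fov_grid.values())
]
print(len(combinations))  # 16
```

The number of images per shape grows multiplicatively with the granularity of each factor, which is why the per-factor resolution is left to the user.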

The shapes are generated by first sampling the number of vertices from a discrete uniform distribution over the closed integer interval [a, b], then constructing a regular polygon on the unit circle, randomly perturbing the polar coordinates of each vertex, and finally connecting the perturbed vertices with a closed spline whose order is chosen uniformly at random from {1, 3}. All shapes are then scaled and centered so that their bounding boxes have the same size and their centers of mass align in the canonical form. To make orientation identifiable, we paint one half of each shape black.
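The generation steps above can be sketched as follows. This is a hypothetical reimplementation for illustration: the function name, default noise parameters, and the simplified normalization (vertex centroid rather than center of mass, max-coordinate scaling rather than exact bounding-box matching) are our assumptions, not the authors' code:

```python
import numpy as np

def generate_shape(a=3, b=8, radial_noise=0.1, angular_noise=0.2, rng=None):
    """Sketch of idSprites-style shape generation (illustrative only)."""
    rng = rng if rng is not None else np.random.default_rng()
    n = int(rng.integers(a, b + 1))            # vertex count ~ U{a, ..., b}
    angles = 2 * np.pi * np.arange(n) / n      # regular polygon on the unit circle
    radii = np.ones(n)
    angles += rng.normal(0.0, angular_noise, n)  # perturb angular coordinates
    radii += rng.normal(0.0, radial_noise, n)    # perturb radial coordinates
    xy = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
    # Normalize: center on the vertex centroid and scale so the largest
    # coordinate magnitude is 1 (a simplification of the paper's
    # bounding-box and center-of-mass alignment).
    xy -= xy.mean(axis=0)
    xy /= np.abs(xy).max()
    spline_order = int(rng.choice([1, 3]))     # closed spline of order 1 or 3
    return xy, spline_order
```

In a full implementation, the perturbed vertices would then be passed to a spline-fitting routine (e.g. a periodic spline of the sampled order) and rasterized under each FoV combination.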

The number of tasks T, the number of shapes per task n, the vertex number interval [a, b], the exact FoV ranges, and the parameters of noise distributions for radial and angular coordinates are set by the user, providing the flexibility to control the length and difficulty of the benchmark. The framework also provides access to the ground truth values of the individual FoVs. We will release idSprites as a Python package and hope it will unlock new research directions in continual classification, transfer learning, and continual disentanglement.
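As a sketch of the configuration surface described above, the parameters might be gathered as follows (all names are hypothetical; the released package's actual API may differ):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkConfig:
    """Hypothetical configuration mirroring the user-set parameters
    described in the text; not the actual idSprites API."""
    num_tasks: int = 100             # T: benchmark length
    shapes_per_task: int = 10        # n: classes introduced per task
    min_vertices: int = 3            # a: lower end of vertex interval
    max_vertices: int = 8            # b: upper end of vertex interval
    scale_range: tuple = (0.5, 1.0)          # FoV range for scale
    orientation_range: tuple = (0.0, 6.283)  # FoV range for orientation
    position_range: tuple = (0.0, 1.0)       # FoV range for x/y position
    radial_noise_std: float = 0.1    # noise on radial vertex coordinates
    angular_noise_std: float = 0.2   # noise on angular vertex coordinates
```

Varying `num_tasks` and `shapes_per_task` controls benchmark length and per-task difficulty, while the noise parameters control how distinct the generated shapes are from one another.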

This paper is available on arxiv under CC 4.0 license.