Tags: #AI
References:
What is Few-Shot Learning? - Unite.AI
Introduction
Few-shot learning refers to a variety of algorithms and techniques used to develop an AI model using a very small amount of training data.
- Cut down:
  - the amount of data needed to train a machine learning model
  - the time needed to label large datasets
- "one-shot"
"few-shot"
Methods
Most few-shot learning approaches fit into one of three categories: data-level, parameter-level, and metric-based approaches.
Data-level
Get more training data
- Similar training data:
  If you are training a classifier to recognize specific breeds of dog but lack many images of the particular breed you are trying to classify, you can include many images of dogs in general, which helps the classifier learn the common features that make up a dog.
- Data augmentation:
  Apply transformations to existing data (e.g. rotating or flipping images), or generate new samples with GANs, as sketched below.
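A minimal sketch of transformation-based augmentation using torchvision (the specific transforms and parameters are illustrative choices, not from the source):

```python
from torchvision import transforms

# Label-preserving transformations: each pass over an image yields a
# slightly different training sample, effectively enlarging the dataset.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Usage: augmented = augment(img)  # img is a PIL image; applying this
# inside the training loop means every epoch sees new variants.
```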
Parameter-level
Meta-learning
The blog post "Meta-Learning: Learning to Learn Fast" seems to be a good reference.
Teach a model how to learn
One problem with few-shot training: the model overfits the training data ⇐ high-dimensional parameter spaces.
To solve ⇒ limit the parameter space ⇒ regularization techniques & proper loss functions & a teacher model (see the sketch below).
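For example, plain L2 regularization (weight decay) is one standard way to constrain the parameter space; a one-liner in PyTorch (the model here is a placeholder):

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model

# weight_decay adds an L2 penalty to every update, shrinking the
# effective parameter space and curbing overfitting on tiny datasets.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```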
- Make use of two different models: a teacher model & a student model.
- Teacher ⇒ learns how to encapsulate the parameter space (i.e. how to optimize).
- Student ⇒ learns how to recognize and classify the actual items in the dataset.
- The teacher model’s outputs are used to train the student model, showing the student model how to negotiate the large parameter space that results from too little training data. (Meta)
The process of gradient-based training:
- Create the base-learner (teacher) model
- Train the base-learner model on the support set
- Have the base-learner return predictions for the query set
- Train the meta-learner (student) on the loss derived from the classification error
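The source does not spell out how the teacher's outputs reach the student; one common reading of the four steps above is a distillation-style episode loop. A minimal sketch under that assumption (the layer sizes, loop counts, and the `episodes` task sampler are all invented for illustration):

```python
import torch
import torch.nn.functional as F

student = torch.nn.Linear(64, 5)                 # meta-learner (student)
student_opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def episodes():
    # Toy 5-way episode generator with random data (stand-in for a real sampler).
    for _ in range(100):
        yield (torch.randn(25, 64), torch.randint(0, 5, (25,)),
               torch.randn(10, 64), torch.randint(0, 5, (10,)))

for support_x, support_y, query_x, query_y in episodes():
    # Steps 1-2: create a fresh base-learner (teacher), train it on the support set.
    teacher = torch.nn.Linear(64, 5)
    teacher_opt = torch.optim.SGD(teacher.parameters(), lr=0.1)
    for _ in range(10):
        teacher_opt.zero_grad()
        F.cross_entropy(teacher(support_x), support_y).backward()
        teacher_opt.step()

    # Step 3: the base-learner returns predictions for the query set.
    with torch.no_grad():
        teacher_probs = teacher(query_x).softmax(dim=-1)

    # Step 4: train the meta-learner (student) on the loss derived from the
    # teacher's classifications (soft-label cross-entropy against its outputs).
    student_opt.zero_grad()
    log_probs = F.log_softmax(student(query_x), dim=-1)
    loss = -(teacher_probs * log_probs).sum(dim=-1).mean()
    loss.backward()
    student_opt.step()
```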
Starting from randomly initialized parameters ⇒ the model can still overfit the data.
A "model-agnostic" meta-learner is created by limiting the influence of the teacher/base model: instead of training the student model directly on the loss for the predictions made by the teacher model, the student model is trained on the loss for its own predictions.
The process of model-agnostic training:
- A copy of the current meta-learner model is created.
- The copy is trained with the assistance of the base model/teacher model.
- The copy returns predictions for the training data.
- Computed loss is used to update the meta-learner.
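A first-order sketch of that copy-and-update loop (in the spirit of MAML's first-order variant; the episode sampler and sizes are again invented, and the copy is evaluated on a held-out query split, a common choice):

```python
import copy
import torch
import torch.nn.functional as F

meta_learner = torch.nn.Linear(64, 5)
meta_opt = torch.optim.Adam(meta_learner.parameters(), lr=1e-3)

def episodes():
    # Toy 5-way episode generator with random data (stand-in for a real sampler).
    for _ in range(100):
        yield (torch.randn(25, 64), torch.randint(0, 5, (25,)),
               torch.randn(10, 64), torch.randint(0, 5, (10,)))

for support_x, support_y, query_x, query_y in episodes():
    # A copy of the current meta-learner is created ...
    copy_model = copy.deepcopy(meta_learner)
    inner_opt = torch.optim.SGD(copy_model.parameters(), lr=0.1)

    # ... and trained on the support set (the teacher's assistance is
    # abstracted away here; this sketch just runs plain gradient steps).
    for _ in range(5):
        inner_opt.zero_grad()
        F.cross_entropy(copy_model(support_x), support_y).backward()
        inner_opt.step()

    # The copy returns predictions; the loss on ITS OWN predictions updates
    # the meta-learner (first-order shortcut: reuse the copy's gradients
    # as the meta-learner's gradients).
    inner_opt.zero_grad()
    F.cross_entropy(copy_model(query_x), query_y).backward()
    meta_opt.zero_grad()
    for meta_p, copy_p in zip(meta_learner.parameters(), copy_model.parameters()):
        meta_p.grad = copy_p.grad.clone()
    meta_opt.step()
```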
Metric-based
- Use basic distance metrics to classify query samples based on their similarity to the support samples.
- Prototypical networks cluster data points together, combining clustering models with the metric-based classification described above (much like K-means clustering); see the sketch below.
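A minimal nearest-prototype sketch (the embeddings here are random stand-ins; in a real prototypical network they would come from a trained encoder):

```python
import torch

def classify(support_emb, support_y, query_emb, n_classes):
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = torch.stack(
        [support_emb[support_y == c].mean(dim=0) for c in range(n_classes)]
    )
    # Assign each query to the class with the nearest prototype (Euclidean),
    # much like assigning points to the nearest K-means centroid.
    dists = torch.cdist(query_emb, prototypes)   # shape: (n_query, n_classes)
    return dists.argmin(dim=1)

# Toy 5-way task: 3 support embeddings per class in a 64-d space.
support_emb = torch.randn(15, 64)
support_y = torch.arange(5).repeat_interleave(3)
query_emb = torch.randn(4, 64)
print(classify(support_emb, support_y, query_emb, n_classes=5))
```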