Tags: #AI
References:
What is Few-Shot Learning? - Unite.AI
Introduction
Few-shot learning refers to a variety of algorithms and techniques used to develop an AI model using a very small amount of training data.
- Cut down:
  - the amount of data needed to train a machine learning model
  - the time needed to label large datasets
- "one-shot"
"few-shot"
Methods
Most few-shot learning approaches fit into one of three categories: data-level, parameter-level, and metric-based approaches.
Data-level
Get more training data
- Similar training data:
  If you are training a classifier to recognize specific breeds of dog but lack many images of the particular breed you are trying to classify, you can include many images of dogs in general, which helps the classifier learn the common features that make up a dog.
- Data augmentation:
  Apply transformations to existing data (e.g. rotating or flipping images), or generate new samples with GANs, as sketched below.
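A minimal sketch of transformation-based augmentation using torchvision (the specific transforms and parameters are illustrative choices, not from the source):

```python
from torchvision import transforms

# Label-preserving transformations: each pass over an image yields a
# slightly different training sample, effectively enlarging the dataset.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Usage: augmented = augment(img)  # img is a PIL image; applying this
# inside the training loop means every epoch sees new variants.
```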
Parameter-level
Meta-learning
The blog post "Meta-Learning: Learning to Learn Fast" seems to be a good reference.
Teach a model how to learn
One problem with few-shot training: the model overfits the training data ⇐ high-dimensional parameter spaces.
To solve ⇒ limit the parameter space ⇒ regularization techniques & proper loss functions & a teacher model (see the sketch below).
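For example, plain L2 regularization (weight decay) is one standard way to constrain the parameter space; a one-liner in PyTorch (the model here is a placeholder):

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model

# weight_decay adds an L2 penalty to every update, shrinking the
# effective parameter space and curbing overfitting on tiny datasets.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```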
- Make use of two different models: a teacher model & a student model.
- Teacher ⇒ learns how to encapsulate the parameter space (i.e. how to optimize).
- Student ⇒ learns how to recognize and classify the actual items in the dataset.
- The teacher model’s outputs are used to train the student model, showing the student model how to negotiate the large parameter space that results from too little training data. (Meta)
The process of gradient-based training:
- Create the base-learner (teacher) model
- Train the base-learner model on the support set
- Have the base-learner return predictions for the query set
- Train the meta-learner (student) on the loss derived from the classification error
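The source does not spell out how the teacher's outputs reach the student; one common reading of the four steps above is a distillation-style episode loop. A minimal sketch under that assumption (the layer sizes, loop counts, and the `episodes` task sampler are all invented for illustration):

```python
import torch
import torch.nn.functional as F

student = torch.nn.Linear(64, 5)                 # meta-learner (student)
student_opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def episodes():
    # Toy 5-way episode generator with random data (stand-in for a real sampler).
    for _ in range(100):
        yield (torch.randn(25, 64), torch.randint(0, 5, (25,)),
               torch.randn(10, 64), torch.randint(0, 5, (10,)))

for support_x, support_y, query_x, query_y in episodes():
    # Steps 1-2: create a fresh base-learner (teacher), train it on the support set.
    teacher = torch.nn.Linear(64, 5)
    teacher_opt = torch.optim.SGD(teacher.parameters(), lr=0.1)
    for _ in range(10):
        teacher_opt.zero_grad()
        F.cross_entropy(teacher(support_x), support_y).backward()
        teacher_opt.step()

    # Step 3: the base-learner returns predictions for the query set.
    with torch.no_grad():
        teacher_probs = teacher(query_x).softmax(dim=-1)

    # Step 4: train the meta-learner (student) on the loss derived from the
    # teacher's classifications (soft-label cross-entropy against its outputs).
    student_opt.zero_grad()
    log_probs = F.log_softmax(student(query_x), dim=-1)
    loss = -(teacher_probs * log_probs).sum(dim=-1).mean()
    loss.backward()
    student_opt.step()
```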
Starting from randomly initialized parameters ⇒ the model can still overfit the data.
A "model-agnostic" meta-learner is created by limiting the influence of the teacher/base model: instead of training the student model directly on the loss for the predictions made by the teacher model, the student model is trained on the loss for its own predictions.
The process of model-agnostic training:
- A copy of the current meta-learner model is created.
- The copy is trained with the assistance of the base model/teacher model.
- The copy returns predictions for the training data.
- Computed loss is used to update the meta-learner.
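A first-order sketch of that copy-and-update loop (in the spirit of MAML's first-order variant; the episode sampler and sizes are again invented, and the copy is evaluated on a held-out query split, a common choice):

```python
import copy
import torch
import torch.nn.functional as F

meta_learner = torch.nn.Linear(64, 5)
meta_opt = torch.optim.Adam(meta_learner.parameters(), lr=1e-3)

def episodes():
    # Toy 5-way episode generator with random data (stand-in for a real sampler).
    for _ in range(100):
        yield (torch.randn(25, 64), torch.randint(0, 5, (25,)),
               torch.randn(10, 64), torch.randint(0, 5, (10,)))

for support_x, support_y, query_x, query_y in episodes():
    # A copy of the current meta-learner is created ...
    copy_model = copy.deepcopy(meta_learner)
    inner_opt = torch.optim.SGD(copy_model.parameters(), lr=0.1)

    # ... and trained on the support set (the teacher's assistance is
    # abstracted away here; this sketch just runs plain gradient steps).
    for _ in range(5):
        inner_opt.zero_grad()
        F.cross_entropy(copy_model(support_x), support_y).backward()
        inner_opt.step()

    # The copy returns predictions; the loss on ITS OWN predictions updates
    # the meta-learner (first-order shortcut: reuse the copy's gradients
    # as the meta-learner's gradients).
    inner_opt.zero_grad()
    F.cross_entropy(copy_model(query_x), query_y).backward()
    meta_opt.zero_grad()
    for meta_p, copy_p in zip(meta_learner.parameters(), copy_model.parameters()):
        meta_p.grad = copy_p.grad.clone()
    meta_opt.step()
```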
Metric-based
- Use basic distance metrics to classify query samples based on their similarity to the support samples.
- Prototypical networks cluster data points together, combining clustering models with the metric-based classification described above (much like K-means clustering); see the sketch below.
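A minimal nearest-prototype sketch (the embeddings here are random stand-ins; in a real prototypical network they would come from a trained encoder):

```python
import torch

def classify(support_emb, support_y, query_emb, n_classes):
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = torch.stack(
        [support_emb[support_y == c].mean(dim=0) for c in range(n_classes)]
    )
    # Assign each query to the class with the nearest prototype (Euclidean),
    # much like assigning points to the nearest K-means centroid.
    dists = torch.cdist(query_emb, prototypes)   # shape: (n_query, n_classes)
    return dists.argmin(dim=1)

# Toy 5-way task: 3 support embeddings per class in a 64-d space.
support_emb = torch.randn(15, 64)
support_y = torch.arange(5).repeat_interleave(3)
query_emb = torch.randn(4, 64)
print(classify(support_emb, support_y, query_emb, n_classes=5))
```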