HomeTechnologyWireless and NetworkingUsing multiple labels improves neural network learning

Using multiple labels improves neural network learning

March 4, 2021

A common problem in Machine Learning (ML) is choosing a suitable loss function for the task at hand. Often, a task falls into one of two categories: classification or regression. However, there are some problems that do not fit well into either category, which can be illustrated using the following example.

Consider the problem of historical image ranking, where the goal is to accurately predict the decade in which the image was taken. Here, the labels are given in discrete classes, for example, “the 1950s” or “the 1960s”, so it might look like a classification problem at first. Nevertheless, there is also an ordinal structure of the labels. We can place the decades on a timeline, and therefore this problem fits into the category of ordinal regression.

In our paper “Deep Ordinal Regression with Label Diversity”, we show that it is possible to tackle these kinds of problems using a standard classification loss function, while still exploiting the ordinal information of the labels. We do this by introducing the concept of label diversity.

A single label is not enough

Label diversity can be introduced by creating several labels for each training example in a way that the ordinal structure allows. In the problem of estimating age from a single image, where the task is to predict the age of a person, the ground truth values are often given as years in an integer format. So, in one sense, the classes are discrete. A common method used for age estimation in ML is regression with classification (RvC). This is where each year is treated as a separate class, and then a standard cross-entropy loss is used to train the neural network to classify each person. This approach is illustrated in figure 1, where the real number line has been discretized into non-overlapping classes, representing the different age categories.

Figure 1: A baseline RvC approach to discretization

The drawback of RvC is that the loss is agnostic to the magnitude of the error. An error of 10 years yield the same loss as an error of 1 year, because the ordinal information in the labels is discarded. Intuitively, it would be more advantageous to incorporate this information into the learning process, to get better future predictions.

Generating diversity

In our paper, we show that if we create several sets of overlapping classes, we can achieve the desired property of the loss function. For example, a person that is 40 years old can be labeled in several brackets, such as. 38-42, 39-43, and so on. This representation is in a sense diverse, as each training example is assigned multiple labels. The width of the brackets and the total number of labels for each example are hyperparameters that can be tuned to a specific problem. Figure 2 illustrates the general approach, where the real number line has been discretized in three different ways. During training time, the neural network must learn to correctly classify each example using all labels simultaneously. Therefore, the loss is smaller for a prediction that is closer to the true age of the person.

Figure 2: Label diversity can be induced by using multiple discretization’s.

By using label diversity, we can achieve a lower prediction error compared to using the standard RvC method. The results on image ranking are shown in Figure 3, and we see that label diversity yields both higher accuracy and a lower mean average error compared to both direct regression and RvC. In our full paper, we present results on two additional tasks: age estimation and head pose estimation.

Figure 3: Accuracy and mean average error of the different methods on the Historical Color Images Dataset

How to implement it

Our approach can be used as a simple extension to any neural network by using multiple prediction heads. As illustrated in figure 4, each head is classifying the image into its own set of brackets. Although this method is closely related to ensemble learning, we can train all prediction heads simultaneously as they share backbone feature extractor. Therefore, the computational overhead is minimal at both training and inference time, which is important for mobile applications.

Figure 4: A feature extractor is trained using multiple prediction heads. At inference time, the different outputs are combined into a single prediction.

Label diversity could also be used for a wide range of other problems where the ordinal label structure is important. For example, there are many medical applications, such as cancer severity grading, where the goal is to make a risk assessment based on screening data. In such applications, the accuracy of the model is critical. However, if you want to run inference on a mobile device, then latency might be more important. Our method allows the used to trade off accuracy and computational resources to tailor it towards the specific use case by varying the degree of diversity.

Ralated Articles

Using multiple labels improves neural network learning

Infineon AIROC CYW20829 to support Engineered for Intel Evo Laptop Accessories Program

Trump Plans to Impose 100% Tariff on Computer Chips, Likely Driving Up Electronics Prices

Quectel introduces KCM0A5S Wi-SUN module for smart city and smart utility devices

Trump’s Trade Bombshell: Tariffs on China Hit 245%

Enhancing Wireless Communication with AI-Optimized RF Systems

Breaking Boundaries with Ultrawide Band (UWB) Technology: A Deep Dive into High-Precision Wireless Communication

Infineon receives approval for funding under the EU Chips Act – IPCEI funding drives innovation projects in Europe forward

SPEAG combines CMX500 OBT with the DASY83D system for automated SAR testing of 5G NR devices

u-blox expands its NORA-B2 Bluetooth LE modules series using the nRF54L chipsets to address all mass market segments

Latest Posts

Top 10 Deep Learning Applications and Use Cases

Infineon AIROC CYW20829 to support Engineered for Intel Evo Laptop Accessories Program

The Best Substation Training Programs

Deep Learning Architecture Definition, Types and Diagram

How JSD Electronics Uses AI and Machine Vision to Deliver Zero-Defect Electronics

India’s Electronics Production Climbs to $133 Billion, Export Growth Accelerates

Editor Picks

Top 10 Deep Learning Applications and Use Cases

Infineon AIROC CYW20829 to support Engineered for Intel Evo Laptop Accessories Program

The Best Substation Training Programs

Deep Learning Architecture Definition, Types and Diagram

Popular Posts

How JSD Electronics Uses AI and Machine Vision to Deliver Zero-Defect Electronics

India’s Electronics Production Climbs to $133 Billion, Export Growth Accelerates

Top 10 Deep Learning Algorithms

India’s Electronics Exports Strengthen with 47% Jump, Says Piyush Goyal

Must Read

Toradex and QNX Address Industrial Robot Safety Amidst ISO 10218 Standard Updates

Rohde & Schwarz collaborates successfully with the Taiwan Space Agency to develop a dual-function satellite test solution

Top 10 Deep Learning Frameworks

Govt Confirms Tariff Stability for Indian Pharma, Electronics

ABOUT US

FOLLOW US