Online Public Access Catalogue (OPAC)
Library, Documentation and Information Science Division

“A research journal serves that narrow borderland which separates the known from the unknown”
- P. C. Mahalanobis



Dealing with classification irregularities in real-world scenarios / Payel Sadhukhan

Material type: Text
Publication details: Kolkata: Indian Statistical Institute, 2020
DDC classification:
  • 23 006.312 Sa124
Contents:
Introduction -- Class imbalance handling through estimation of minority class -- Handling multi-label datasets – from a perspective of feature extraction -- Handling multi-label dataset -- Open Set Classification -- Conclusion and Scope of Further Research
Production credits:
  • Guided by Prof. Sarbani Palit
Dissertation note: Thesis (Ph.D.) - Indian Statistical Institute, 2020
Summary: Classification of objects is a basic task of machine intelligence. Over the years, the machine learning community has developed a number of classifiers from different families. To make machine learning algorithms more relevant to human lives, we have to work at the interface of algorithm design and its practical utility. Traditional classifiers are designed on the basis of a number of assumptions, such as (i) well-balanced class cardinalities, (ii) membership of each instance to exactly one class (no overlapping class memberships), and (iii) an equal number of classes in the training and test phases. Whenever one or more of these assumptions is violated, a classifier fails to perform optimally, meaningfully, or both. Interestingly, datasets from a number of real-world domains have been shown to violate many of these assumptions. This dissertation addresses the three assumptions mentioned above, with the aim of learning purposefully from such data.

Class imbalance is the quantitative disproportion between the cardinalities of some or all classes of a dataset. In a two-class scenario, the class with a significantly higher number of instances is termed the majority class and the other the minority class. When a traditional classifier is trained on class-imbalanced data, it is usually found to be biased towards the more abundant class. In the first work of this thesis, we handle the class imbalance problem by estimating the minority set and then adding synthetic minority points from the estimated set to reduce the difference in cardinality.

Multi-label data, in which a feature vector can belong to two or more labels, is another irregularity that has gained attention in recent years. Although the instance set (feature values) is the same across all labels, the partition into positive and negative classes varies from label to label. Extracting a label-specific feature set is an effective solution to this problem, and it motivates the second work of this thesis. Multi-label datasets also suffer from class imbalance, and the degree of imbalance varies from label to label, which further aggravates the problem. In the third work of this thesis, we address the class imbalance aspect of multi-label datasets.

Lastly, we handle open set classification. Here we have to correctly classify instances belonging to the known classes (seen during training) while also detecting instances belonging to unknown classes (unseen during training). On encountering such a problem, existing classifiers assign the unknown-class instances to one of the training classes, which they should not do. We propose a scheme in which our classifier rejects a test instance as unknown in addition to making the usual known-class classifications. The two happen simultaneously, as a consequence of the scheme itself.

Library, Documentation and Information Science Division, Indian Statistical Institute, 203 B T Road, Kolkata 700108, INDIA
Phone no. 91-33-2575 2100, Fax no. 91-33-2578 1412, ksatpathy@isical.ac.in