dc.contributor.author |
Bharti, Barkha |
|
dc.date.accessioned |
2022-03-22T09:51:50Z |
|
dc.date.available |
2022-03-22T09:51:50Z |
|
dc.date.issued |
2021-07 |
|
dc.identifier.citation |
23p. |
en_US |
dc.identifier.uri |
http://hdl.handle.net/10263/7289 |
|
dc.description |
Dissertation under the supervision of Debapriyo Majumdar |
en_US |
dc.description.abstract |
The purpose of this paper is to present a study of gender bias in word embeddings for
the Hindi language. Word embeddings have been shown to capture human biases (such as
gender bias) that are present in the training corpus and reflected in how the embeddings
relate words to each other. The Hindi-language word embeddings were chosen with the
intent of giving insight into gender bias across a variety of domains, with the
expectation that some would show significantly greater bias than others. We use the
hypothesis testing technique of the Word Embedding Association Test (WEAT) to confirm the
presence of gender bias, and we find it useful for expanding the very narrow range of
well-known gender bias word categories often used in the literature. We test for gender
bias in four sets of word embeddings trained on corpora from different domains, including
Hindi CoNLL17, Hindi Wikipedia 2016 database dumps, and a Bollywood lyrics dataset. We
also mitigate the bias in the embeddings by identifying the gender direction and
quantifying the bias independent of its alignment with the crowd bias. Finally, we
evaluate the debiased embeddings on sentiment analysis of Hindi movie reviews, comparing
the results obtained with the original and the debiased embeddings. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Indian Statistical Institute, Kolkata |
en_US |
dc.relation.ispartofseries |
Dissertation;;CS1911 |
|
dc.subject |
Gender bias |
en_US |
dc.subject |
Hindi word embedding |
en_US |
dc.subject |
WEAT hypothesis |
en_US |
dc.subject |
Debiasing algorithm |
en_US |
dc.title |
Gender bias in Hindi word embedding |
en_US |
dc.type |
Other |
en_US |
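
Illustrative note on the WEAT test named in the abstract. The following is a minimal Python sketch of how a WEAT effect size and an exact permutation p-value are commonly computed; it is not the dissertation's implementation. All function names, the toy two-dimensional vectors, and the choice of the sample standard deviation are assumptions made for illustration. In practice X and Y would hold embeddings of two target word sets (for example, two groups of occupation words) and A and B embeddings of male and female attribute words.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, scaled by the std over X and Y.

    Assumption: the sample standard deviation is used here.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


def weat_p_value(X, Y, A, B):
    """One-sided exact permutation test over all equal-size splits of X and Y.

    Exhaustive enumeration is only feasible for small word sets.
    """
    def statistic(Xi, Yi):
        return (sum(association(x, A, B) for x in Xi)
                - sum(association(y, A, B) for y in Yi))

    observed = statistic(X, Y)
    pool = X + Y
    indices = range(len(pool))
    exceed = total = 0
    for chosen in combinations(indices, len(X)):
        chosen_set = set(chosen)
        Xi = [pool[i] for i in indices if i in chosen_set]
        Yi = [pool[i] for i in indices if i not in chosen_set]
        total += 1
        if statistic(Xi, Yi) > observed:
            exceed += 1
    return exceed / total


# Toy vectors (hypothetical, not taken from any Hindi embedding).
X = [[1.0, 0.1], [0.9, 0.2]]   # target set leaning towards attribute A
Y = [[0.1, 1.0], [0.2, 0.9]]   # target set leaning towards attribute B
A = [[1.0, 0.0]]               # attribute set A
B = [[0.0, 1.0]]               # attribute set B

effect = weat_effect_size(X, Y, A, B)
p_value = weat_p_value(X, Y, A, B)
```

A positive effect size indicates that X is more strongly associated with A than Y is, and a small p-value indicates that the observed split is unusual among all possible splits of the pooled target words.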