Gender bias in Hindi word embedding

Bharti, Barkha

dc.contributor.author	Bharti, Barkha
dc.date.accessioned	2022-03-22T09:51:50Z
dc.date.available	2022-03-22T09:51:50Z
dc.date.issued	2021-07
dc.identifier.citation	23p.	en_US
dc.identifier.uri	http://hdl.handle.net/10263/7289
dc.description	Dissertation under the supervision of Debapriyo Majumdar	en_US
dc.description.abstract	This paper presents a study of gender bias in word embeddings for the Hindi language. Word embeddings have been shown to capture human biases (such as gender bias) present in the training corpus, which are reflected in how the embeddings relate words to each other. The Hindi word embeddings were chosen to give insight into gender bias across a variety of domains, with the expectation that some would show significantly greater bias than others. We use the WEAT hypothesis-testing technique to confirm the presence of gender bias, and we find it useful for expanding the very narrow range of well-known gender bias word categories often used in the literature. We test for gender bias in four sets of word embeddings trained on corpora from different domains: Hindi CoNLL17, Hindi Wikipedia 2016 database dumps, and a Bollywood lyrics dataset. We also mitigate the bias in the embeddings by identifying the gender direction and quantifying the bias independently of its alignment with crowd bias. Finally, we evaluate the debiased embeddings on sentiment analysis of Hindi movie reviews, comparing the results obtained with the original and the debiased embeddings.	en_US
dc.language.iso	en	en_US
dc.publisher	Indian Statistical Institute, Kolkata	en_US
dc.relation.ispartofseries	Dissertation;;CS1911
dc.subject	Gender bias	en_US
dc.subject	Hindi word embedding	en_US
dc.subject	WEAT hypothesis	en_US
dc.subject	Debiasing algorithm	en_US
dc.title	Gender bias in Hindi word embedding	en_US
dc.type	Other	en_US