Abstract
The exponential rise of social media websites such as Twitter, Facebook and Reddit in linguistically diverse geographical regions has led to the hybridization of popular native languages with English in an effort to ease communication. This paper focuses on the classification of offensive tweets written in Hinglish, a portmanteau of Hindi and English that here denotes the Indic language Hindi written in the Roman script. The paper introduces a novel tweet dataset, the Hindi-English Offensive Tweet (HEOT) dataset, consisting of tweets in Hindi-English code-switched language split into three classes: non-offensive, abusive and hate speech. Further, we approach the classification of the tweets in the HEOT dataset using transfer learning, wherein the proposed model, which employs Convolutional Neural Networks, is pre-trained on English tweets and then retrained on Hinglish tweets.
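The transfer-learning scheme summarized above (pre-train a CNN on English tweets, then retrain it on Hinglish tweets) can be illustrated with a minimal pure-Python sketch. Everything in it is hypothetical and not the paper's model or data: the tiny embedding, 1D convolution, ReLU, global max-pool and softmax network (`TinyTextCNN`), its sizes, the learning rates, and the synthetic integer token sequences produced by `synthetic_tweets` (token id y + 1 marks class y) that stand in for tokenized English and Hinglish tweets. Retraining here updates every layer starting from the pre-trained weights; freezing some layers would be an equally plausible variant.

```python
import copy
import math
import random

LABELS = ("non-offensive", "abusive", "hate speech")  # the three HEOT classes


class TinyTextCNN:
    """Hypothetical toy model: embedding -> 1D conv -> ReLU -> global max-pool -> softmax."""

    def __init__(self, vocab, dim=4, n_filters=6, width=2, n_classes=len(LABELS), seed=0):
        rng = random.Random(seed)
        rand = lambda: rng.uniform(-0.5, 0.5)
        self.dim, self.width = dim, width
        self.E = [[rand() for _ in range(dim)] for _ in range(vocab)]  # token embeddings
        self.W = [[[rand() for _ in range(dim)] for _ in range(width)] for _ in range(n_filters)]
        self.b = [0.0] * n_filters
        self.U = [[rand() for _ in range(n_filters)] for _ in range(n_classes)]
        self.c = [0.0] * n_classes

    def forward(self, seq):
        pooled, argmax = [], []
        for f, Wf in enumerate(self.W):
            best, best_i = 0.0, None  # max over positions of ReLU(conv); 0.0 if never positive
            for i in range(len(seq) - self.width + 1):
                s = self.b[f] + sum(Wf[j][d] * self.E[seq[i + j]][d]
                                    for j in range(self.width) for d in range(self.dim))
                if s > best:
                    best, best_i = s, i
            pooled.append(best)
            argmax.append(best_i)
        logits = [ck + sum(u * h for u, h in zip(Uk, pooled)) for Uk, ck in zip(self.U, self.c)]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
        total = sum(exps)
        return pooled, argmax, [e / total for e in exps]

    def loss(self, data):
        """Mean cross-entropy over (token_ids, label) pairs."""
        return sum(-math.log(self.forward(s)[2][y]) for s, y in data) / len(data)

    def sgd_step(self, seq, y, lr):
        pooled, argmax, probs = self.forward(seq)
        g = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]  # dLoss/dlogits
        dpool = [sum(g[k] * self.U[k][f] for k in range(len(g))) for f in range(len(pooled))]
        for k, gk in enumerate(g):  # output layer (dpool already used the old U)
            for f, h in enumerate(pooled):
                self.U[k][f] -= lr * gk * h
            self.c[k] -= lr * gk
        dE = {}
        for f, i in enumerate(argmax):  # gradient flows only through each filter's max position
            if i is None:
                continue
            for j in range(self.width):
                t = seq[i + j]
                row = dE.setdefault(t, [0.0] * self.dim)
                for d in range(self.dim):
                    row[d] += dpool[f] * self.W[f][j][d]  # read W before updating it
                    self.W[f][j][d] -= lr * dpool[f] * self.E[t][d]  # E is updated only below
            self.b[f] -= lr * dpool[f]
        for t, row in dE.items():
            for d in range(self.dim):
                self.E[t][d] -= lr * row[d]


def train(model, data, epochs, lr, seed=0):
    """Plain SGD over shuffled examples; mutates and returns `model`."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for seq, y in data:
            model.sgd_step(seq, y, lr)
    return model


def synthetic_tweets(n, filler, seed, length=6):
    """Hypothetical stand-in data: token id y + 1 marks class y; other positions are filler ids."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(len(LABELS))
        seq = [rng.choice(filler) for _ in range(length)]
        seq[rng.randrange(length)] = y + 1
        data.append((seq, y))
    return data


# Usage: pre-train on a large "English" stand-in, then retrain a copy on a small "Hinglish" one.
source = synthetic_tweets(300, filler=list(range(4, 12)), seed=1)
target = synthetic_tweets(30, filler=list(range(10, 20)), seed=2)
pretrained = train(TinyTextCNN(vocab=20), source, epochs=10, lr=0.1)
finetuned = train(copy.deepcopy(pretrained), target, epochs=5, lr=0.05)
```

Copying the pre-trained model with `copy.deepcopy` before retraining keeps the English-only weights available for comparison; the smaller learning rate in the second stage is a common, but here assumed, choice for retraining on the smaller target set.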