03 July 2018

The exponential rise of social media websites such as Twitter, Facebook and Reddit in linguistically diverse regions has led to the hybridization of popular native languages with English to ease communication. Hinglish, for instance, consists of Hindi words written in Roman script instead of Devanagari. It is a pronunciation-based bilingual language with no fixed grammar rules, which makes it difficult to derive useful information from such code-switched text. Social media companies therefore need models that can extract useful information from these languages. Such models would help in a number of applications, such as detecting offensive language and understanding users' feedback, opinions and sentiments towards products, news, events, policies, etc. At MIDAS@IIITD, we focus on building deep learning models that can extract useful information from code-switched languages such as Hinglish and automatically perform efficient classification on them. For example, our recent paper on detecting offensive language in Hinglish tweets has been published at ACL, a premier NLP conference.