Does Size Matter? Text and Grammar Revision for Parsing Social Media Data

Mohammad Khan, Markus Dickinson, and Sandra Kübler

Proceedings of the Workshop on Language Analysis in Social Media.

We explore ways to improve the parsing of social media and other web data by altering the input data, namely by normalizing web text, and by revising output parses. We find that text normalization improves performance, though spell checking has a more mixed impact. We also find that a very simple tree reviser based on grammar comparisons performs slightly but significantly better than the baseline and clearly outperforms a machine learning model. The results also demonstrate that the goodness of fit of the data has a greater impact on the parser than the size of the training data.


BibTeX entry:

@InProceedings{khan:ea:13,
  author    = {Mohammad Khan and Markus Dickinson and Sandra K{\"u}bler},
  title     = {Does Size Matter?  Text and Grammar Revision for 
               Parsing Social Media Data},
  booktitle = {Proceedings of the Workshop on Language Analysis in
               Social Media},
  year      = {2013},
  address   = {Atlanta, GA, USA},
  pages     = {},
  url       = {http://cl.indiana.edu/~md7/papers/khan-et-al13.html}
}