Nicholas Groom

Dr Nicholas Groom on his research

Duration: 1.59 minutes

My name's Nicholas Groom and I'm a corpus linguist.

Corpus linguistics, what's that? Corpus linguistics involves using computers to study very, very large collections or databases - we call them corpora - of naturally occurring language data. By very large, I mean very large! I'm talking about millions or billions of words. Obviously you need computers to look at databases of that size; it's not possible to analyse them manually.

By naturally occurring, I mean real language that's been generated by real people in real communicative situations, for real purposes, and not language that's been artificially generated by linguists to make some kind of theoretical point.

You may think, isn't that what linguists have always done? Don't linguists look at real language spoken by real people? Some do, but for much of the 20th century mainstream linguistics wasn't interested in authentic language data. Instead there was an emphasis on artificial, made-up sentences which were generated to test certain theoretical ideas.

The best example of this would be Chomsky's "Colorless green ideas sleep furiously" sentence, which you may have heard of. The point of this sentence is to show that people can make judgements about whether a sentence is grammatical or not, even though it doesn't have any kind of a meaning. It's gibberish really, isn't it? But somehow you feel that it makes sense grammatically. Leading on, Chomsky would use this to argue that there was a sharp division between syntax and semantics, that there's no relationship between the form and the meaning of statements.

Corpus linguists such as myself are beginning to challenge that idea. The more we look at naturally occurring language data, the more we think that there are actually genuine relationships between forms and meanings. That's really what my research is interested in studying.