Words, Words, Words

Corpus Linguistics is a relatively new method of studying languages. A corpus (plural: corpora) is a body, in this case a body of words, and linguistics is the study of language. Together, Corpus Linguistics is the study of languages through the analysis of enormous bodies of words. Only recently have computers become able to examine collections holding hundreds of millions of words. Today these huge collections can be analyzed quickly, accurately and impartially, which makes them a powerful tool for studying how languages are actually used.

The software used to search, categorize and display the results of this analysis is called a concordancer (the listing it produces is a concordance), and such tools are becoming more available every day. Concordancers that are free or very reasonably priced can be found on the Internet. Some are easy to use; others require detailed instructions to perform the intricate searches and analyses asked of them.
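
To make the idea concrete, here is a minimal sketch of the simplest thing such software does: a keyword-in-context search that lists every occurrence of a word with a few words on either side. The function names (tokenize, concordance), the sample sentence and the three-word window are all assumptions made for illustration; real tools work on far larger corpora and offer much richer searches.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, width=3):
    """Return one line per occurrence of `keyword`, with up to
    `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical one-sentence "corpus", for illustration only.
sample = "The study of language is the study of how people use language every day."
tokens = tokenize(sample)

for line in concordance(tokens, "language"):
    print(line)
```

Each printed line shows the search word in brackets with its neighbours, which is the basic display from which patterns of actual usage are read off.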

The value of Corpus Linguistics is that we can now say, on the basis of evidence, which words we use when we communicate. We are finding that when we write for academic purposes, the words and groups of words we actually use differ from those we use when we write for other purposes, such as business, fiction or advertising. Even within the genre of journalism there are different registers, such as front pages, editorials and sports reporting, and the language of headlines within those registers can itself be unique. Language used in speech is significantly different from that used in any of the written disciplines. Speech itself can be further categorized as face-to-face conversation, telephone conversation, lectures, asking for and giving directions, and even consumer purchasing. The list is as detailed as we choose to make it.

Prior to the appearance of this incredible tool, we were forced to rely on our perceptions of which words we thought we used when we communicated. We are now discovering that those perceptions were inaccurate.

This ability to examine large numbers of words empirically is giving birth to another linguistic study. Through focused searches, we are discovering that within all the different registers there exist what are called lexical bundles: groups of words that frequently appear in conjunction with each other. Prior to the advent of Corpus Linguistics these bundles were unknown; if they were conceived of at all, they were merely hypothetical. Now their existence has been empirically verified, and they are becoming powerful tools in second language acquisition.
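
As a rough illustration of how such groups of words can be found, the sketch below counts every three-word sequence in a text and keeps those that recur. The function name lexical_bundles, the sample text and the thresholds (three words, at least two occurrences) are assumptions chosen for illustration. The sketch also ignores sentence boundaries, which a careful study would not do.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_bundles(tokens, n=3, min_count=2):
    """Count every run of `n` consecutive tokens and return the
    (bundle, count) pairs that occur at least `min_count` times."""
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(b, c) for b, c in counts.most_common() if c >= min_count]

# Hypothetical miniature "corpus", for illustration only.
sample = ("On the other hand, the results are clear. "
          "On the other hand, the sample is small. "
          "As a result of the study, the results are clear.")
tokens = tokenize(sample)

for bundle, count in lexical_bundles(tokens):
    print(count, bundle)
```

On a text this small the recurring sequences are an accident of the sample. The same counting applied to millions of words from one register is what lets frequent sequences stand out from chance repetition.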

Different corpora now exist and more are being developed. Today we can access corpora of British English, American English, Australian English and New Zealand English. Corpora are being collected from different registers and different time periods, and others are being compiled in languages other than English. The challenge is to collect corpora that are as authentic and relevant as possible. In general, the 'best' corpora are the most complete, and in order to be complete, a corpus should be large. When it comes to corpora, size does matter.

A good place to explore the wonderful world of Corpus Linguistics is Martin Weisser's 'Bookmarks for Corpus-based Linguists'.




Last Updated: November 10, 2016