Corpus Linguistics: The Basics

Welcome, avid learner of linguistics! In
this video I want to introduce you to the wonderful world of corpora, which
can serve as a really helpful tool when it comes to research in linguistics. I
recall reading a text about grammatical change by Laurie Bauer, in
which he describes how he tried to figure out how the rule for comparing
adjectives changed over time. What he did is he read through lots and lots of
newspapers to scan them for adjectives in the comparative, to see whether
two-syllable words had an "-er" added or if they were preceded by the word "more".
But, poor guy, his job would have been so much easier had there been computers and digital
text around at the time those newspapers were printed. So what's the point of this,
though? Well, what Bauer did was he basically
treated those newspapers like a corpus.

Alright, what is a corpus? If you know
me, you know that I like to explain words by saying where they come from, so here
you go. I'm so predictable.

The word "corpus" comes
from Latin and means "body", and incidentally it's the same root as for the word "corpse",
for obvious reasons, I guess. So, a corpus is a body, we've established that,
but what kind of body? Come to think of it, using the word "body" is really just a
metaphor. I think I'll just post a link to my metaphor video somewhere around here.

Definition: a corpus is a collection of
authentic texts commonly used for the purpose of research. Yes, it's that simple.
By "authentic" we mean that the texts were written by
native speakers.

And in a way, really everything can be a corpus: newspapers,
novels, recipes, Facebook posts, tweets, you name it. How does it all work? Well, let's
assume you're a massive Justin Bieber fan, dear viewer, which could be about
accurate if you ever look at the demographics of my videos. So you're a Justin Bieber fan
and you want to dedicate your time to analyzing his linguistic prowess. What you could do is gather
every single tweet he has ever written and save them in one file. Now this is
your corpus: a very simple one, but it's fully functional and ready, and with the
right software you can now look at things in your Belieber corpus such as
word frequency, collocation or even concordance. But what are those three
things? I'm so glad you asked. Let me show you with an example.

I have not compiled a Justin Bieber corpus, though, because f*ck it, time's too precious.
I'll just take a simple text file, one of my literary studies assignments, and run it
through a free piece of software called AntConc. Right: word frequency helps you check how
often a word appears in a text.
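
If you'd rather see the idea in code than in a tool, here is a minimal sketch of a word frequency count in Python. It is not how AntConc works internally; the sample sentence, the `word_frequencies` helper and the very simple tokenizer are all made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercased word form occurs in the text."""
    # Assumption: a "word" is a run of letters, optionally with one
    # apostrophe part (e.g. "time's"). Real corpus tools let you configure this.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


# Hypothetical mini corpus, standing in for a real text file.
sample = "The essay is about shell shock. The therapist treats shell shock."
freq = word_frequencies(sample)

# Print the most frequent word forms with their counts.
for word, count in freq.most_common(3):
    print(word, count)
```

For a real corpus you would read the text from your file instead of using the `sample` string, and the top of the list would mostly be function words like "the" and "of".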

Word frequency counts can be really useful, because they allow
you to make statements about the register and audience of the text. My
most frequently used words here are "the", "of", "to" and "a", and then the words "Rivers" and "shell shock",
which isn't much of a surprise, because the essay actually was about
shell shock and a shell-shock therapist called Rivers.

In collocation,
you can look at the kinds of words that tend to appear in close proximity; they sort of form pairs. In authentic
text, this will tell you a lot about which words are more likely to combine
in a language. In my example here I screened my corpus for collocates of the word
"shell shock". Here it tells us how often these collocations appear and whether the
collocate is to the left or to the right of my search item.

Finally, concordance, which is probably
the most pleasing to OCD people, because not only does it make pretty, pretty
columns, it also allows you to see what bigger chunks of language surround your search item in the corpus. Really cool.

So, as you see, corpus analysis has lots of
benefits. First of all, it deals with relatively objective data and can be
easily carried out with huge quantities of texts. Also, it's a descriptive method,
and that's always a plus, isn't it? Also, working with corpora can be really beneficial
to high-proficiency language learners. Not sure about the preposition you have to use
after "abide"? Just look it up in a corpus. And here's a pro tip: for simple things
like this, even Google's frequency count can be really helpful as well. So, moving
on: corpus linguistics grants insight into really interesting fields of
linguistics, such as morphosyntactic patterns, processes of language change or
even lexical traces of discourses.

And that's it, now you've got a general idea of
corpora. Did you like the video? Please give it a
thumbs up and subscribe to my channel.

Wow, so when analyzing a text with
concordance I could actually find traces of discourses? That's really, really neat!
But what's a discourse, though? Don't panic, avid viewer, I've got you covered with the video, just click
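
For readers who want to try the collocation and concordance views from earlier without a dedicated tool, here is a rough Python sketch. Everything in it is an assumption for illustration: the sample sentences, the helper names `tokenize`, `collocates` and `concordance`, and the window sizes. It searches for a single word ("shock") rather than a two-word item like "shell shock", it ignores sentence boundaries, and it only counts raw co-occurrences, so it should not be read as the way AntConc computes its results.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercased word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def collocates(tokens, node, window=2):
    """Count words up to `window` tokens to the left and right of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - window):i])
        right.update(tokens[i + 1:i + 1 + window])
    return left, right


def concordance(tokens, node, width=3):
    """Return (left context, node, right context) for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left_ctx = " ".join(tokens[max(0, i - width):i])
            right_ctx = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left_ctx, tok, right_ctx))
    return lines


# Hypothetical mini corpus, standing in for a real text file.
sample = ("Rivers treated shell shock in the war. "
          "Many soldiers suffered from shell shock after the war.")
tokens = tokenize(sample)

left, right = collocates(tokens, "shock")
kwic = concordance(tokens, "shock")

# Right-align the left context so the search item lines up in one column.
for left_ctx, keyword, right_ctx in kwic:
    print(f"{left_ctx:>30} | {keyword} | {right_ctx}")
```

The `left` and `right` counters correspond to the "left of the search item" and "right of the search item" counts, and the aligned printout is the column layout that makes concordance lines easy to scan.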

As found on YouTube