
Mod-01 Lec-01 Introduction


This is Pushpak Bhattacharyya, Professor of Computer Science and Engineering at IIT Bombay, delivering a course on natural language processing. Natural language processing is a very important topic in today's world of the Internet. In this age there is a lot of information on the web in the form of text, and it is a very important concern to obtain information from this text and use it for various purposes. This is the motivation for understanding natural language processing: its tools, techniques, philosophy and principles. Let us go ahead. The course instructor is myself, Dr. Pushpak Bhattacharyya.

My homepage is www.cse.iitb.ac.in/~pb. The course homepage, which will be created, is likely to be under www.cse.iitb.ac.in/~nlp2010.

Moving forward, we give a perspective on natural language processing, the different areas of artificial intelligence and their interdependencies. If we look at this diagram, there are three layers: search, logic and knowledge representation form the first layer; the next layer is machine learning and planning; the final layer is natural language processing, vision, robotics and expert systems. I would like to spend some time understanding these areas and their interrelationships. Before that, let me talk about the importance of artificial intelligence in computer science and engineering. Artificial intelligence is called the forcing function for computer science. Computer science has grown by leaps and bounds in recent years, and one of the reasons for that has been artificial intelligence. Artificial intelligence has always pushed the boundary of computer science and engineering by demanding more and more from the machine; it is understood that the machine should come closer and closer to human beings in terms of its usage and its application.

If we look at the lowermost layer in the transparency, natural language processing forms the leftmost corner block, followed by vision (computer vision), then robotics, then expert systems. I would like to make some remarks on this. Natural language processing is concerned with the computer being able to process human languages like English, French, Marathi, Hindi and so on. In computer vision the machine processes a scene and understands how to operate in that scene. In robotics there is embedded software inside the robot asking it to perform various actions, like navigating on a terrain. Expert systems are concerned with the expert-level performance of a piece of software on a specific task; for example, the task could be diagnosing a disease and curing it. A doctor is known to operate with a very large number of rules obtained by years of education and practice on patients, so the expert system is concerned with emulating this behaviour of the expert.

Let us move on to the feeding disciplines at the second layer. We find that machine learning and planning feed into a number of areas in the outermost layer. For example, natural language processing is fed by machine learning, and natural language processing is also fed by knowledge representation. The reason for this is that in the current world natural language processing uses lots of statistical techniques. Statistical techniques are essentially learning techniques; they make use of the knowledge contained in the data. In today's Internet world we have a large amount of text in the form of a number of documents: HTML pages, PDF files, Word documents, PowerPoint presentations and so on and so forth. The Internet is full of textual documents, and these textual documents have to be made use of; a program has to make sense of them, and that requires natural language processing techniques. Now the point here is: suppose human beings are asked to make sense of all this data, then how many pages can a human being really sift through in, let us say, 24 hours, or even in a working day of eight hours? How much data can a human being possibly see? That is the reason why it is important to develop machine learning techniques, statistical techniques, which look at the data and obtain the knowledge content of the data. This is the importance of machine learning.
Let us go to the transparency again and see that machine learning is feeding into natural language processing. Here we find that the first layer is search, logic and knowledge representation. In search, the machine is faced with a number of choices as it computes, and search algorithms try to find the best possible strategy, the optimal strategy, for the computer. Now one might ask: in natural language processing, is it necessary to conduct search? We will see many examples where we are faced with a number of choices when we are processing textual data, and search is very important for this. I will give you a very small example. Suppose I utter the sentence "I went to the bank to withdraw some money." It is known that bank is a very ambiguous word; bank has two meanings. One meaning of bank is the financial bank, where one deposits money and withdraws money; the other meaning of bank is the bank of a river, the side of the river, the landmass which is on the side of the river. So when I say "I went to the bank to withdraw some money," which meaning of bank did I have in mind? This requires search. A program will have to read the sentence left to right, "I went to the bank to withdraw some money," and until it comes to "money" it is difficult for the machine, or even a human being, to understand that we are talking about the financial sense of bank. So this is a very simple example to show that we conduct search, and solve problems of search, when we understand natural language. We will have many, many examples coming up when we discuss ambiguity in natural language processing. (A small illustrative sketch of this sense choice is given a little further below.)

Let us look at the transparency once again and understand the importance of logic. What is logic? Logic is a vehicle for reasoning and inferencing, in the following sense: a number of rules and pieces of knowledge are given in a logical formalism. In logic we are concerned with a number of constructs like "if X is true then Y is true," which says that whenever we can satisfy X, Y is also satisfied. In natural language processing, logic forms a very crucial component, because the textual knowledge has to be converted into logical forms which a machine can process. This is the importance of logic, and mainly propositional calculus, predicate calculus and some forms of non-monotonic logic are used for natural language processing. Finally, we see the box for knowledge representation. Knowledge representation is again critical for natural language processing, because sentences contain knowledge, and this knowledge has to be extracted and embedded in the machine; so this is a very important problem again.

Let me summarize this transparency. In this transparency we see the importance of natural language processing and its place in the whole business of artificial intelligence. Natural language processing draws from search, logic and knowledge representation; it also draws from machine learning; and different areas of artificial intelligence like vision, robotics and expert systems also draw from many different areas of AI, as shown in the diagram. If you look at the last sentence given in the transparency, "AI is the forcing function for computer science," that tells the story that artificial intelligence pushes the frontier of computer science. We can take a step forward and say that natural language processing is the forcing function for artificial intelligence itself.
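To make this idea of choosing among meanings concrete, here is a minimal sketch, in Python, of how a program might pick between the two senses of bank by overlapping the words of the sentence with sense glosses. The glosses, the scoring and the function names are illustrative assumptions for this lecture's bank example, not the method of any particular system.

```python
# Minimal word-sense-disambiguation sketch: pick the sense of "bank"
# whose (hypothetical) gloss shares the most words with the sentence context.
SENSE_GLOSSES = {
    "bank/financial": {"money", "deposit", "withdraw", "account", "loan"},
    "bank/river":     {"river", "water", "shore", "land", "side"},
}

def disambiguate(sentence: str) -> str:
    context = set(sentence.lower().split())
    # Score each sense by the number of context words shared with its gloss.
    scores = {sense: len(context & gloss) for sense, gloss in SENSE_GLOSSES.items()}
    return max(scores, key=scores.get)

print(disambiguate("I went to the bank to withdraw some money"))  # bank/financial
print(disambiguate("I sat on the bank of the river"))             # bank/river
```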
In natural language processing we are concerned with day-to-day communication, and day-to-day communication requires understanding the words, the phrases, the structures, the syntactic processing, and the meaning of the sentences. All of these are extremely important even for artificial intelligence, and therefore when we make an advancement in natural language processing it impacts artificial intelligence as a field, and artificial intelligence in its own turn advances the frontiers of natural language processing. So I think the whole chain of influence is clear: NLP advances AI, and AI advances computer science and engineering.

Let us move ahead. I would like to spend some time on the course content. The top-level topics mentioned here are sound, and words and word forms; on the next slide, structures, meaning, and Web 2.0 applications. If we move to the first topic, sound: sound is concerned with the biology of speech processing. Natural language comes in two different forms: one is spoken utterances, the other is written communication. If we look at the biological processing of speech, how do human beings hear and understand speech? Human beings hear words; the sound patterns come to the ear, and those speech patterns are understood by the brain. There are specific areas of the brain designated for auditory signal processing, and the speech sound is converted into patterns which are stored in the brain; these in turn lead us to perform some action. We utter some sentences in response to speech, or we take some action in the real-life world, moving our hands, legs and so on. So speech processing refers to concept formation in the brain, and it also refers to taking action in the real-life world. The topics mentioned there pertain to different areas of speech processing: the biology of speech, phonetics, phonology, place of articulation, and many different statistical techniques which are needed for the processing of speech.

Now, why is speech important for natural language processing? Speech is important because speech supplies many of the statistical techniques consumed by natural language processing in today's world. Earlier, natural language processing used to be completely reliant on a linguist's expertise, a language expert's proficiency, a lexicographer's proficiency and so on; it was completely human driven. Now it is seen that there is a lot of data in textual form on the web, and we can use machine learning techniques to make sense of this data. Where do these techniques come from? They come from two fields, namely speech and computer vision, which have for a long time been processing signals and patterns using machine learning techniques, statistical techniques. Today's natural language processing cannot ignore those techniques; in fact it benefits a lot from their application. So the main point I am making here is that speech actually provides natural language processing with its statistical approach, its statistical techniques.

Moving forward to words and word forms: look at the second point here. Under words and word forms I mention morphology fundamentals, the morphological diversity of Indian languages, morphology paradigms, finite state machine based morphology, automatic morphology learning, shallow parsing, named entities, maximum entropy models and random fields. I have listed a number of topics; let me explain to you in brief what I mean here.
Words come in many different forms, and our concern is to be able to process words very skillfully. Words form the first step when you process language; it is words which we have to deal with in written communication. Take the very famous example in natural language processing where the sentence was "I went to the bank to withdraw some money." It is also possible to say "I will go to the bank to withdraw some money," "I will go to banks to withdraw some money," or "I will go to banks to withdraw all my money." Look at what is happening: the same word is coming in many different forms. For example, banks comes from the word bank; will go and went come from the root word go. Now English is a very simple language in terms of morphology; English produces word forms many of which are quite simple. For example, to form the future tense from go you just have to place will before go; will is called an auxiliary verb. In Hindi you would have to say "main bank jaaunga." The word jaaunga comes from two morphemes: jaa, "to go," and unga, "will"; jaa plus unga produces jaaunga. Therefore it makes sense to take the word jaaunga and break it into two pieces, jaa and unga. This is known as morphological processing, or morphological analysis.

The opposite process is called morphological synthesis: we have the root word, and from the root word we should be able to produce the word form again. To take an example in English, suppose the root word is transport, as in "we transport some material." Now if I say that this word transport is for singular number and present tense, as in "he transports some material," then given the root word transport and the fact that the tense is present and the person is third person singular, transport becomes transports; an s is added to transport. This is known as morphological generation or synthesis. Imagine a machine which is required to do natural language processing and natural language generation. Say the machine is supposed to describe a scene, and it sees a human being transporting some material from point A to point B. The machine will have to produce the sentence "He is transporting material from point A to point B." The word transport has now become transporting. This is the morphological process, and a number of computer algorithms have been devised to deal with morphological processing: how can we efficiently process words and obtain their root forms? In this course we would like to see finite state machine based morphology, that is, how morphology is processed by means of finite state machines. We will cover this topic in some detail, just to show how language and computer science come together in the form of a very simple machine, namely the finite state machine.
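As a toy illustration of the morphological analysis and generation just described (transport becoming transports or transporting, and back again), here is a small Python sketch. The suffix rules are deliberately simplified assumptions; real morphological analysers handle many more rules, exceptions and languages.

```python
# Toy morphological generation: root plus grammatical features -> surface form.
def generate(root: str, person: str = "3sg", tense: str = "present") -> str:
    # Very simplified English rules: 3rd person singular present adds "-s",
    # the progressive adds "-ing"; real rules have many exceptions.
    if tense == "present" and person == "3sg":
        return root + "s"
    if tense == "progressive":
        return root + "ing"
    return root

# Toy morphological analysis: surface form -> (root, feature) guess.
def analyse(word: str):
    # Strip a recognised suffix and guess the feature it marks.
    if word.endswith("ing"):
        return word[:-3], "progressive"
    if word.endswith("s"):
        return word[:-1], "3sg-present"
    return word, "root"

print(generate("transport"))                       # transports
print(generate("transport", tense="progressive"))  # transporting
print(analyse("transports"))                       # ('transport', '3sg-present')
```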
We go to the next transparency, and we come now to more advanced topics: structures, meaning, and Web 2.0 applications. In structures I mention the topics as theories of parsing, parsing algorithms, robust and scalable parsing of noisy text as in web documents, hybrids of rule-based and probabilistic parsing, and scope and attachment ambiguity resolution. All of these are technical terms which will be explained soon, but let me make the main point here. Parsing, or syntactic analysis, happens to be an extremely well researched topic in natural language processing; all other areas of natural language processing have been investigated, but not so much as parsing or syntactic analysis. Syntactic analysis or parsing is also seen in other branches of computer science, like compilers and programming languages, but there we are dealing with a much, much simpler problem: the problem of parsing a programming language. A piece of program is parsed, and this has hardly any complexity compared to natural language; a C program or a Fortran program or a Pascal or Java program does not have the complexity of natural language sentences, paragraphs and chapters. When we have a running piece of text, a large amount of text, to be processed by a machine, we have to isolate the words, the morphemes within the words (the morphological processing), and the phrases which are present in the sentence, noun phrases and verb phrases, which we will deal with after some time. All these structures which are detected from the sentences correspond to syntactic analysis or structure processing. This is an extremely well understood area, and a number of algorithms exist; I will cover those algorithms in detail. Parsing will form an important component of our discussions of structures.

So we come to the next topic; if you look at the transparency again, it is meaning. Meaning is the ultimate aim of natural language processing. How do we extract the meaning of sentences? This is our main concern. I mention the topics here as lexical knowledge networks, WordNet theory, Indian language WordNets and multilingual dictionaries, semantic roles, word sense disambiguation (WSD) and multilinguality, metaphors, and co-references: a very large number of topics, all of them complex, difficult topics. I would like to again spend some time on this. Meaning, I said, is the main concern of natural language processing. How do we understand meaning? In this business, something that comes as a big help is WordNets, dictionaries and ontologies; they are very important components of meaning detection and meaning representation. WordNets and lexical knowledge networks are nothing but meaning representations and their interconnections. To take an example, a dog is an animal. We look at dogs, and we also have the class of animals. What do dogs have? Dogs have tails, eyes, legs, ears and so on. Animals also have many different properties: most animals move, most animals procreate and produce children, and most animals drink water, eat food, and so on and so forth. Now a dog, being a member of the animal family, inherits all these properties, so there is an intimate meaning linkage between dog and animal. What are we talking about here? We are talking not about the words dog and animal but about the meanings of the words dog and animal and how they relate to each other. This is what is represented in lexical knowledge networks, and it took a long time for natural language processing to understand that meaning networks are crucial for natural language processing. We will spend quite an amount of time on WordNets. At IIT Bombay we have done a lot of research and development in WordNet building, and I would like to describe our work on the Hindi WordNet and the Marathi WordNet, and our effort at creating Indian language WordNets all over the country, which is advancing the state of the art in natural language processing in this country. So meaning representation through WordNets, ontologies and dictionaries will form an important component of our discussions.
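Here is a minimal sketch of the kind of lexical knowledge network just described: a tiny hand-built is-a hierarchy in which dog inherits the properties of animal. The concepts and property lists are illustrative assumptions, not the contents of any actual WordNet.

```python
# Tiny lexical knowledge network: is-a links plus property inheritance.
IS_A = {"dog": "animal", "animal": None}
PROPERTIES = {
    "animal": {"moves", "drinks water", "eats food", "procreates"},
    "dog":    {"has a tail", "has four legs", "barks"},
}

def all_properties(concept: str) -> set:
    """Collect a concept's own properties plus everything inherited up the is-a chain."""
    props = set()
    while concept is not None:
        props |= PROPERTIES.get(concept, set())
        concept = IS_A.get(concept)
    return props

print(all_properties("dog"))  # dog's own properties plus those inherited from animal
```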
Coming to the next topic, which is Web 2.0 applications, I mention the items as sentiment analysis, text entailment, robust and scalable machine translation, question answering in a multilingual setting, and cross lingual information retrieval. What do I mean by all this? Those of you who are keeping track of what is going on on the Internet know that the Internet is again going through a revolution. The Internet itself has caused a revolution in human life; civilization has been profoundly influenced by the web, by the Internet. Things which were not imaginable before the advent of the Internet are happening regularly today, absolutely regularly. The web is now coming up with its next version, which is Web 2.0. I mention some topics here like sentiment analysis, text entailment, machine translation on a large scale, and so on and so forth.

Let us just take the first topic, sentiment analysis. If you think carefully, we have a completely new world order now: never before was so much public opinion available in electronic form. The common man can access information about any organization or any person just through the click of a mouse; so much information about persons and organizations is available on the web in completely electronic form. This is a completely new scenario; it did not exist before. Sentiment analysis is concerned with how to look at a document, how to process its content, and then find out what the document writer or the speaker is saying about a particular entity, an organization or a person: is the writer positively oriented towards the person or the organization, or is the writer's opinion negative? For example, take a bank and look at the blogs in which the users of the bank express their opinions. In a blog you have a lot of textual data in electronic form, and you would like to understand: is the blog praising the bank, or is it expressing an opinion against the bank? This is the concern of the field called sentiment analysis. In sentiment analysis we would like to develop programs which automatically understand the opinion of the users from the electronic text. This is known as sentiment analysis or polarity detection.
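A minimal sketch of lexicon-based polarity detection in this spirit is given below; the word lists and the scoring rule are illustrative assumptions, and real sentiment analysis systems go far beyond counting words.

```python
# Minimal lexicon-based polarity detection: count positive vs. negative words.
POSITIVE = {"good", "excellent", "helpful", "praise", "fast", "friendly"}
NEGATIVE = {"bad", "poor", "slow", "rude", "terrible", "complaint"}

def polarity(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("The staff were friendly and the service was fast"))  # positive
print(polarity("Terrible service and rude staff at this bank"))      # negative
```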
Similarly, text entailment is concerned with inferencing over text: given two pieces of text, are they consistent with each other, and does one text follow from the other? Then there is large-scale machine translation, which is a very old field and extremely relevant for a country like India, where multiple languages are spoken and written; the concern is to be able to translate from one language to another, like English to Hindi, Hindi to Marathi, Marathi to Bengali, and so on. At IIT Bombay we again have large-scale activity on machine translation. The final topic I mention there is cross lingual information retrieval. If you think about it, in a country like India cross lingual information retrieval is a very important problem. Users have information needs; where do they go for their information? Earlier, when the web did not exist, users used to go to libraries and obtain the information from the library. Nowadays people click a mouse or use a keyboard, go to the web, and obtain the information from the web. Now imagine what kind of problem a user will face if the language of the user is not English. A large part of the web is in English; the user has to pose the query in English, obtain information in English, and then understand the document. If the user is not comfortable with English, then the user is handicapped. This is known as the problem of the language barrier. In India the comfort level with English is not very high; it is known that about five to six percent of the Indian population is comfortable in the usage of English. So for such people, when their information need has to be met, they should be able to pose the query in their own language and obtain information in their own language, while in the background the system processes the query, looks at a large amount of English documents, and then produces an answer in the language of the user. This is known as cross lingual information retrieval, and cross lingual information retrieval is a very relevant and current problem for many countries; India is no exception.

So this, in a nutshell, is an overview of the topics I would like to cover. In summary, we will look at speech processing techniques; we will look at how words are processed and stored, morphological processing, and dictionaries; we will understand the techniques of syntactic analysis, parsing; meaning representation is done in dictionaries, WordNets and ontologies, and we would like to discuss those topics; and finally some of the Web 2.0 applications like sentiment analysis, text entailment, machine translation and cross lingual information retrieval will be covered. This gives an overview, and I suppose this is an exciting set of topics; once these topics are covered, one gets a good overview of what is going on in natural language processing.

We will move forward now with some of the central topics. We have said many times that the web brings a new perspective. If we look at the diagram here, we have what is called the QSA triangle: Q is the query, S is search, and A is analytics, the intelligent processing of information on the web. This is known as the QSA triangle, a very famous concept in the scenario of the web: Q, query; S, search; A, analytics. It actually brings in many fine points of discussion; we do not have enough time to go into those details, but let me make one or two remarks. When the user poses a query, this query is processed by the search engine and documents are searched; this is the search component. The first component was the query component: the query has to be processed. Then, when the documents come up, a large number of documents come up from the search engine, and we have to understand what these documents mean and whether they satisfy our information need. The topic of analytics is concerned with that: how to process these documents intelligently. So we can imagine a future scenario where the user gives the query, the documents are searched, and the essential information with respect to that query is presented to the user, all by the machine, all by the computer. After the query is presented, search and analytics work together, in synchrony, in tandem, to produce information for the user to consume and use. That is the QSA triangle, and you can see that the documents are mainly in textual form; there are of course lots of documents in image form, but the documents which are in textual form require natural language processing techniques to be processed.

Moving forward, we now proceed to define natural language processing. As the slide shows, natural language processing is a branch of artificial intelligence. We have already dwelt on this particular issue in one of the transparencies before, where we showed the different dependencies among areas of AI. Now, natural language processing has two goals: the science goal and the engineering goal. The science goal and the engineering goal go hand in hand; they work in tandem. In the science goal, the aim is to understand how language is produced and how language is understood by an intelligent entity. For example, how do I process the sentence "I went to the bank to withdraw some money"?
If somebody asks me, "Why did you go to the bank?" I will answer, "To withdraw some money." How do I answer this question? I must have understood the question, and I must have understood that the meaning of bank here is the financial bank, not the river bank. How did I do that? This is a cognitive process which happens in the brain, and human beings are extremely good at it. The science goal of natural language processing is to understand this phenomenon, this tremendous phenomenon of how we process language, how we generate language, and how we interact with our fellow human beings through language. That is the science goal.

The engineering goal, on the other hand, is concerned with the use of the techniques of natural language processing. When we create a natural language processing program, it has a particular use. For example, I could use a natural language processing program to read a text and produce sounds corresponding to the text. This could be very useful to a blind person. A blind person is not able to read a piece of text himself, so we give the text to a natural language processing program; it automatically reads the document and produces the speech sounds, and the blind person understands the meaning of the text. This is an extremely important utility of language processing. Consider another scenario where the person is not blind but is speaking, and as he speaks, the speech sounds are interpreted by a speech processing and language understanding system and converted into a textual file. Imagine how the life of a teacher would become simpler if such a system existed: the teacher simply comes to the class and delivers the lecture, a software program captures those sounds and produces a textual file, which can eventually be uploaded on the teacher's home page, and the students can make use of the lecture notes. The lecture notes were not written by the teacher, nor was anything written by the students; a program captured all the sound and produced the lecture notes. So this is another very important use, namely speech understanding and the encoding of speech into text.

There are many such applications of language processing. I mentioned sentiment analysis some time back, and this problem, the problem of sentiment analysis, requires language processing; it is a very important problem. Organizations nowadays are very concerned about what the public is saying about them on the Internet, and it is impossible for a single human being, or even a team of human beings, to look at the whole web and tell the organization, "people are saying very nice things about you," or "people are saying bad things about you." So can they employ software agents, can they write software programs, which will navigate the whole web and give the organization the people's feedback about it? These are many different utilities of natural language processing, and they pertain to the engineering goal.

So we mention two goals, the science goal and the engineering goal, and both have to be kept in mind by anybody working on natural language processing. The excitement of the field comes from this mysterious and very deep phenomenon of language being processed in the brain and language being produced; that is the science goal. The other kind of excitement comes from the fact that language processing produces useful tools and resources which make human beings' lives easier. So, the science goal and the engineering goal.
Proceeding further, there was this very famous test called the Turing test, proposed by one of the great men of computer science and mathematics, Alan Turing, who proposed that a particular test can be conducted to find out if a machine is truly intelligent. If you look at this diagram, there is a test conductor who is interacting with a machine and a human. The test conductor does not know which is the machine and which is the human being. Imagine these two circles to be two different rooms, and all the communication takes place through the keyboard. Now the task is: can the test conductor find out which room contains the machine and which room contains the human being? If the test conductor cannot distinguish between the machine and the human, then the machine is supposed to be intelligent; it is supposed to have passed the test for intelligence.

Let us spend some time on this test. This is an extremely famous test called the Turing test. In the Turing test our goal is to see if a machine, if a program, is truly intelligent, and it was proposed by Alan Turing to test a program for its intelligence. There are many philosophical criticisms against the Turing test; however, the test has endured over the years. The test has been studied by AI students and researchers, so much so that nowadays there is a competition in which essentially a Turing test is conducted to find out whether a program is truly intelligent or not. The participants are subjected to something like a Turing test, and they are given an enormous amount of money as a prize if the test conductor, who is a human being, cannot detect whether it was indeed a software program or actually a human being interacting with the test conductor.

Now, the Turing test actually inspired something called ELIZA, a very famous program, one of the first programs to demonstrate natural language processing, and it brought out a number of aspects of natural language processing in a very nice way. There is a URL at which you can try interacting with ELIZA, and I show a snapshot of ELIZA here. Look at this screen and look at the first sentence. The whole screen captures an interaction with a mechanical psychotherapist: a software program which behaves like a psychotherapist. The first sentence is, "I am the psychotherapist. Please describe your problem. Each time you are finished talking, type RET twice," that is, the return button twice. This sentence comes from the machine, the program, and the human being answers, saying, "I have a terrible headache." The machine's response is, "Maybe you should consult a doctor of medicine; I am a psychiatrist." If you can appreciate the point, this is a very nice answer, a very natural answer; any psychotherapist or psychiatrist would answer this way. If you have a headache, where should you go? You should go to a doctor, not to a psychiatrist. Then the human being says, "Okay, I will do that." Now the machine comes back and says, "Can you elaborate on that?" And this is a bit of a strange response. Why would the entity respond this way? The machine has already advised the human being to go to a doctor, and the human being, the patient, has agreed: yes, I will go to the doctor. Now why should the machine come back and say, "Can you elaborate on that?"
Then the human being says, "What are you talking about? No, I cannot elaborate. See you." So he is about to leave: I cannot elaborate, I will go now, see you later. Then the machine says, "Why do you say that?" The human being says, "Now you are just talking nonsense." The human being is irritated now. The machine says, "What makes you believe now I am just talking nonsense?" The human being says, "You are not making any sense." Then the machine says, "Perhaps your plans have something to do with this." So after "Can you elaborate on that?", and after the human being is somewhat irritated and says that he cannot do much with this conversation, the machine goes into a completely nonsensical mode of communication: "Now you are just talking nonsense," "What makes you believe now I am just talking nonsense," and so on.

What does this communication show? It shows that initially, or even up to some level, the machine can feign intelligence, can pretend to be intelligent. The moment the conversation takes a deeper turn and requires more complex reasoning, more complex background knowledge, a lot of experience, the machine shows signs of failure; it begins to ask questions which are completely nonsensical and shallow. Why does this happen? ELIZA was a program created by Weizenbaum, and the goal was to show that natural language processing actually does not have much substance in it. Nobody would agree with this point of view any more; in today's world natural language processing is understood to be a very deep field with a tremendous amount of utility. But in those days Weizenbaum set out to show the world: see, I can create a software program which can apparently converse intelligently with a human being without ever informing the human being that it is actually a software agent. That is the point: the software program gives the impression that it is powerful enough to deal with language, whereas it actually is not. The internal algorithm was the following: there were a number of templates and keywords which the software program was looking for all the time and matching against. It has, for example, a very stock answer, "What makes you say ...". So suppose I say, "I am not well," and the machine's response is, "What makes you say that you are not well?" There is some cleverness in converting I to you and you to I and so on, but the whole thing is template based. There is a template which says that whenever you see a sentence and you do not know what to do with it, how to process it, simply output "What makes you think ..." or "What makes you say ...". This is not intelligence; it is hardly intelligent. It is a completely superficial processing of the sentence. Similarly, there were many other templates built into the program, and Weizenbaum tried to show that just by activating these templates a machine can converse meaningfully. But this is not true, as the example shows: the moment the conversation goes into deeper things, the machine begins to fail. Still, the program was inspired by the Turing test; Weizenbaum wanted to construct a program which would behave like a human being and converse with another human being.
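To show how superficial such template matching is, here is a small Python sketch in the spirit of ELIZA; the patterns, the pronoun table and the fallback response are illustrative assumptions and not Weizenbaum's actual program.

```python
import re

# ELIZA-style template matching: a few illustrative patterns only.
PRONOUN_SWAP = {"i": "you", "my": "your", "am": "are", "you": "I"}
TEMPLATES = [
    (re.compile(r"i am (.*)", re.I), "What makes you say that you are {0}?"),
    (re.compile(r"i have (.*)", re.I), "Why do you tell me that you have {0}?"),
]

def swap_pronouns(text: str) -> str:
    # The "cleverness" of converting I to you and you to I.
    return " ".join(PRONOUN_SWAP.get(w, w) for w in text.lower().split())

def respond(utterance: str) -> str:
    for pattern, template in TEMPLATES:
        match = pattern.match(utterance.strip())
        if match:
            return template.format(swap_pronouns(match.group(1)))
    # Stock fallback when no template matches: purely superficial processing.
    return "Can you elaborate on that?"

print(respond("I am not well"))               # What makes you say that you are not well?
print(respond("I have a terrible headache"))  # Why do you tell me that you have a terrible headache?
print(respond("Ok, I will do that"))          # Can you elaborate on that?
```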
Moving forward, we come to probably the most important starting discussion of natural language processing, namely ambiguity. So we write here: ambiguity. This is what makes natural language processing challenging, and this is the crux of the problem. Ambiguity is there everywhere, at all stages of natural language processing, and we proceed to elaborate on that. It is known that there are different stages of language processing; these stages are listed here: phonetics and phonology, morphology, lexical analysis, syntactic analysis, semantic analysis, pragmatics, and finally discourse. All these topics are very well known in natural language processing and speech. My point in listing them is that at every stage, from phonetics and phonology up to discourse, the problem of ambiguity comes up and we have to find solutions to it.

Let us take the first item on this list, phonetics. Phonetics is concerned with the processing of speech. The challenges here are, first, homophones, namely words which sound the same; so bank in the financial sense and bank in the river sense sound the same, and therefore they are called homophones. Near homophones are those which have very close sounds, for example a pair of words like matra and mothra in Hindi; these words are also used in Marathi and in Bengali, and most Indian languages of Indo-European origin have such pairs. The two words sound similar, but they are not identical, unlike homophones.

Then we come to the very intriguing problem of word boundary detection. What is it? I have written here a Hindi string, aajaayenge. This string can be broken up as "aa jaayenge," meaning "will come," or as "aaj aayenge," meaning "will come today." If you break it before the "j" it is "aa jaayenge," will come, and if you break it after the "j" it is "aaj aayenge," will come today. The speech understanding system has to break the string at the appropriate place; if you break it at some other point, the resulting pieces do not make much sense, and no listener would break the string into such parts. Let me take an example in English, one of my favourite examples: "I got a plate." If I say it very quickly, you will not know what I said: "I got a plate," meaning I received a plate, or "I got up late," meaning I woke up late. These are the two meanings, and when the whole string is given, we have to make out from the context whether it is "I got up late" or "I got a plate." Similarly, there are problems of phrase boundary detection; there is an example here, and I leave it as an exercise to you to see how the phrase boundary, when broken before "as such" or after "as such," can produce two different meanings.

Another important problem in phonetics is disfluency. I show some of the strings here: aa, hmm, ahem, etcetera. These strings have no meaning; they only assist the speaker in organizing his thought. The speaker buys some time through these regions of disfluency. If I say, "I will go to school, but, hmm, I do not know what to carry there, aa, I forgot my umbrella," all these hmms and aas are regions of disfluency; they give the speaker some time to organize his or her thoughts. So what I said in this slide was that when we deal with phonetics and phonology, when we are dealing with speech sounds, we have to deal with three problems: one, homophones; two, near homophones; three, word boundary detection.
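Here is a minimal sketch of dictionary-based word boundary detection for the aajaayenge example: it enumerates every way an unsegmented string can be split into words from a tiny lexicon. The lexicon and the romanised spellings are illustrative assumptions.

```python
# Minimal dictionary-based word boundary detection: enumerate every way an
# unsegmented string can be split into known words (lexicon is illustrative).
LEXICON = {"aa", "aaj", "jaayenge", "aayenge"}

def segmentations(s: str):
    """Return every split of s into words found in LEXICON."""
    if not s:
        return [[]]
    results = []
    for i in range(1, len(s) + 1):
        prefix = s[:i]
        if prefix in LEXICON:
            for rest in segmentations(s[i:]):
                results.append([prefix] + rest)
    return results

for seg in segmentations("aajaayenge"):
    print(" ".join(seg))
# aa jaayenge   ("will come")
# aaj aayenge   ("will come today")
```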
We move forward and take up the challenges involving morphology. Morphology, as described before, deals with word formation rules operating on root words. For example, the noun forms boy and boys come from the root word boy; gender markings, as in ladkaa and ladkii, also correspond to morphological variation. Verbs give rise to different forms through tense, like stretch and stretched; through aspect, for example the perfective, where from sit we can obtain had sat; and through modality, for example request, where from the root word khaanaa we obtain the word khaaiye. So a first crucial step in natural language processing is morphology: we have to detect all the morphemes contained in a long word string. Languages which are rich in morphology include the Dravidian languages Tamil, Telugu, Kannada and Malayalam, and Hungarian and Turkish in Europe. Languages which are poor in morphology include Chinese and English: Chinese hardly uses any morphological suffixes, and English is also not very rich in morphological variation. Languages with rich morphology have the advantage of easier processing at higher stages, since once we have dealt with the word and all its suffixes and prefixes we have made a lot of progress towards word meaning.

A tool of great interest from computer science here is the finite state machine for word morphology; computer science comes in handy. There are word formation rules, and the suffixes which get added to the word come in a particular order. Let me take an example from Marathi: the word gharaasamorchaa, as in gharaasamorchaa banglaa, the bungalow in front of the house. gharaasamorchaa has three components: ghar, samor and chaa. You could also say gharaasamorchyaanii. Now ghar, samor, chaa and nii come in a particular order: nii cannot come before chaa, and chaa cannot come before samor. So there is a particular order in which the suffixes are produced and inserted into the word, and this automatically becomes a small parsing problem, one which is very effectively dealt with by making use of finite state machines. Computer science has invested a lot of its energy in understanding finite state machines, the algorithms corresponding to them, and the theory of finite state machines, and these come in handy when we deal with morphological processing.
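As a pointer to the finite state treatment of suffix order that later lectures will take up, here is a minimal sketch that accepts a root followed by the suffixes samor, chaa and nii only in that order. The state names, transliterations and transition table are illustrative assumptions.

```python
# Toy finite state machine for suffix ordering: a root may be followed by
# samor, then chaa, then nii, in that order only.
TRANSITIONS = {
    ("ROOT", "samor"): "SAMOR",
    ("SAMOR", "chaa"): "CHAA",
    ("CHAA", "nii"): "NII",
}
ACCEPTING = {"ROOT", "SAMOR", "CHAA", "NII"}

def accepts(morphemes) -> bool:
    """morphemes[0] is the root; the remaining items must follow the allowed suffix order."""
    state = "ROOT"
    for suffix in morphemes[1:]:
        state = TRANSITIONS.get((state, suffix))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts(["ghar", "samor", "chaa"]))         # True  (gharaasamorchaa)
print(accepts(["ghar", "samor", "chaa", "nii"]))  # True
print(accepts(["ghar", "nii", "chaa"]))           # False (nii cannot come before chaa)
```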

With this we will finish the first lecture.