Natural Language Processing (NLP) is an active area of research that deals with the automatic manipulation of natural language, for example speech or text. The field sits at the intersection of linguistics and computer science.
Computer science techniques allow us to represent text as numbers, in the form of a matrix or tensor, and then to process it and identify patterns in it. Machine learning and artificial intelligence, which are responsible for many of the recent advances in NLP, are also sub-domains of computer science.
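As a rough illustration of representing text as a matrix, the sketch below builds a simple word-count (bag-of-words) matrix using only the Python standard library. The function names `build_vocabulary` and `to_count_matrix` and the two sample sentences are made up for this example, and whitespace splitting stands in for a real tokenizer. This is only one of many possible numeric representations.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def to_count_matrix(documents, vocabulary):
    """Return one row per document; each cell counts one vocabulary word."""
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        # The vocabulary dict iterates in column-index order.
        matrix.append([counts.get(tok, 0) for tok in vocabulary])
    return matrix


# Illustrative sample sentences (not taken from any real corpus).
documents = ["the cat sat", "the dog sat on the mat"]
vocabulary = build_vocabulary(documents)
matrix = to_count_matrix(documents, vocabulary)

print(vocabulary)  # columns: cat, dog, mat, on, sat, the
for row in matrix:
    print(row)
```

Each row is now a fixed-length vector of numbers, so the two sentences can be compared or passed to a learning algorithm.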
Linguistics, on the other hand, deals with the way natural language is structured, the way words sound, and the meaning of the sentences and paragraphs in a text.
Various stages of linguistic analysis are now fused with computer science techniques to power NLP. For example:
Morphology deals with the way the words in a sentence are formed and how their form changes the meaning of the sentence. For example, Mr. or Ms. as a prefix to a name provides information on gender, and singular vs. plural provides information on the number of entities addressed in a sentence. These in turn become morphological features that are used to train machine learning and AI algorithms.
Syntax, or parsing, deals with the relationships between the words within a sentence, or how a sentence is constructed, e.g. the part of speech of a word or whether the word is a modifier. Various techniques, such as building a tree of the relationships shared by the words, are used to process the syntax of a sentence, that is, to parse it.
Semantics deals with the meaning of a sentence. It covers some of the better-known NLP tasks, such as recognising named entities or extracting the relationships between different entities in a sentence. What “meaning” actually means in Natural Language Processing, though, is an active area of discussion.
Pragmatics deals with understanding the text as a whole. It is a field of study that deals with how context contributes to the meaning of a sentence. Some of the popular problems that pragmatics attempts to solve are identifying the abstract topics present in a set of documents, summarising a text, and question answering.
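To make the morphology and syntax stages above a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `TITLE_GENDER` table, the naive plural rule, the `Node` class, and the tag and relation names are hypothetical, and the tree is annotated by hand rather than produced by a parser. Real systems rely on trained models and much richer feature sets.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table: titles mapped to the gender they suggest.
TITLE_GENDER = {"mr.": "male", "ms.": "female", "mrs.": "female"}


def morphological_features(token):
    """Return a small feature dict for one token, using naive rules."""
    lowered = token.lower()
    return {
        "title_gender": TITLE_GENDER.get(lowered),
        # Naive rule: ends in "s" but not "ss". It misses irregular plurals
        # and wrongly flags words such as "was".
        "is_plural": lowered.endswith("s") and not lowered.endswith("ss"),
    }


@dataclass
class Node:
    """One word in a hand-built tree of word relationships."""
    word: str
    pos: str        # part-of-speech tag
    relation: str   # relation of this word to its parent
    children: list = field(default_factory=list)


def modifiers(node):
    """Collect every word in the tree whose relation is 'modifier'."""
    found = [node.word] if node.relation == "modifier" else []
    for child in node.children:
        found.extend(modifiers(child))
    return found


# Hand-annotated tree for the sentence "The small dogs bark".
tree = Node("bark", "VERB", "root", [
    Node("dogs", "NOUN", "subject", [
        Node("The", "DET", "determiner"),
        Node("small", "ADJ", "modifier"),
    ]),
])

print(morphological_features("Ms."))
print(morphological_features("dogs"))
print(modifiers(tree))
```

Feature dictionaries like these can be turned into numeric columns for a learning algorithm, while the tree keeps the relationships between words explicit so that questions such as "which words are modifiers?" become simple traversals.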