Dependency Parsing in Natural Language Processing with Examples
This article was published as a part of the Data Science Blogathon
Natural Language Processing (NLP) is an interdisciplinary field that uses the fundamentals of computational linguistics and Artificial Intelligence to understand how human languages interact with technology.
To apply NLP to real-world scenarios, it is necessary to have a thorough grasp of various terms and ideas. Some of the important concepts are Part-of-Speech (POS) Tagging, Statistical Language Modeling, Syntactic, Semantic, and Sentiment Analysis, Normalization, Tokenization, Dependency Parsing, and Constituency Parsing.
We will examine the principles of Dependency Parsing in this article in order to have a better understanding of how it is applied in Natural Language Processing.
The term Dependency Parsing (DP) refers to the process of examining the dependencies between the words of a sentence in order to determine its grammatical structure. On this basis, a sentence is divided into several sections. The process is based on the assumption that there is a direct relationship between the linguistic units in a sentence. These links are called dependencies.
Consider the following statement: “I prefer the morning flight through Denver.”
The diagram below explains the sentence’s dependence structure:
In a written dependency structure, the relationships between the linguistic units, or words, of the sentence are expressed by directed arcs. The root of the tree, “prefer”, is the head of the preceding sentence, as labelled in the illustration.
A dependency tag indicates the relationship between two words. For example, the word “Denver” modifies the meaning of the noun “flight.” As a result, you can identify a dependency flight -> Denver, where flight is the head and Denver is the child or dependent. It is labelled nmod, which stands for nominal modifier.
The same pattern holds for every dependency between two words: one serves as the head and the other as the dependent. Currently, the Universal Dependencies v2 taxonomy consists of 37 universal syntactic relations; the table below lists a selection of them together with some language-specific subtypes:
|acl||clausal modifier of a noun (adnominal clause)|
|acl:relcl||relative clause modifier|
|advcl||adverbial clause modifier|
|advmod:emph||emphasizing phrase, intensifier|
|advmod:lmod||locative adverbial modifier|
|compound:lvc||light verb construction|
|compound:prt||phrasal verb particle|
|compound:svc||serial verb compounds|
|csubj:pass||clausal passive subject|
|det:numgov||pronominal quantifier governing the case of the noun|
|det:nummod||pronominal quantifier agreeing with the case of the noun|
|expl:pass||reflexive pronoun used in reflexive passive|
|expl:pv||reflexive clitic with an inherently reflexive verb|
|fixed||fixed multiword expression|
|flat||flat multiword expression|
|nmod:poss||possessive nominal modifier|
|nsubj:pass||passive nominal subject|
|nummod:gov||numeric modifier governing the case of the noun|
|xcomp||open clausal complement|
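The head and dependent arcs described before the table can be written down as plain data. The following is a minimal sketch in Python, not a parser: each word stores the position of its head, with 0 standing for an artificial root. Only the root "prefer" and the nmod arc from "flight" to "Denver" come from the text above; the remaining labels are assumptions that follow a common textbook analysis of this sentence, and the names `SENTENCE`, `path_to_root`, and `arcs` are hypothetical.

```python
# Each entry is (word, head_index, relation).
# Words are numbered from 1; head_index 0 marks the root of the tree.
# Labels other than root and nmod(flight -> Denver) are assumed for illustration.
SENTENCE = [
    ("I", 2, "nsubj"),
    ("prefer", 0, "root"),
    ("the", 5, "det"),
    ("morning", 5, "nmod"),
    ("flight", 2, "dobj"),
    ("through", 7, "case"),
    ("Denver", 5, "nmod"),
]


def path_to_root(sentence, index):
    """Follow head links from the word at 1-based `index` up to the root word."""
    path = []
    seen = set()
    while index != 0:
        if index in seen:
            raise ValueError("cycle in head links; not a tree")
        seen.add(index)
        word, head, _ = sentence[index - 1]
        path.append(word)
        index = head
    return path


def arcs(sentence):
    """Yield (head_word, relation, dependent_word) for every non-root arc."""
    for word, head, relation in sentence:
        if head != 0:
            yield (sentence[head - 1][0], relation, word)


if __name__ == "__main__":
    print(path_to_root(SENTENCE, 7))
    for arc in arcs(SENTENCE):
        print(arc)
```

Storing one head index per word enforces the rule that every word has exactly one head, which is what makes the structure a tree.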
Dependency Parsing using NLTK
The Natural Language Toolkit (NLTK) package will be used for dependency parsing. It is a set of libraries and programs for statistical Natural Language Processing (NLP) of human language.
We can use NLTK to do dependency parsing in one of several ways:
1. Probabilistic, projective dependency parser: These parsers predict the structure of new sentences by using human language data acquired from hand-parsed sentences. They are known to make mistakes and work with a limited collection of training data.
2. Stanford parser: It is a Java-based natural language parser. You need the Stanford CoreNLP parser to perform dependency parsing. The parser supports a number of languages, including English, Chinese, German, and Arabic.
Here’s how you can use the parser:
from nltk.parse.stanford import StanfordDependencyParser

# Paths to the Stanford parser jars; replace path_to/ with your install location
path_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'

dep_parser = StanfordDependencyParser(
    path_to_jar=path_jar,
    path_to_models_jar=path_models_jar
)

# raw_parse returns an iterator over parses; take the first one
result = dep_parser.raw_parse('I shot an elephant in my sleep')
dependency = next(result)
list(dependency.triples())

The following is the output of the above program (the u prefixes come from Python 2 string representations):

[
    ((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')),
    ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')),
    ((u'elephant', u'NN'), u'det', (u'an', u'DT')),
    ((u'shot', u'VBD'), u'prep', (u'in', u'IN')),
    ((u'in', u'IN'), u'pobj', (u'sleep', u'NN')),
    ((u'sleep', u'NN'), u'poss', (u'my', u'PRP$'))
]
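Once the triples are available, they can be regrouped without running the parser again. The sketch below copies the triples from the output above into a literal list and uses two hypothetical helpers, `dependents_by_head` and `find_root`, to list each head's dependents and to find the root. It keys on the word form alone, which is an assumption that only holds while no word occurs twice in the sentence.

```python
from collections import defaultdict

# Triples copied from the parser output above:
# ((head_word, head_tag), relation, (dependent_word, dependent_tag))
TRIPLES = [
    (("shot", "VBD"), "nsubj", ("I", "PRP")),
    (("shot", "VBD"), "dobj", ("elephant", "NN")),
    (("elephant", "NN"), "det", ("an", "DT")),
    (("shot", "VBD"), "prep", ("in", "IN")),
    (("in", "IN"), "pobj", ("sleep", "NN")),
    (("sleep", "NN"), "poss", ("my", "PRP$")),
]


def dependents_by_head(triples):
    """Map each head word to its (relation, dependent_word) pairs, in input order."""
    grouped = defaultdict(list)
    for (head, _), relation, (dependent, _) in triples:
        grouped[head].append((relation, dependent))
    return dict(grouped)


def find_root(triples):
    """Return the head words that never appear as a dependent (normally just one)."""
    dependents = {dep for _, _, (dep, _) in triples}
    heads = [head for (head, _), _, _ in triples]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(h for h in heads if h not in dependents))


if __name__ == "__main__":
    print(find_root(TRIPLES))
    for head, deps in dependents_by_head(TRIPLES).items():
        print(head, deps)
```

For sentences with repeated words, a real application would key on token positions rather than word forms.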
Constituency Parsing
Constituency Parsing is based on context-free grammars, which are used to parse the text. Here the parse tree shows the sentence broken down into sub-phrases, each of which belongs to a specific grammatical category. A terminal node is a linguistic unit or word that has a parent node and a part-of-speech tag.
For example, “a cat” and “a box beneath the bed” are noun phrases, while “write a letter” and “drive a car” are verb phrases.
Consider the following example sentence: “I shot an elephant in my pajamas.” The constituency parse tree is shown graphically as follows:
The parse tree on the left represents shooting an elephant that is wearing pajamas, while the parse tree on the right represents shooting an elephant while the speaker is wearing pajamas.
The entire sentence is broken down into sub-phrases until only terminal words remain. VP stands for verb phrase, whereas NP stands for noun phrase.
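The ambiguity of the pajamas sentence can be reproduced with a small context-free grammar. The sketch below is an illustration under stated assumptions: the toy grammar in `GRAMMAR` is hypothetical and was written only for this sentence, and `count_parses` is a simple memoized span-based recognizer that counts parse trees rather than a production parser. It assumes the grammar has no empty rules and no chains of single non-terminal rules.

```python
from functools import lru_cache

# Toy grammar (an assumption for illustration): non-terminal -> list of right-hand sides.
# Any symbol that is not a key of GRAMMAR is treated as a word (terminal).
GRAMMAR = {
    "S": [("NP", "VP")],
    "PP": [("P", "NP")],
    "NP": [("Det", "N"), ("Det", "N", "PP"), ("I",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "Det": [("an",), ("my",)],
    "N": [("elephant",), ("pajamas",)],
    "V": [("shot",)],
    "P": [("in",)],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees that derive `words` from `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` can cover words[i:j].
        if symbol not in GRAMMAR:
            return 1 if j - i == 1 and words[i] == symbol else 0
        return sum(derive_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def derive_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can cover words[i:j].
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        total = 0
        # The first symbol covers words[i:k]; every remaining symbol needs at least one word.
        for k in range(i + 1, j - len(rhs) + 2):
            left = derive(rhs[0], i, k)
            if left:
                total += left * derive_seq(rhs[1:], k, j)
        return total

    return derive(start, 0, len(words))


if __name__ == "__main__":
    print(count_parses("I shot an elephant in my pajamas".split()))
```

Under this grammar the sentence has two parses: one attaches the prepositional phrase “in my pajamas” to the noun phrase “an elephant”, the other attaches it to the verb phrase.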
Dependency Parsing vs Constituency Parsing
The Stanford parser can also be used to do constituency parsing. It begins by parsing a sentence using the constituency parser and then transforms the constituency parse tree into a dependency tree.
If your main objective is to break a sentence into sub-phrases, constituency parsing is the better choice. Dependency parsing, however, is the best method for discovering the dependencies between the words in a sentence.
Let’s look at an example to see what the difference is:
A constituency parse tree denotes the subdivision of a text into sub-phrases. The tree’s non-terminals are different sorts of phrases, the terminals are the sentence’s words, and the edges are unlabeled. A constituency parse for the simple statement “John sees Bill” would be:
A dependency parse links words together based on their relationships. Each vertex in the tree corresponds to a word, child nodes correspond to words that depend on the parent, and edges correspond to relationships. The dependency parse for “John sees Bill” is as follows:
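The two views of “John sees Bill” can be sketched side by side as plain Python data. The phrase labels (S, NP, VP, V) and the relation names (subject, object) are assumptions based on the conventional analysis of this sentence, and the helper names `leaves` and `phrases` are hypothetical.

```python
# Constituency tree as nested tuples: (label, child, ...); a bare string is a word.
CONSTITUENCY = ("S", ("NP", "John"), ("VP", ("V", "sees"), ("NP", "Bill")))

# Dependency tree as (head, relation, dependent) arcs; "sees" is the root.
DEPENDENCY = [("sees", "subject", "John"), ("sees", "object", "Bill")]


def leaves(tree):
    """Return the words of a constituency tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def phrases(tree):
    """Return (label, text) for every non-terminal node, top-down and left to right."""
    if isinstance(tree, str):
        return []
    found = [(tree[0], " ".join(leaves(tree)))]
    for child in tree[1:]:
        found.extend(phrases(child))
    return found


if __name__ == "__main__":
    for label, text in phrases(CONSTITUENCY):
        print(label, "->", text)
    for head, relation, dependent in DEPENDENCY:
        print(head, "--" + relation + "->", dependent)
```

In the constituency tree the labels sit on the nodes and the edges carry no information, while in the dependency arcs every node is a word and the label sits on the edge.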
You should choose the parser type that is most closely related to your objective. If you’re looking for sub-phrases inside a sentence, you’re probably interested in the constituency parse. If you’re interested in the connections between words, you’re probably interested in the dependency parse.
Organizations are seeking new methods to make use of computer technology as it advances beyond its artificial limits. A significant rise in computing speeds and capacities has resulted in the development of new and highly intelligent software systems, some of which are ready to replace or enhance human services.
One of the finest examples is the growth of natural language processing (NLP), with smart chatbots poised to change the world of customer service and beyond.
In summary, human language is awe-inspiringly complex and diverse.
In addition to assisting in the resolution of linguistic ambiguity, NLP is significant because it offers a helpful mathematical foundation for a variety of downstream applications such as voice recognition and text analytics.
In order to understand NLP, it’s important to have a good understanding of the basics, and Dependency Parsing is one of them.
Image 1 – https://web.stanford.edu/~jurafsky/slp3/14.pdf
Image 2 – https://web.stanford.edu/~jurafsky/slp3/13.pdf
The media shown in this article are not owned by Analytics Vidhya and are used at the author’s discretion.