Prashant Sharma — December 2, 2021
Advanced NLP

This article was published as a part of the Data Science Blogathon

Introduction

Natural Language Processing (NLP) is an interdisciplinary field that uses the fundamentals of computational linguistics and Artificial Intelligence to understand how human languages interact with technology.

To apply NLP to real-world scenarios, it is necessary to have a thorough grasp of its terminology and ideas. Some of the most important concepts are Part-of-Speech (POS) Tagging, Statistical Language Modeling, Syntactic, Semantic, and Sentiment Analysis, Normalization, Tokenization, Dependency Parsing, and Constituency Parsing.

We will examine the principles of Dependency Parsing in this article in order to have a better understanding of how it is applied in Natural Language Processing.

Dependency Parsing

The term Dependency Parsing (DP) refers to the process of examining the dependencies between the words of a sentence in order to determine its grammatical structure. Based on these dependencies, a sentence is divided into several parts. The process rests on the assumption that there is a direct relationship between the linguistic units of a sentence. These links are called dependencies.

Consider the following statement: “I prefer the morning flight through Denver.”

The diagram below shows the sentence’s dependency structure:

[Image 1: dependency structure of “I prefer the morning flight through Denver.”]

In a written dependency structure, the relationships between the linguistic units, or words, in the sentence are expressed by directed arcs. The root of the tree, “prefer”, is the head of the sentence above, as labelled in the illustration.

A dependency tag indicates the relationship between two words. For example, the word “Denver” modifies the meaning of the noun “flight.” As a result, you can identify a dependency

flight -> Denver, where flight is the head and Denver is the child or dependent. It is represented by nmod, which stands for nominal modifier.
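
As a minimal sketch, the arcs of the example sentence can be written as (head, relation, dependent) triples in Python. Only the root “prefer” and the nmod arc from “flight” to “Denver” come from the text above; the remaining relation labels are assumptions that follow Universal Dependencies conventions, and the helper names are hypothetical.

```python
# Dependency arcs for "I prefer the morning flight through Denver."
# Each arc is (head, relation, dependent). Only the nmod arc and the root
# are taken from the article; the other labels are illustrative assumptions.
ARCS = [
    ("prefer", "nsubj", "I"),
    ("prefer", "obj", "flight"),
    ("flight", "det", "the"),
    ("flight", "compound", "morning"),
    ("flight", "nmod", "Denver"),
    ("Denver", "case", "through"),
]


def find_root(arcs):
    """Return the only word that is a head but never a dependent."""
    heads = {head for head, _, _ in arcs}
    dependents = {dep for _, _, dep in arcs}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError("a dependency tree has exactly one root")
    return roots.pop()


def dependents_of(word, arcs):
    """Return (relation, dependent) pairs whose head is `word`."""
    return [(rel, dep) for head, rel, dep in arcs if head == word]


print(find_root(ARCS))                # prefer
print(dependents_of("flight", ARCS))
```

Storing arcs as flat triples keeps the sketch simple; a real parser output would usually also carry token positions so that repeated words can be told apart.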

A dependency tag thus identifies the kind of dependency between two words, where one serves as the head and the other as the dependent. Currently, the Universal Dependencies v2 taxonomy consists of 37 universal syntactic relations. The table below lists them together with a number of subtypes, which are written with a colon:

Dependency Tag  Description
acl             clausal modifier of a noun (adnominal clause)
acl:relcl       relative clause modifier
advcl           adverbial clause modifier
advmod          adverbial modifier
advmod:emph     emphasizing word, intensifier
advmod:lmod     locative adverbial modifier
amod            adjectival modifier
appos           appositional modifier
aux             auxiliary
aux:pass        passive auxiliary
case            case marking
cc              coordinating conjunction
cc:preconj      preconjunct
ccomp           clausal complement
clf             classifier
compound        compound
compound:lvc    light verb construction
compound:prt    phrasal verb particle
compound:redup  reduplicated compounds
compound:svc    serial verb compounds
conj            conjunct
cop             copula
csubj           clausal subject
csubj:pass      clausal passive subject
dep             unspecified dependency
det             determiner
det:numgov      pronominal quantifier governing the case of the noun
det:nummod      pronominal quantifier agreeing with the case of the noun
det:poss        possessive determiner
discourse       discourse element
dislocated      dislocated elements
expl            expletive
expl:impers     impersonal expletive
expl:pass       reflexive pronoun used in reflexive passive
expl:pv         reflexive clitic with an inherently reflexive verb
fixed           fixed multiword expression
flat            flat multiword expression
flat:foreign    foreign words
flat:name       names
goeswith        goes with
iobj            indirect object
list            list
mark            marker
nmod            nominal modifier
nmod:poss       possessive nominal modifier
nmod:tmod       temporal modifier
nsubj           nominal subject
nsubj:pass      passive nominal subject
nummod          numeric modifier
nummod:gov      numeric modifier governing the case of the noun
obj             object
obl             oblique nominal
obl:agent       agent modifier
obl:arg         oblique argument
obl:lmod        locative modifier
obl:tmod        temporal modifier
orphan          orphan
parataxis       parataxis
punct           punctuation
reparandum      overridden disfluency
root            root
vocative        vocative
xcomp           open clausal complement
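
In the table, a tag containing a colon, such as acl:relcl or obl:tmod, is a subtype that refines one of the universal relations. A small sketch of how such tags could be split when processing parser output (the function name is hypothetical):

```python
def split_relation(tag):
    """Split a dependency tag into (universal relation, subtype or None)."""
    base, _, subtype = tag.partition(":")
    return base, subtype or None


for tag in ["acl", "acl:relcl", "obl:tmod"]:
    print(tag, "->", split_relation(tag))
```

Reducing a tag to its base relation can be handy when comparing output from parsers that do not all use the same subtypes.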

 

Dependency Parsing using NLTK

The Natural Language Toolkit (NLTK) package will be used for dependency parsing. NLTK is a collection of libraries and programs for statistical Natural Language Processing (NLP) of human language.

We may use NLTK to do dependency parsing in one of several ways:

1. Probabilistic, projective dependency parser: These parsers predict the structure of new sentences by using human language data acquired from hand-parsed sentences. They are known to make mistakes and work with a limited collection of training data.

2. Stanford parser: It is a Java-based natural language parser. You need the Stanford CoreNLP parser to perform dependency parsing. The parser supports a number of languages, including English, Chinese, German, and Arabic.

Here’s how you can use the parser:

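# Requires Java and the Stanford parser JAR files; adjust the paths to your installation.
# Note: newer NLTK releases may deprecate this class in favour of
# nltk.parse.corenlp.CoreNLPDependencyParser.
from nltk.parse.stanford import StanfordDependencyParser

path_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'
dep_parser = StanfordDependencyParser(
    path_to_jar=path_jar, path_to_models_jar=path_models_jar
)

# raw_parse returns an iterator of dependency graphs; take the first one
result = dep_parser.raw_parse('I shot an elephant in my sleep')
dependency = next(result)

# Each triple is ((head, head_tag), relation, (dependent, dependent_tag))
print(list(dependency.triples()))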
The following is the output of the above program (the u prefixes show that it was produced under Python 2; Python 3 prints the same strings without them):
[
   ((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')),
   ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')),
   ((u'elephant', u'NN'), u'det', (u'an', u'DT')),
   ((u'shot', u'VBD'), u'prep', (u'in', u'IN')),
   ((u'in', u'IN'), u'pobj', (u'sleep', u'NN')),
   ((u'sleep', u'NN'), u'poss', (u'my', u'PRP$'))
]
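
Each triple in this output has the form ((head, head POS tag), relation, (dependent, dependent POS tag)). As a rough sketch, the triples can be grouped by head to see which words depend on which. The data below is copied from the output above, written as plain Python 3 strings, and the helper name is hypothetical. Note that labels such as dobj, prep and pobj come from the older Stanford Dependencies scheme rather than from the Universal Dependencies table.

```python
from collections import defaultdict

# Triples copied from the parser output shown above.
TRIPLES = [
    (("shot", "VBD"), "nsubj", ("I", "PRP")),
    (("shot", "VBD"), "dobj", ("elephant", "NN")),
    (("elephant", "NN"), "det", ("an", "DT")),
    (("shot", "VBD"), "prep", ("in", "IN")),
    (("in", "IN"), "pobj", ("sleep", "NN")),
    (("sleep", "NN"), "poss", ("my", "PRP$")),
]


def group_by_head(triples):
    """Map each head word to a list of (relation, dependent word) pairs."""
    grouped = defaultdict(list)
    for (head, _head_tag), relation, (dependent, _dep_tag) in triples:
        grouped[head].append((relation, dependent))
    return dict(grouped)


for head, deps in group_by_head(TRIPLES).items():
    print(head, "->", deps)
```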

Constituency Parsing

Constituency Parsing is based on context-free grammars, which are used to parse the text. Here, the parse tree breaks a sentence down into sub-phrases, each of which belongs to a specific grammar category. A terminal node is a linguistic unit or word that has a parent node and a part-of-speech tag.

For example, “a cat” and “a box beneath the bed” are noun phrases, while “write a letter” and “drive a car” are verb phrases.

Consider the following example sentence: “I shot an elephant in my pajamas.” The constituency parse tree is shown graphically as follows:

[Image 2: two constituency parse trees for “I shot an elephant in my pajamas.”]

The parse tree on the left represents shooting an elephant that is wearing pajamas, while the parse tree on the right represents the speaker shooting an elephant while wearing his own pajamas.

The entire sentence is broken down into sub-phrases until only terminal words remain. VP stands for verb phrase, whereas NP stands for noun phrase.
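
The two readings can be sketched as nested Python tuples of the form (label, child, ...), with words as plain strings. The category labels (S, NP, VP, PP, Det, N, V, P) and the exact tree shapes are simplified assumptions for illustration, not a transcription of the figure, and the helper names are hypothetical.

```python
# Reading 1: the PP "in my pajamas" attaches to the noun phrase
# (the elephant is wearing the pajamas).
NP_ATTACH = (
    "S",
    ("NP", "I"),
    ("VP",
     ("V", "shot"),
     ("NP",
      ("NP", ("Det", "an"), ("N", "elephant")),
      ("PP", ("P", "in"), ("NP", ("Det", "my"), ("N", "pajamas"))))),
)

# Reading 2: the PP attaches to the verb phrase
# (the shooter is wearing the pajamas).
VP_ATTACH = (
    "S",
    ("NP", "I"),
    ("VP",
     ("VP", ("V", "shot"), ("NP", ("Det", "an"), ("N", "elephant"))),
     ("PP", ("P", "in"), ("NP", ("Det", "my"), ("N", "pajamas")))),
)


def leaves(tree):
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def phrases(tree, label):
    """Return the text of every sub-phrase carrying the given label."""
    if isinstance(tree, str):
        return []
    found = [" ".join(leaves(tree))] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(phrases(child, label))
    return found


# Both trees cover the same words but group them differently.
print(leaves(NP_ATTACH) == leaves(VP_ATTACH))
print(phrases(NP_ATTACH, "NP"))
print(phrases(VP_ATTACH, "NP"))
```

Only the first reading contains the noun phrase “an elephant in my pajamas”, which is exactly where the ambiguity lies.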

Dependency Parsing vs Constituency Parsing

The Stanford parser can also be used to do constituency parsing. It begins by parsing a sentence using the constituency parser and then transforms the constituency parse tree into a dependency tree.
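
One common way to carry out such a conversion is to pick a head child for every phrase and attach the heads of the remaining children to it. The sketch below applies this idea to a small hand-written tree. The head rules and labels are simplified assumptions for illustration and are not the Stanford parser’s actual rules, and the arcs it produces are unlabeled.

```python
# Constituency tree for "I shot an elephant" as nested (label, child, ...) tuples.
TREE = (
    "S",
    ("NP", ("PRP", "I")),
    ("VP", ("VBD", "shot"), ("NP", ("DT", "an"), ("NN", "elephant"))),
)

# Hypothetical head rules: for each phrase label, child labels in order of preference.
HEAD_RULES = {"S": ["VP"], "VP": ["VBD"], "NP": ["NN", "PRP"]}


def head_index(tree):
    """Return the position of the head child (fall back to the last child)."""
    children = tree[1:]
    for wanted in HEAD_RULES.get(tree[0], []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1


def head_word(tree):
    """Return the lexical head of a (sub)tree."""
    if isinstance(tree[1], str):      # pre-terminal such as ("NN", "elephant")
        return tree[1]
    return head_word(tree[1:][head_index(tree)])


def to_arcs(tree):
    """Return unlabeled (head, dependent) arcs derived from the tree."""
    if isinstance(tree[1], str):
        return []
    children = tree[1:]
    h = head_index(tree)
    arcs = []
    for i, child in enumerate(children):
        if i != h:
            arcs.append((head_word(children[h]), head_word(child)))
        arcs.extend(to_arcs(child))
    return arcs


print(to_arcs(TREE))
```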

If your main objective is to break a sentence into sub-phrases, constituency parsing is the ideal choice. However, dependency parsing is the best method for discovering the dependencies between the words in a sentence.

Let’s look at an example to see what the difference is:

A constituency parse tree denotes the subdivision of a text into sub-phrases. The tree’s non-terminals are different sorts of phrases, the terminals are the sentence’s words, and the edges are unlabeled. A constituency parse for the simple statement “John sees Bill” would be:

[Image: constituency parse tree for “John sees Bill”]

A dependency parse links words together based on their connections. Each vertex in the tree corresponds to a word, child nodes to words that are reliant on the parent, and edges to relationships. The dependency parse for “John sees Bill” is as follows:

[Image: dependency parse for “John sees Bill”]
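
To compare the two parses side by side, here is a rough sketch of both for “John sees Bill” as plain Python data. The category labels (S, VP, N, V) and the relation names nsubj and obj are assumptions chosen for illustration; the figures may label them differently.

```python
# Constituency parse: nested (label, child, ...) tuples; the words are the leaves.
CONSTITUENCY = ("S", ("N", "John"), ("VP", ("V", "sees"), ("N", "Bill")))

# Dependency parse: every node is a word, every edge carries a relation label.
DEPENDENCY = [("sees", "nsubj", "John"), ("sees", "obj", "Bill")]


def bracketed(tree):
    """Render a constituency tree in bracket notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [bracketed(c) for c in tree[1:]]) + ")"


print(bracketed(CONSTITUENCY))   # (S (N John) (VP (V sees) (N Bill)))
for head, relation, dependent in DEPENDENCY:
    print(f"{head} --{relation}--> {dependent}")
```

The constituency tree needs extra phrase nodes (S, VP) on top of the three words, while the dependency parse has exactly one node per word and puts the information on the edges.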

You should choose the parser type that is most closely related to your objective. If you’re looking for sub-phrases inside a sentence, you’re probably interested in the constituency parse. If you’re interested in the connections between words, you’re probably interested in the dependency parse.

Conclusion

Organizations are seeking new methods to make use of computer technology as it advances beyond its artificial limits. A significant rise in computing speeds and capacities has resulted in the development of new and highly intelligent software systems, some of which are ready to replace or enhance human services.

One of the finest examples is the growth of natural language processing (NLP), with smart chatbots prepared to change the world of customer service and beyond.

In summary, human language is awe-inspiringly complex and diverse.

In addition to assisting in the resolution of linguistic ambiguity, NLP is significant because it offers a helpful mathematical foundation for a variety of downstream applications such as voice recognition and text analytics.

To understand NLP, it’s important to have a good understanding of the basics, and Dependency Parsing is one of them.

I hope you find the information interesting. If you’d like to connect with me, you can do so via LinkedIn, or if you have any other questions, you can also send me an email.

Image Source

Image 1 – https://web.stanford.edu/~jurafsky/slp3/14.pdf

Image 2 – https://web.stanford.edu/~jurafsky/slp3/13.pdf

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion

About the Author

Prashant Sharma

Currently, I am pursuing my Bachelor of Technology (B.Tech) from Vellore Institute of Technology. I am very enthusiastic about programming and its real applications, including software development, machine learning and data science.
