Named Entity Recognition in NLP
Named Entity Recognition (NER) is the NLP task of finding entities in a document or text. Here, entities are things such as the names of organizations, places, countries or cities, and amounts of money or currency. NER is used to classify entities, and one application is search engine optimization, where we can search for the entities directly. It can also be seen when we tag locations on social media; Instagram, for example, uses this feature to categorize locations, people, organizations, etc.
Search is a good example of how this works in practice.
A Named Entity Recognition API works behind the scenes to identify the relevant entities in a search. This speeds up the search process, as all the relevant tags are stored together and highlighted.
spaCy makes it very easy to perform named entity recognition with just a few lines of code.
spaCy features an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens. The default trained pipelines can identify a variety of named and numeric entities, including companies, locations, organizations and products. You can add arbitrary classes to the entity recognition system and update the model with new examples.
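As a rough illustration of adding a class, the sketch below registers a new label on the entity recognizer of a blank pipeline. It assumes spaCy v3; the label GADGET and the variable names are hypothetical, and the component is untrained here, so it would still need training examples before it could predict anything.

```python
import spacy

# Assumption: spaCy v3. A blank pipeline is used so no trained model is required.
blank_nlp = spacy.blank("en")
custom_ner = blank_nlp.add_pipe("ner")  # untrained EntityRecognizer

# "GADGET" is a hypothetical label used only for illustration.
custom_ner.add_label("GADGET")

# The labels the component now knows about.
print(custom_ner.labels)
```

Registering the label is only the first step; updating the model with annotated examples is what lets it recognize the new class.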
Named Entity Recognition is implemented by the pipeline component ner. Most of the models have ner in their pipeline by default, and we can check for it. If your model does not have it by default, you can add it using nlp.add_pipe().
# importing library
import spacy
nlp = spacy.load('en_core_web_sm')
nlp.pipe_names
Output:
['tagger' , 'parser' , 'ner']
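The exact list of components depends on the spaCy and model version you have installed. The check-then-add step can be sketched as follows, assuming spaCy v3 (where add_pipe() takes the component's registered name) and using a blank pipeline so that no trained model is needed. The name demo_nlp is just for this sketch.

```python
import spacy

# Assumption: spaCy v3. A blank English pipeline starts with no components.
demo_nlp = spacy.blank("en")
print(demo_nlp.pipe_names)  # []

# Add the entity recognizer only if it is missing.
if not demo_nlp.has_pipe("ner"):
    demo_nlp.add_pipe("ner")  # untrained until it is trained or initialized

print(demo_nlp.pipe_names)  # ['ner']
```

A component added this way has no learned weights, so it is a starting point for training rather than a replacement for the ner that ships with a trained pipeline.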
This article is an overview of how we can use spaCy for NER.
Let’s see how to use spaCy to spot named entities.
import spacy
nlp = spacy.load('en_core_web_sm')
s1 = "Facebook changes it's name to Meta."
# creating a document
doc1 = nlp(s1)
# entities in doc1
print(doc1.ents)
Output:
(Meta,)
Here, doc1.ents does not include ‘Facebook’ as an entity, but spaCy lets us add it to the entity list ourselves.
Adding Facebook to the entity list:
from spacy.tokens import Span
# 0, 1 represent the position of 'Facebook' in our text
word_ent = Span(doc1, 0, 1, label="ORG")
org_ent = list(doc1.ents)
# assigning the complete list of entities to doc1.ents
doc1.ents = org_ent + [word_ent]
# after adding Facebook to doc1.ents
print(doc1.ents)
Output:
(Facebook, Meta)
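The two numbers passed to Span are token indices, with the end index exclusive. The sketch below makes that visible by printing each token with its index before building the spans. It uses a blank pipeline (tokenizer only), so it does not depend on what a trained model predicts; the variable names are chosen for this sketch.

```python
import spacy
from spacy.tokens import Span

# A blank pipeline only tokenizes, so no trained model is involved.
span_nlp = spacy.blank("en")
span_doc = span_nlp("Facebook changes its name to Meta.")

# Show each token with its index to find the span boundaries.
print([(token.i, token.text) for token in span_doc])

# Span(doc, start, end) covers tokens start .. end - 1.
facebook_ent = Span(span_doc, 0, 1, label="ORG")
meta_ent = Span(span_doc, 5, 6, label="ORG")
span_doc.ents = [facebook_ent, meta_ent]

print([(ent.text, ent.label_, ent.start, ent.end) for ent in span_doc.ents])
```

Printing the token indices first is a cheap way to avoid off-by-one mistakes when an entity spans several tokens.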
Another way to spot entities in a document is to use the displaCy visualizer.
from spacy import displacy
s4 = "Accenture Acquires BCS Consulting to Strengthen its U.K. Financial Services Consulting and Technology Services Capabilities"
doc2 = nlp(s4)
displacy.render(docs = doc2 , style ='ent', jupyter = True)
Output: [displaCy entity visualization of doc2]
print(doc2.ents)
Output:
(U.K. Financial Services Consulting and Technology Services Capabilities,)
Here, spaCy is not spotting ‘Accenture’ and ‘BCS Consulting’ as entities. Let’s add them to our entity list.
from spacy.tokens import Span
# 0, 1 represent the position of 'Accenture' in our text
word_ent1 = Span(doc2, 0, 1, label="ORG")
# 2, 4 represent the position of 'BCS Consulting' in our text
word_ent2 = Span(doc2, 2, 4, label="ORG")
org_ent = list(doc2.ents)
# assigning the complete list of entities to doc2.ents
doc2.ents = org_ent + [word_ent1] + [word_ent2]
# after adding the new entities to doc2.ents
print(doc2.ents)
Output:
(Accenture, BCS Consulting, U.K. Financial Services Consulting and Technology Services Capabilities)
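Setting spans by hand works for a single document. If the same names should be tagged in every text, one possible alternative is spaCy's rule-based entity_ruler component. This is a different technique from the manual Span approach above, shown here only as a sketch: it assumes spaCy v3 and runs on a blank pipeline, and the variable names are chosen for this example.

```python
import spacy

# Assumption: spaCy v3. The entity ruler is rule-based; it is swapped in here
# for the manual Span approach and is not the statistical NER model.
ruler_nlp = spacy.blank("en")
ruler = ruler_nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Accenture"},
    {"label": "ORG", "pattern": "BCS Consulting"},
])

ruler_doc = ruler_nlp(
    "Accenture Acquires BCS Consulting to Strengthen its U.K. "
    "Financial Services Consulting and Technology Services Capabilities"
)

# Only the exact phrases listed in the patterns are tagged.
print([(ent.text, ent.label_) for ent in ruler_doc.ents])
```

The patterns match exact phrases only, so this suits a known list of names rather than open-ended recognition.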
Using displaCy for visualizing.
displacy.render(docs = doc2 , style = 'ent' , jupyter = True)
Output: [displaCy entity visualization of doc2 with the added entities]
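Outside a notebook, displaCy can return the markup as a string instead of rendering it inline. The following is a minimal sketch of that, assuming spaCy v3; it uses a blank pipeline with hand-set entities so that it runs without a trained model, and the variable names are illustrative.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank pipeline plus hand-set entities, so no trained model is needed.
html_nlp = spacy.blank("en")
html_doc = html_nlp("Accenture Acquires BCS Consulting")
html_doc.ents = [
    Span(html_doc, 0, 1, label="ORG"),
    Span(html_doc, 2, 4, label="ORG"),
]

# With jupyter=False, render() returns the HTML markup as a string.
ent_html = displacy.render(html_doc, style="ent", jupyter=False)
print(type(ent_html).__name__, len(ent_html))
```

The returned string can then be written to an .html file or embedded in a web page.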
Let’s take another example and spot the named entities in a text.
s3 = "Pakistan beats India for the first time in a World Cup game on Sunday at the Dubai International Stadium"
doc3 = nlp(s3)
for i in doc3.ents:
    print(i.text , i.label_ , str(spacy.explain(i.label_)))
Output:
Pakistan GPE Countries, cities, states
India GPE Countries, cities, states
first ORDINAL "first", "second", etc.
World Cup EVENT Named hurricanes, battles, wars, sports events, etc.
Sunday DATE Absolute or relative dates or periods
the Dubai International Stadium ORG Companies, agencies, institutions, etc.
We can see in the output that the for loop iterates over the entities in the text ‘s3’ and prints each entity’s text and label, along with an explanation of that label.
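The explanations come from spacy.explain(), which looks a label up in spaCy's built-in glossary. The short sketch below calls it directly on the labels from the output above, plus one made-up label (NOT_A_LABEL, which is hypothetical) to show what happens when a label is unknown.

```python
import spacy

# Labels taken from the output above, plus a made-up one at the end.
labels = ["GPE", "ORDINAL", "EVENT", "DATE", "ORG", "NOT_A_LABEL"]

# Map each label to its glossary description (None if spaCy does not know it).
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(label, "->", description)
```

For an unknown label, spacy.explain() returns None (recent versions also emit a warning), which is why wrapping the result in str() keeps a print loop simple.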
Visualizing the dependency parse using displaCy.
The dependency visualizer, dep, shows part-of-speech tags and syntactic dependencies.
displacy.render(docs = doc3 , style = 'dep',options = {'distance':50} ,jupyter = True)
Output: [displaCy dependency visualization of doc3]
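To see what the dep style draws without loading a trained parser, displaCy also accepts a hand-written parse when manual=True is passed. The words, tags and arcs below are illustrative and written by hand, not produced by a model; the distance option is the same one used above.

```python
from spacy import displacy

# Hand-written parse in displaCy's "manual" input format. The words, tags and
# arcs are illustrative and were not produced by a model.
manual_parse = {
    "words": [
        {"text": "Pakistan", "tag": "PROPN"},
        {"text": "beats", "tag": "VERB"},
        {"text": "India", "tag": "PROPN"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 1, "end": 2, "label": "dobj", "dir": "right"},
    ],
}

# jupyter=False returns the SVG markup as a string; "distance" sets the
# spacing between words in pixels.
dep_svg = displacy.render(
    manual_parse,
    style="dep",
    manual=True,
    jupyter=False,
    options={"distance": 50},
)
print(len(dep_svg))
```

Each arc connects two word positions and carries the dependency label that is drawn above the arrow.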
Hope this was helpful!