Keyword extraction and summarization from unstructured text: A case study with open data from legal domain

Information Extraction (IE) is a crucial task in the world of web and open data, and it is achieved using Natural Language Processing (NLP). Many techniques exist for extracting information; the central challenge is producing information that is useful and meaningful. Many search engines rely heavily on IE. This paper focuses on extracting named entities from natural-language text and converting them into a knowledge graph of triples. The goal is to answer two types of queries: (i) keyword search that returns exact information, and (ii) summarization of a queried keyword. A case study using open data from the legal domain is presented.
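As a rough illustration of the idea in the abstract, the sketch below stores (subject, predicate, object) triples in memory and answers the two query types by exact entity match. It is not the authors' implementation: the `TripleStore` class, its method names, and the sample triples about a fictional company are all assumptions made for this example, and the entity-extraction step itself is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Hypothetical in-memory knowledge graph indexed by entity name."""

    def __init__(self) -> None:
        # Maps a lower-cased entity name to every triple that mentions it.
        self._by_entity: dict[str, list[Triple]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = Triple(subject, predicate, obj)
        self._by_entity[subject.lower()].append(triple)
        # Avoid indexing the same triple twice when subject and object match.
        if obj.lower() != subject.lower():
            self._by_entity[obj.lower()].append(triple)

    def keyword_search(self, keyword: str) -> list[Triple]:
        """Query type (i): return the triples that mention the keyword exactly."""
        return list(self._by_entity.get(keyword.lower(), []))

    def summarize(self, keyword: str) -> str:
        """Query type (ii): join the matching triples into short sentences."""
        triples = self.keyword_search(keyword)
        return " ".join(f"{t.subject} {t.predicate} {t.obj}." for t in triples)


# Made-up sample data, not taken from the paper's dataset.
store = TripleStore()
store.add("Acme Corp", "was sued by", "Jane Doe")
store.add("Acme Corp", "is headquartered in", "Springfield")

print(store.keyword_search("Jane Doe"))
print(store.summarize("Acme Corp"))
```

Indexing each triple under both its subject and its object keeps either end of a relation searchable; a real system would more likely use a triple store with a query language rather than a Python dictionary.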

Author and article information

Contributors

Varun Singh

Srividya Bansal

Conference

Publication date (Print): July 2022

Pages: 1-6

Affiliations

School of Computing and Augmented Intelligence

Arizona State University

Mesa, Arizona

Article

DOI: 10.14236/ewic/ODAK22.9

SO-VID: 0ef180a5-b741-4570-ac53-ac42ba5f33b1

License:

This work is licensed under a Creative Commons Attribution 4.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Conference name: Proceedings of the Symposium on Open Data and Knowledge for a Post-Pandemic Era ODAK22, UK

Conference acronym: ODAK 2022

Conference location: Brighton, UK

Conference date: June 30-July 1, 2022

Conference sponsor: Electronic Workshops in Computing (eWiC)

Conference theme: Open Data and Knowledge for a Post-Pandemic Era

Product

1477-9358 BCS Learning & Development

Self URI (article page): https://www.scienceopen.com/hosted-document?doi=10.14236/ewic/ODAK22.9

Self URI (journal page): https://ewic.bcs.org/

