      Is Open Access

      MatSciBERT: A materials domain language model for text mining and information extraction

      npj Computational Materials
      Springer Science and Business Media LLC


          Abstract

          A large amount of materials science knowledge is generated and stored as text published in peer-reviewed scientific literature. While recent developments in natural language processing, such as Bidirectional Encoder Representations from Transformers (BERT) models, provide promising information extraction tools, these models may yield suboptimal results when applied to the materials domain, since they are not trained on materials science-specific notation and jargon. Here, we present a materials-aware language model, MatSciBERT, trained on a large corpus of peer-reviewed materials science publications. We show that MatSciBERT outperforms SciBERT, a language model trained on a scientific corpus, and establish state-of-the-art results on three downstream tasks: named entity recognition, relation classification, and abstract classification. We make the pre-trained weights of MatSciBERT publicly accessible for accelerated materials discovery and information extraction from materials science texts.

                Author and article information

                Journal
                npj Computational Materials
                npj Comput Mater
                Springer Science and Business Media LLC
                ISSN: 2057-3960
                December 2022
                May 03 2022
                Volume: 8
                Issue: 1
                Article
                DOI: 10.1038/s41524-022-00784-w
                73c88d1f-4e00-4c2e-84c4-01e2d78a5df6
                © 2022

                https://creativecommons.org/licenses/by/4.0

