NLPContributionGraph

Structuring Scholarly NLP Contributions in the Open Research Knowledge Graph


Dataset distribution under CC BY-SA 4.0

Dataset

The NLPContributionGraph shared task comprises a dataset of Natural Language Processing (NLP) scholarly articles annotated for their contributions. The annotations are provided in terms of three data elements: 1) contribution sentences; 2) scientific term and predicate phrases from those sentences; and 3) (subject, predicate, object) triple statements toward knowledge graph (KG) building, wherein the triples, taken together, form an article's contribution-centered knowledge graph. To aid understanding of the dataset and the task, we explain the annotated data below with the help of examples.


Understanding the Data

NLPContributionGraph uses two levels of knowledge systematization: 1) at the root, it has a dummy node called Contribution; and 2) below the root, it has twelve nodes which we generically refer to as information units. Each scholarly article's annotated contribution triple statements are organized under three (mandatory) or more of these information unit nodes, depending on which of them apply to the article. These information units are defined below.


ResearchProblem:   It captures the research challenge addressed by a contribution, connected via the predicate hasResearchProblem. By definition, it is the focus of the research investigation; in other words, the issue for which a solution must be obtained.
Approach or Model:   Essentially, this is the contribution of the paper as the solution proposed for the research problem.
Code:   It is the link to the software on an open-source hosting platform, such as GitLab or GitHub, or on the author's website.
Dataset:   This is another aspect of the contribution solution in the form of a dataset.
ExperimentalSetup or Hyperparameters:   Includes details about the experimental platform, covering both the hardware (e.g., GPU) and the software (e.g., the TensorFlow library) used to implement the machine learning solution, as well as the variables that determine the network structure (e.g., number of hidden units) and how the network is trained (e.g., learning rate), i.e., those used for tuning the software to the task objective. The unit is called ExperimentalSetup when hardware details are provided, and Hyperparameters otherwise.
Baselines:   They are the listed systems that a proposed Approach or Model is compared against.
Results:   The main findings or outcomes reported in the article text for the ResearchProblem.
Tasks:   An Approach or Model, particularly in multi-task settings, may be tested on more than one task, in which case we list all the experimental tasks. The experimental tasks are often synonymous with the experimental datasets, since it is common in NLP for tasks to be defined over datasets. Where lists of Tasks are concerned, each Task can include the ExperimentalSetup as a sub information unit.
Experiments:   It is a container information unit that includes one or more of the previously discussed units as sub information units. It can be a combination of lists of Tasks, ExperimentalSetup, and Results, or a combination of Approach, ExperimentalSetup, and Results.
AblationAnalysis:   It is a form of Results that describes the performance of components in an Approach or Model.
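For quick reference in the sketches that follow, the twelve information-unit labels above can be collected into a small Python constant, e.g., to check the top-level node of a parsed annotation. This is only a sketch: the spellings follow this description, and the actual annotation files may render some labels differently (e.g., "Ablation analysis" in Example #2 below).

# The twelve information-unit labels named above, spelled as in this
# description (a sketch; annotation files may vary, e.g., "Ablation analysis").
INFORMATION_UNITS = {
    "ResearchProblem", "Approach", "Model", "Code", "Dataset",
    "ExperimentalSetup", "Hyperparameters", "Baselines", "Results",
    "Tasks", "Experiments", "AblationAnalysis",
}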


Below, we provide five examples covering different types of information units, each supported with an explanation of the annotated data.

Example #1: ResearchProblem

In this example, the ResearchProblem information unit is modeled. The reference paper is Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. For this information unit, the subject node is the root node Contribution, which is left unspecified. It is followed by the predicate "has research problem." Then the three phrases found to represent the research problem are annotated. Finally, the sentences from which the phrases are taken are annotated via the predicate "from sentence."

{
  "has research problem" : [
    ["Statistical Machine Translation", {"from sentence": "Learning Phrase Representations using RNN Encoder - Decoder for Statistical Machine Translation"}],
    ["SMT", {"from sentence" : "Along this line of research on using neural networks for SMT , this paper focuses on a novel neural network architecture that can be used as apart of the conventional phrase - based SMT system ."}],
    ["phrase - based SMT", {"from sentence" : "Along this line of research on using neural networks for SMT , this paper focuses on a novel neural network architecture that can be used as apart of the conventional phrase - based SMT system ."}]
  ]
}
				

The annotations as triples are:

(Contribution, has research problem, Statistical Machine Translation)
(Contribution, has research problem, SMT)
(Contribution, has research problem, phrase - based SMT)
				

In the context of the given data, the shared task entails:
1) identifying the sentences that state the research problem [for the expected evaluation output format, see the files named sentences.txt per article in the trial data],
2) extracting the precise research problem phrase spans from the sentences selected in the first step [for the expected evaluation output format, see the files named entities.txt per article in the trial data], and
3) forming the triples, which is relatively straightforward for this information unit [for the expected evaluation output, see the files in the triples directories per article in the trial data]; a minimal sketch of this last step is given below.
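Here is that sketch: a short Python program (not the official evaluation script) that reads the annotation JSON above and emits the three triples; the file name research_problem.json is hypothetical.

import json

# Load the ResearchProblem annotation shown above (hypothetical file name).
with open("research_problem.json") as f:
    unit = json.load(f)

triples = []
for phrase, provenance in unit["has research problem"]:
    # Each entry pairs a research-problem phrase with its
    # {"from sentence": ...} provenance; the provenance is evidence
    # for the annotation, not part of the triple itself.
    triples.append(("Contribution", "has research problem", phrase))

for t in triples:
    print(t)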

Example #2: AblationAnalysis

In this example, the AblationAnalysis information unit is modeled. The reference paper is Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. The AblationAnalysis information unit is connected to Contribution via the predicate "has". Note that, unlike the previous example, where a single evidence sentence was provided against each annotated object, here some evidence sentences are annotated together as a paragraph. This avoids introducing additional nested dictionaries or arrays in the JSON format: after the top-level information unit node AblationAnalysis, the annotations are all nested under the entity "TACRED dev set," which can be interpreted as ablation experiment results performed on this dataset. Owing to this nesting, and to keep the JSON format simple, the evidence sentences for such annotations are not singled out on a per-sentence basis next to each annotated phrase node, but are written together as a paragraph.

{
  "has" : {
    "Ablation analysis" : {
      "on" : {
        "TACRED dev set" : {
          "find" : {
            "entity representations and feedforward layers" : {
              "contribute" : "1.0 F 1"
            },
            "from sentence" : "To study the contribution of each component in the C - GCN model , we ran an ablation study on the TACRED dev set ) .
We find that : The entity representations and feedforward layers contribute 1.0 F 1 ."

          },
          "remove" : {
            "dependency structure" : {
              "score drops by" : "3.2 F 1",
              "from sentence" : "( 2 ) When we remove the dependency structure ( i.e. , setting to I ) , the score drops by 3.2 F 1 ."
            },
            "feedforward layers , the LSTM component and the dependency structure altogether" : {
              "F 1 drops by" : "10.3",
              "from sentence" : "( 3 ) F 1 drops by 10.3 when we remove the feedforward layers , the LSTM component and the dependency structure altogether ."  
            }
          },
          "Removing" : {
            "pruning" : {
              "using" : "full trees as input"
              "hurts the result by" : "another 9.7 F1"
            },
            "from sentence" : "( 4 ) Removing the pruning ( i.e. , using full trees as input ) further hurts the result by another 9.7 F 1 ."
          }
        }
      }
    }
  }
}
				

From the above annotations, the following eleven triples are obtained. These annotations comprise three levels of nested data overall.

(Contribution, has, Ablation analysis)
  (Ablation analysis, on, TACRED dev set)
    (TACRED dev set, find, entity representations and feedforward layers)
      (entity representations and feedforward layers, contribute, 1.0 F 1)
    (TACRED dev set, Removing, pruning)
      (pruning, using, full trees as input)
      (pruning, hurts the result by, another 9.7 F1)
    (TACRED dev set, remove, dependency structure)
      (dependency structure, score drops by, 3.2 F 1)
    (TACRED dev set, remove, feedforward layers , the LSTM component and the dependency structure altogether)
      (feedforward layers , the LSTM component and the dependency structure altogether, F 1 drops by, 10.3)
				
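The triples above can be derived mechanically from the nested JSON: every dictionary key is a predicate (except "from sentence", which carries provenance), and every key of a dictionary nested below a predicate is an entity that serves both as the object of the current triple and as the subject of the triples beneath it. The following recursive Python sketch captures this flattening under those assumptions; it is an illustration, not the official conversion script, and the file name ablation_analysis.json is hypothetical.

import json

def flatten(subject, node, triples):
    # Emit (subject, predicate, object) triples from a nested annotation dict.
    # "from sentence" entries carry evidence text and are skipped.
    for predicate, value in node.items():
        if predicate == "from sentence":
            continue
        # A list value contributes one object per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, str):
                # A string leaf is the object of a terminal triple.
                triples.append((subject, predicate, v))
            else:
                # A dict holds entities that open the next nesting level.
                for entity, subtree in v.items():
                    if entity == "from sentence":
                        continue
                    triples.append((subject, predicate, entity))
                    flatten(entity, subtree, triples)

with open("ablation_analysis.json") as f:  # hypothetical file holding the JSON above
    data = json.load(f)

triples = []
flatten("Contribution", data, triples)
print(len(triples))  # 11, matching the listing above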

Example #3: Results

In this example, the Results information unit is modeled. The reference paper is On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis.

{
  "has" : {
    "Results" : {
      "has" : {
        "Experiment 1 : Preprocessing effect" : {
          "use of" : {
            "more complex preprocessing techniques" : {
              "such as" : {
                "lemmatization and multiword grouping" : {
                  "does not" : "help"
                }
              }
            },
            "from sentence" : "Nevertheless , the use of more complex preprocessing techniques such as lemmatization and multiword grouping does not help in general ."            
          }
        },
        "Experiment 2 : Cross-preprocessing" : {
          "observe" : {
            "different trend , with multiwordenhanced vectors" : {
              "exhibiting" : {
                "better performance" : {
                  "both on" : [{"single CNN model" : {"best overall performance in" : "seven of the nine datasets"}}, {"CNN + LSTM model" : {"best performance in" : "four datasets", "same ballpark as" : {"best results" : {"in four of" : "remaining five datasets"}}}}]
                }
              }
            },
            "from sentence" : "In this experiment we observe a different trend , with multiwordenhanced vectors exhibiting a better performance both on the single CNN model ( best overall performance in seven of the nine datasets ) and on the CNN + LSTM model ( best performance in four datasets and in the same ballpark as the best results in four of the remaining five datasets ) ."            
          },
          "using" : {
            "multiword - wise embeddings" : {
              "on" : {
                "vanilla setting" : {
                  "leads to" : {
                    "consistently better results" : {
                      "than using them on" : {
                        "same multiwordgrouped preprocessed dataset" : {
                          "in" : "eight of the nine datasets"
                        }  
                      }
                    }
                  }
                }
              },
              "from sentence" : "Interestingly , using multiword - wise embeddings on the vanilla setting leads to consistently better results than using them on the same multiwordgrouped preprocessed dataset in eight of the nine datasets ."
            }
          },
          "use of" : {
            "embeddings" : {
              "trained on" : "simple tokenized corpus ( i.e. vanilla )"
            },
            "from sentence" : "Apart from this somewhat surprising finding , the use of the embeddings trained on a simple tokenized corpus ( i.e. vanilla ) proved again competitive , as different preprocessing techniques such as lowercasing and lemmatizing do not seem to help ."            
          }
        }
      }
    }
  }
}
				

From the above annotations, the following 21 triples are obtained. These annotations comprise six levels of nested data overall.

(Contribution, has, Results)
  (Results, has, Experiment 2 : Cross-preprocessing)
    (Experiment 2 : Cross-preprocessing, using, multiword - wise embeddings)
      (multiword - wise embeddings, on, vanilla setting)
        (vanilla setting, leads to, consistently better results)
          (consistently better results, than using them on, same multiwordgrouped preprocessed dataset)
            (same multiwordgrouped preprocessed dataset, in, eight of the nine datasets)
    (Experiment 2 : Cross-preprocessing, use of, embeddings)
      (embeddings, trained on, simple tokenized corpus ( i.e. vanilla ))
    (Experiment 2 : Cross-preprocessing, observe, different trend , with multiwordenhanced vectors)
      (different trend , with multiwordenhanced vectors, exhibiting, better performance)
        (better performance, both on, single CNN model)
          (single CNN model, best overall performance in, seven of the nine datasets)
        (better performance, both on, CNN + LSTM model)
          (CNN + LSTM model, best performance in, four datasets)
          (CNN + LSTM model, same ballpark as, best results)
            (best results, in four of, remaining five datasets)
  (Results, has, Experiment 1 : Preprocessing effect)
    (Experiment 1 : Preprocessing effect, use of, more complex preprocessing techniques)
      (more complex preprocessing techniques, such as, lemmatization and multiword grouping)
        (lemmatization and multiword grouping, does not, help)
				
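Note how the array value of "both on" in the JSON expands into one triple per element. Reusing the flatten() function from the Example #2 sketch (again assuming the JSON above is saved under a hypothetical file name):

import json

with open("results.json") as f:  # hypothetical file holding the JSON above
    data = json.load(f)

triples = []
flatten("Contribution", data, triples)  # flatten() as defined in the Example #2 sketch
assert ("better performance", "both on", "single CNN model") in triples
assert ("better performance", "both on", "CNN + LSTM model") in triples
assert len(triples) == 21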

Example #4: ExperimentalSetup

In this example, the ExperimentalSetup information unit is modeled; since the annotated details involve no hardware, the unit node is labeled Hyperparameters (see the naming convention in the definitions above). The reference paper is Neural Machine Translation by Jointly Learning to Align and Translate.

{
  "has" : {
    "Hyperparameters" : {
      "train" : {
        "two types of models" : {
          "first one" : "RNN Encoder - Decoder",
          "other" : "RNNsearch",
          "from sentence" : "We train two types of models .
The first one is an RNN Encoder - Decoder ( RNNencdec , , and the other is the proposed model , to which we refer as RNNsearch ."

        },
        "each model twice" : {
          "with" : {
            "sentences" : {
              "of length" : ["up to 30 words", "up to 50 word"]
            }
          },
          "from sentence" : "We train each model twice : first with the sentences of length up to 30 words ( RNNencdec - 30 , RNNsearch - 30 ) and then with the sentences of length up to 50 word ( RNNencdec - 50 , RNNsearch - 50 ) ."
        }
      },
      "of" : {
        "RNNencdec" : {
          "encoder and decoder" : {
            "hidden units each" : {
              "have" : "1000"
            }
          }
        },
        "from sentence" : "The encoder and decoder of the RNNencdec have 1000 hidden units each ."
      },
      "encoder of" : {
        "RNNsearch" : {
          "consists of" : {
            "forward and backward recurrent neural networks ( RNN )" : {
              "each having" : "1000 hidden units"
            }
          }
        },
        "from sentence" : "The encoder of the RNNsearch consists of forward and backward recurrent neural networks ( RNN ) each having 1000 hidden units ."        
      },
      "has" : {
        "decoder" : {
          "hidden units" : "1000"
        },
        "from sentence" : "It s decoder has 1000 hidden units ."
      },
      "use" : {
        "multilayer network" : {
          "with" : "single maxout hidden layer",
          "to compute" : {
            "conditional probability" : {
              "of" : "each target word"
            }
          }
        },
        "from sentence" : "In both cases , we use a multilayer network with a single maxout hidden layer to compute the conditional probability of each target word ."
      }
    }
  }
}
				

From the above annotations, the following 20 triples are obtained. These annotations comprise three levels of nested data overall.

(Contribution, has, Hyperparameters)
  (Hyperparameters, use, multilayer network)
    (multilayer network, with, single maxout hidden layer)
    (multilayer network, to compute, conditional probability)
      (conditional probability, of, each target word)
  (Hyperparameters, of, RNNencdec)
    (RNNencdec, encoder and decoder, hidden units each)
      (hidden units each, have, 1000)
  (Hyperparameters, encoder of, RNNsearch)
    (RNNsearch, consists of, forward and backward recurrent neural networks ( RNN ))
      (forward and backward recurrent neural networks ( RNN ), each having, 1000 hidden units)
  (Hyperparameters, has, decoder)
    (decoder, hidden units, 1000)
  (Hyperparameters, train, each model twice)
    (each model twice, with, sentences)
      (sentences, of length, up to 30 words)
      (sentences, of length, up to 50 word)
  (Hyperparameters, train, two types of models)
    (two types of models, first one, RNN Encoder - Decoder)
    (two types of models, other, RNNsearch)
				

Example #5: Model

In this example, the Model information unit, which essentially reflects the core of the article's contribution, is modeled. The reference paper is Convolutional Neural Network Architectures for Matching Natural Language Sentences.

{
  "has" : {
    "Model" : {
      "propose" : {
        "deep neural network models" : {
          "adapt" : {
            "convolutional strategy" : {
              "to" : "natural language"
            }
          }
        },
        "from sentence" : "Towards this end , we propose deep neural network models , which adapt the convolutional strategy ( proven successful on image and speech ) to natural language ."
      },
      "To further explore" : {
        "relation" : {
          "between" : {
            "representing sentences and matching them" : {
              "devise" : {
                "novel model" : {
                  "with" : {
                    "same convolutional architecture" : {
                      "can naturally host" : ["hierarchical composition for sentences", "simple - to - comprehensive fusion of matching patterns"]
                    }
                  }
                }
              }
            }
          }
        },
        "from sentence" : "To further explore the relation between representing sentences and matching them , we devise a novel model that can naturally host both the hierarchical composition for sentences and the simple - to - comprehensive fusion of matching patterns with the same convolutional architecture ."            
      }
    }
  }
}
				

From the above annotations, the following 10 triples are obtained. These annotations comprise five levels of nested data overall.

(Contribution, has, Model)
  (Model, To further explore, relation)
    (relation, between, representing sentences and matching them)
      (representing sentences and matching them, devise, novel model)
        (novel model, with, same convolutional architecture)
          (same convolutional architecture, can naturally host, hierarchical composition for sentences)
          (same convolutional architecture, can naturally host, simple - to - comprehensive fusion of matching patterns)
  (Model, propose, deep neural network models)
    (deep neural network models, adapt, convolutional strategy)
      (convolutional strategy, to, natural language)
				

NLPContributionGraph Training, Development, and Evaluation Data

NLPContributionGraph will be organized in three Evaluation Phases. Please refer to our Codalab competition website for more details. The Training and Development datasets will remain unchanged across all evaluation phases. However, the test input data will change depending on the evaluation phase in question. Further, the expected system output for automatic evaluation on Codalab will also change based on the evaluation phase.

For reference Training and Development input samples, please download our Trial Data Release. For the Evaluation Phase data samples, please refer to our GitHub site at https://github.com/ncg-task/sample-submission.

Contact

Feel free to ask any clarification questions on our Google Groups at https://groups.google.com/forum/#!forum/ncg-task-semeval-2021.