Get Started

Overview

OGB contains graph datasets that are managed by data loaders. The loaders handle downloading and pre-processing of the datasets. Additionally, OGB has standardized evaluators and leaderboards to keep track of state-of-the-art results.

These components are closely tied to the OGB Python package, as detailed below.


Package Installation

You can install OGB using pip, the Python package manager.

pip install ogb

Please check that the installed version is 1.2.3.

python -c "import ogb; print(ogb.__version__)"
# Otherwise, please update the version by running
pip install -U ogb

Requirements

  • Python 3.5
  • PyTorch>=1.2
  • DGL>=0.5.0 or torch-geometric>=1.6.0
  • NumPy>=1.16.0
  • pandas>=0.24.0
  • urllib3>=1.24.0
  • scikit-learn>=0.20.0
  • outdated>=0.2.0
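
To confirm that an environment satisfies these minimums, a small script can compare installed versions against the list. The following is a minimal sketch, not part of OGB: it assumes Python 3.8 or later (for importlib.metadata), assumes the PyPI distribution names shown in the dictionary, and uses hypothetical helper names (parse_version, meets_minimum, check_requirements). DGL and torch-geometric are left out because only one of the two is needed.

```python
from importlib import metadata
from itertools import takewhile

# Minimum versions copied from the requirements list above.
# The keys are assumed PyPI distribution names, not taken from the OGB docs.
MIN_VERSIONS = {
    "torch": "1.2",
    "numpy": "1.16.0",
    "pandas": "0.24.0",
    "urllib3": "1.24.0",
    "scikit-learn": "0.20.0",
    "outdated": "0.2.0",
}


def parse_version(text):
    """Turn '1.16.0' or '2.1.0+cu118' into a tuple of ints such as (1, 16, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MIN_VERSIONS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

The comparison is deliberately simple and ignores pre-release ordering; a dedicated version-parsing library would be more robust.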

Package Usage

Next, we outline two key features of the OGB package: easy-to-use data loaders and standardized model evaluators.
Please also refer to our example code to see how the package can be used in practice.

Data Loaders

We prepare easy-to-use PyTorch Geometric and DGL data loaders that handle dataset downloading and standardized dataset splits.

The following PyTorch Geometric example shows that a few lines of code are sufficient to prepare and split a dataset. The same convenience is available for DGL. We also provide library-agnostic dataset loaders that can be used with other deep learning libraries such as TensorFlow and MXNet.

from ogb.graphproppred import PygGraphPropPredDataset
# With torch-geometric>=2.0, import DataLoader from torch_geometric.loader instead.
from torch_geometric.data import DataLoader

# Download (on first use) and pre-process the dataset.
dataset = PygGraphPropPredDataset(name="ogbg-molhiv")

# Standardized split: a dictionary keyed by "train", "valid", and "test".
split_idx = dataset.get_idx_split()
train_loader = DataLoader(dataset[split_idx["train"]], batch_size=32, shuffle=True)
valid_loader = DataLoader(dataset[split_idx["valid"]], batch_size=32, shuffle=False)
test_loader = DataLoader(dataset[split_idx["test"]], batch_size=32, shuffle=False)
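
The DGL and library-agnostic loaders follow the same pattern. The sketch below wraps each in a function (the function names make_dgl_loaders and load_library_agnostic are hypothetical) and imports the heavy dependencies lazily, so defining the functions does not require them. It assumes that ogb, dgl, and torch are installed when the functions are actually called, and that the dataset can be downloaded on first use.

```python
def make_dgl_loaders(name="ogbg-molhiv", batch_size=32):
    """Build train/valid/test loaders with DGL (requires ogb, dgl, and torch)."""
    # Imported lazily so that merely defining this function does not need DGL.
    from ogb.graphproppred import DglGraphPropPredDataset, collate_dgl
    from torch.utils.data import DataLoader

    dataset = DglGraphPropPredDataset(name=name)
    split_idx = dataset.get_idx_split()

    loaders = {}
    for split in ("train", "valid", "test"):
        loaders[split] = DataLoader(
            dataset[split_idx[split]],
            batch_size=batch_size,
            shuffle=(split == "train"),  # shuffle only the training split
            collate_fn=collate_dgl,      # batches (graph, label) pairs
        )
    return loaders


def load_library_agnostic(name="ogbg-molhiv"):
    """Load the dataset without PyTorch Geometric or DGL (requires ogb)."""
    from ogb.graphproppred import GraphPropPredDataset

    dataset = GraphPropPredDataset(name=name)
    split_idx = dataset.get_idx_split()
    # Each item is a (graph, label) pair; the graph is a dictionary of arrays.
    graph, label = dataset[0]
    return dataset, split_idx, graph, label
```

With the library-agnostic loader, batching is left to whichever framework consumes the graph dictionaries.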

Mapping: The nodes, edges, and graphs in OGB are mapped to real-world entities; e.g., each drug node in the drug-drug interaction network is mapped to a unique drug ID in DrugBank. The mapping information is provided in the mapping/ directory of the downloaded dataset folder. It is meant to allow researchers to draw scientific insight from a model's predictions and to potentially augment the given graphs with richer information.
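
Reading a mapping file takes only a few lines. The sketch below assumes the file is a gzip-compressed CSV with a header row; check the mapping/ directory of your downloaded dataset for the actual file format, file names, and column names. The helper name load_mapping and the path and column names in the usage comment are hypothetical.

```python
import csv
import gzip


def load_mapping(path, key_column, value_column):
    """Read a gzip-compressed CSV and return {key: value} for two chosen columns."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return {row[key_column]: row[value_column] for row in reader}


# Hypothetical usage; replace the path and column names with those in your download.
# node_to_entity = load_mapping("path/to/mapping/example.csv.gz", "node idx", "entity id")
```

Keys and values are returned as strings, so convert node indices with int() before using them to index into a graph.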

Evaluators

We provide standardized evaluators for testing and comparing different methods. An evaluator takes input_dict (a dictionary whose format is specified in evaluator.expected_input_format) as input and returns a dictionary storing the performance metric appropriate for the given dataset.

from ogb.graphproppred import Evaluator

evaluator = Evaluator(name="ogbg-molhiv")
# You can learn the input and output format specification of the evaluator as follows.
# print(evaluator.expected_input_format)
# print(evaluator.expected_output_format)
# y_true and y_pred are placeholders for your ground-truth labels and model predictions.
input_dict = {"y_true": y_true, "y_pred": y_pred}
result_dict = evaluator.eval(input_dict)  # E.g., {"rocauc": 0.7321}
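
To make the "rocauc" entry in the example output concrete, the following is a minimal pure-Python sketch of what ROC-AUC measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. This is an illustration only, not OGB's implementation; the function name roc_auc and the toy labels and scores are made up for this example.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Compares every positive/negative pair, so this
    is suitable for illustration rather than for large datasets.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: two negatives and two positives with model scores.
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]
result_dict = {"rocauc": roc_auc(y_true, y_pred)}
print(result_dict)  # {'rocauc': 0.75}
```

Three of the four positive/negative pairs are ranked correctly here, hence 0.75. The real evaluator expects arrays or tensors rather than flat lists; print evaluator.expected_input_format to see the exact shapes it requires.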

Citing OGB

If you use OGB datasets in your work, please cite our paper (BibTeX below).

@article{hu2020ogb,
  title={Open Graph Benchmark: Datasets for Machine Learning on Graphs},
  author={Weihua Hu and Matthias Fey and Marinka Zitnik and Yuxiao Dong and Hongyu Ren and Bowen Liu and Michele Catasta and Jure Leskovec},
  journal={arXiv preprint arXiv:2005.00687},
  year={2020}
}

You are ready to explore OGB datasets!