Updates from OGB
Please update your package to 1.3.6 (April 6th, 2023).
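A minimal sketch for checking whether an installed version meets the recommended one. It assumes the package is installed under the distribution name `ogb` and that versions are purely numeric dotted strings with the same number of parts; the helper names are illustrative and not part of the package.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.3.6' into (1, 3, 6). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


def installed_ogb_version():
    """Return the installed version string of the 'ogb' distribution, or None."""
    try:
        return metadata.version("ogb")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_ogb_version()
    if version is None:
        print("ogb is not installed")
    elif is_at_least(version, "1.3.6"):
        print(f"ogb {version} is up to date")
    else:
        print(f"ogb {version} is older than 1.3.6; consider upgrading")
```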
June 11th, 2023: OGB-LSC text data released
- MAG240M: Download (33GB) md5sum: fc345d28b45808a2fa280ce6cbbfd198
- WikiKG90Mv2: Download (2.4GB) md5sum: 2f0e3178a5e201bd1ab59a81f2dcba72
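After downloading, the archives can be checked against the md5sums listed above. The following is a minimal sketch using only the standard library; the helper names and the file name in the usage comment are hypothetical, since the actual archive names are not given here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (hypothetical file name):
# matches_checksum("mag240m_text.zip", "fc345d28b45808a2fa280ce6cbbfd198")
```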
April 6th, 2023: Package updated to 1.3.6
- Pandas 2.0 compatibility (see PR).
November 2nd, 2022: Package updated.
- Fixed stuck import bug (see PR)
August 20th, 2022: Package updated.
- ogbl-vessel is included in OGB. Thank you, Julian, Johannes, and Stephan, for the contribution!
- The ranking metric of the link prediction is improved (see PR).
September 28th, 2021: Package updated.
- Two datasets from the OGB-LSC have been updated as follows.
- WikiKG90M –> WikiKG90Mv2
- PCQM4M –> PCQM4Mv2
- New OGB-LSC webpage is available here.
September 7th, 2021: KDD Cup 2021 workshop videos are out here.
April 8th, 2021: Package updated.
- Thanks to the DGL Team, all the LSC data is now hosted on AWS. This significantly improves the download speed around the globe! The underlying LSC data stays exactly the same.
March 15th, 2021: OGB-LSC at KDD Cup 2021 started!
- We are organizing a machine learning challenge on large-scale graph data.
- Please update your package to 1.3.0, through which the OGB-LSC datasets are accessible.
Feb 28th, 2021: Package updated.
- Fixed an HTTPS download bug (expired certificate) by switching to HTTP.
Feb 24th, 2021: Package updated.
- ogbg-code has been deprecated due to leakage of the prediction target (i.e., the method name) in the input AST.
- ogbg-code2 has been introduced to fix the issue: the method name and its recursive definition in the AST are replaced with a special token.
Dec 29th, 2020: Package updated.
- ogbl-wikikg has been deprecated due to a bug in the negative samples of the test/validation sets.
- ogbl-wikikg2 has been introduced to fix the issue.
Oct 13th, 2020: Rules for the experimental protocol clarified.
Oct 11th, 2020: Call for dataset contributions.
We have opened dataset contributions from our community. If you have interesting graph datasets, we look forward to hearing from you (details here)!
Sep 12th, 2020: Package updated.
- Made ogbn-papers100M data loading more tractable by using compressed binary files (fix issue).
- Introduced DatasetSaver module for external contributors.
- Made the dataset object compatible with DGL v0.5 (not backward compatible for heterogeneous graph datasets).
Aug 11th, 2020: Package updated.
We changed the evaluation metric of ogbg-molpcba from PRC-AUC to Average Precision (AP). AP has been shown to be more appropriate for summarizing the non-convex nature of the Precision-Recall Curve [1]. The leaderboard and our paper have been updated accordingly.
[1] Jesse Davis and Mark Goadrich. The relationship between Precision-Recall and ROC curves. In International Conference on Machine Learning (ICML), pp. 233–240, 2006.
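To illustrate how the two summaries can differ on the same ranking, here is a small pure-Python sketch. It assumes PRC-AUC means trapezoidal (linearly interpolated) area under the precision-recall points, and AP means the recall-weighted sum of precisions; the function names are illustrative, ties in scores are not merged, and this is not the package's Evaluator.

```python
def precision_recall_points(y_true, y_score):
    """Return (recall, precision) after each example, in order of
    descending score. Assumes at least one positive label."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    total_pos = sum(y_true)
    true_pos = 0
    points = []
    for rank, i in enumerate(order, start=1):
        true_pos += y_true[i]
        points.append((true_pos / total_pos, true_pos / rank))
    return points


def average_precision(points):
    """AP: sum of precision weighted by each increase in recall."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def trapezoid_pr_auc(points):
    """Area under the PR points with linear interpolation between them,
    starting at recall 0 with the first observed precision."""
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


if __name__ == "__main__":
    # Toy labels and scores, made up for illustration only.
    pts = precision_recall_points([1, 0, 0, 1], [0.9, 0.8, 0.7, 0.6])
    print("AP:", average_precision(pts))
    print("Trapezoidal PRC-AUC:", trapezoid_pr_auc(pts))
```

On this toy ranking the two numbers disagree because the trapezoid rule draws straight lines between precision-recall points, whereas AP uses a step-wise sum.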
July 25th, 2020: Leaderboard policy updated.
For leaderboard submissions, we now additionally require reporting validation performance and the tuned hyper-parameters. Our goal is to encourage fair model selection by preventing the development of models that are over-tuned to our public test sets. Please refer to here for more details. We thank the community for the great suggestion. If you have previously made leaderboard submissions, please send us these two pieces of information.
June 27th, 2020: Leaderboard policy updated.
- Additional information is required for leaderboard submissions (thanks to the suggestions from the Google group discussion and the GitHub issue).
- The package version requirement has been added for each dataset.
- To make sure all the leaderboard submissions use the same datasets and evaluators, we have added the package version requirement for each dataset. It can be checked at both dataset pages (e.g., here) and leaderboard pages (e.g., here).
- We highly recommend always using the newest package version. Our data loader only downloads and processes the modified datasets.
June 26th, 2020: Package updated.
- [Bug fix] The ogbn-mag dataset has been changed to exclude duplicated edges (fix issue).
- [Bug fix] The Evaluator for ogbl-ddi has been changed to use Hits@50 and Hits@20, respectively.
- [Bug fix] The DGL data loader for the two heterogeneous graph datasets (ogbl-biokg) is fixed (fix issue).
- Baseline performance on ogbl-ppa has been updated.
- The arXiv paper has been updated accordingly.
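For readers unfamiliar with the Hits@K metric mentioned above, here is a minimal pure-Python sketch of a common definition: the fraction of positive edges scored strictly above the K-th highest-scoring negative edge. The function name and the choice to return 1.0 when there are fewer than K negatives are assumptions made for illustration; this is not the package's Evaluator.

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges whose score is strictly greater than
    the k-th highest negative score.

    Assumption: with fewer than k negatives, every positive counts as a hit.
    """
    if len(neg_scores) < k:
        return 1.0
    kth_neg = sorted(neg_scores, reverse=True)[k - 1]
    hits = sum(score > kth_neg for score in pos_scores)
    return hits / len(pos_scores)


if __name__ == "__main__":
    # Toy scores, made up for illustration only.
    pos = [0.9, 0.5, 0.3]
    neg = [0.8, 0.6, 0.4, 0.2]
    print(hits_at_k(pos, neg, k=2))
```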
June 11th, 2020: Second major release of OGB.
- 5 new datasets (ogbg-code) and their benchmark experiments have been added.
- Our arXiv paper has been updated accordingly.
- Our package has been updated to 1.2.0, which includes the new datasets. No change has been applied to the existing datasets.
- Baseline performance on ogbl-citation has been improved.