KDD Cup 2021 has concluded; this is an archive page. The updated page is here. Competition results were announced here.
Learn about the competition results and winning solutions below.
- Awardees
- Final test submission: Evaluated on the entire test set
- Initial test submission: Evaluated on a fixed 5% subset of the test set (the same subset for all submissions)
Awardees of MAG240M-LSC Track (Leaderboard)
Winners
1st place: BD-PGL (contact)
- Team members: Yunsheng Shi (Baidu), Zhengjie Huang (Baidu), Weibin Li (Baidu), Weiyue Su (Baidu), Shikun Feng (Baidu)
- Method: R-UNIMP
- Short summary: We adopt UniMP, a recently proposed technique that incorporates feature and label propagation at both training and inference time and brings significant improvements across several node classification tasks, and modify it into R-UniMP for heterogeneous graphs, where “R” stands for “Relational” (a minimal sketch of the label-propagation idea follows this entry). We also provide a detailed recap of our key strategies and valuable findings during the entire competition. 30 models with different initializations are ensembled.
- Learn more: Technical report, code
- Test accuracy: 0.7549
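A minimal NumPy sketch of the masked label-propagation idea behind UniMP, as referenced in the summary above. This illustrates the general technique only, not the team's R-UniMP code; the graph, labels, and masking ratio are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_classes = 6, 3
adj = rng.integers(0, 2, size=(num_nodes, num_nodes)).astype(float)
labels = rng.integers(0, num_classes, size=num_nodes)
train_mask = np.array([1, 1, 1, 0, 0, 0], dtype=bool)

# Randomly hide half of the observed training labels: hidden ones stay
# prediction targets, visible ones become extra input features.
visible = train_mask & (rng.random(num_nodes) < 0.5)
label_feat = np.zeros((num_nodes, num_classes))
label_feat[visible, labels[visible]] = 1.0     # one-hot only where visible

# One propagation step: average the label features of each node's neighbours.
deg = adj.sum(axis=1, keepdims=True).clip(min=1)
propagated = adj @ label_feat / deg            # fed to the classifier as input
```

At inference time, all training labels can be made visible, which is what makes the trick useful at test time as well.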
2nd place: Academic (contact)
- Team members: Petar Velickovic (DeepMind), Peter Battaglia (DeepMind), Jonathan Godwin (DeepMind), Alvaro Sanchez (DeepMind), David Budden (DeepMind), Shantanu Thakoor (DeepMind), Jacklynn Stott (DeepMind), Ravichandra Addanki (DeepMind), Thomas Keck (DeepMind), Andreea Deac (DeepMind)
- Method: MPNN Ensemble with BGRL fine-tuning
- Short summary: MPNN over subsampled patches, fine-tuned with the BGRL (bootstrapped graph latents) self-supervised objective (sketched below). 20 models with different initialisations and validation splits are ensembled.
- Learn more: Technical report, code
- Test accuracy: 0.7519
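For readers unfamiliar with BGRL, here is a rough PyTorch sketch of the bootstrapped objective mentioned above. It is illustrative only: the linear "encoder" stands in for the team's MPNN, and the EMA rate tau is an assumption.

```python
import copy
import torch

encoder = torch.nn.Linear(16, 8)      # stand-in for a GNN encoder
predictor = torch.nn.Linear(8, 8)
target = copy.deepcopy(encoder)       # bootstrapped target network
for p in target.parameters():
    p.requires_grad_(False)           # target gets no gradients

x1, x2 = torch.randn(32, 16), torch.randn(32, 16)   # two "augmented views"
loss = -torch.nn.functional.cosine_similarity(
    predictor(encoder(x1)), target(x2).detach(), dim=-1).mean()
loss.backward()

# After each optimizer step, move the target towards the online encoder (EMA).
tau = 0.99
with torch.no_grad():
    for pt, po in zip(target.parameters(), encoder.parameters()):
        pt.mul_(tau).add_((1 - tau) * po)
```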
3rd place: Synerise AI (contact)
- Team members: Michal Daniluk (Synerise), Jacek Dabrowski (Synerise), Konrad Goluchowski (Synerise), Barbara Rychalska (Warsaw University of Technology/Synerise)
- Method: Cleora + EMDE
- Short summary: We tackle the task with an efficient model based on our previously introduced algorithms, EMDE and Cleora, on top of a simple feed-forward neural network. We use EMDE to represent nodes in the form of sketches: structures that capture local similarity and additionally allow easy accumulation of multiple object values (a toy illustration follows this entry). We use Cleora for label propagation, i.e., representing nodes with the sets of labels observed in the training data. To achieve maximal performance, we train 60 independent ensemble models.
- Learn more: Technical report, code
- Test accuracy: 0.7460
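A toy illustration of the sketch structures referred to above, assuming random-hyperplane LSH as the bucketing function (EMDE's actual density estimation and the Cleora embeddings are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_planes, n_sketches = 32, 4, 8            # 2**n_planes buckets each
planes = rng.standard_normal((n_sketches, n_planes, dim))
powers = 2 ** np.arange(n_planes)

def bucket(v):
    # The sign pattern of the projections picks one of 2**n_planes buckets,
    # independently for each of the n_sketches hash functions.
    return ((planes @ v) > 0).astype(int) @ powers   # shape: (n_sketches,)

def sketch(items):
    s = np.zeros((n_sketches, 2 ** n_planes))
    for v in items:                                  # sketches add up
        s[np.arange(n_sketches), bucket(v)] += 1
    return s

neighbours = rng.standard_normal((5, dim))  # e.g. embeddings around one node
print(sketch(neighbours).shape)             # (8, 16)
```

Because nearby embeddings tend to collide in the same buckets, summed sketches give a fixed-width input that a plain feed-forward network can consume.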
Runner-ups
4th place: Topology_mag (contact)
- Team members: Qiuying Peng (OPPO Research), Wencai Cao (OPPO Research), Zheng Pan (OPPO Research)
- Method: MPLP + finetune (40 ensemble)
- Short summary: Metapath-based Label Propagation with fine-tuning on the latest year's samples. Models from 5-fold cross-validation with 8 random seeds are ensembled.
- Learn more: Technical report, code
- Test accuracy: 0.7447
5th place: passages (contact)
- Team members: Bole Ai (Nanjing University), Xiang Long (Beijing University of Posts and Telecommunications), Kaiyuan Li (Beijing University of Posts and Telecommunications), Quan Lin (Huazhong University of Science and Technology), Xiaofan Liu (Beijing University of Posts and Telecommunications), Pengfei Wang (Beijing University of Posts and Telecommunications), Mingdao Wang (Beijing University of Posts and Telecommunications), Zhichao Feng (Beijing University of Posts and Telecommunications), Kun Zhao (Nanjing University)
- Method: SGC + R-GAT + Finetune
- Short summary: Our method consists of two main stages: a pretraining stage that explores the heterogeneous academic network to learn better node embeddings, and a transfer-learning stage that alleviates the differences in label distributions and node representations between the training and test sets.
- Learn more: Technical report, code
- Test accuracy: 0.7381
6th place: DeeperBiggerBetter (contact)
- Team members: Guohao Li (KAUST), Hesham Mostafa (Intel Corporation), Jesus Alejandro Zarzar Torano (KAUST), Sami Abu-El-Haija (USC), Marcel Nassar (Intel Labs), Daniel Cummings (Intel Corporation), Sohil Shah (Intel Corporation), Matthias Mueller (Intel Labs), Bernard Ghanem (KAUST)
- Method: GNN180M
- Short summary: We train two R-GAT models, one with 2 layers and another with 3 layers, for a total of 180M parameters. We utilize author labels as extra regularization, conduct multiple inference passes with proportional neighborhood sizes, aggregate their results by ensembling, and then apply a label-smoothing trick to the model's predictions using author labels for post-processing (illustrated below).
- Learn more: Technical report, code
- Test accuracy: 0.7353
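The post-processing trick above can be pictured as a convex blend of the model's predictions with an author-derived label distribution; the mixing weight alpha and all numbers below are invented for illustration.

```python
import numpy as np

preds = np.array([[0.7, 0.2, 0.1],               # model softmax per paper
                  [0.4, 0.4, 0.2]])
author_label_dist = np.array([[1.0, 0.0, 0.0],   # label histogram aggregated
                              [0.0, 0.5, 0.5]])  # over each paper's authors
alpha = 0.2                                      # smoothing strength (assumed)
smoothed = (1 - alpha) * preds + alpha * author_label_dist
final_class = smoothed.argmax(axis=1)
```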
Awardees of WikiKG90M-LSC Track (Leaderboard)
Winners
1st place: BD-PGL (contact)
- Team members: Weiyue Su (Baidu), Shikun Feng (Baidu), Zeyang Fang (Baidu), Huijuan Wang (Baidu), Siming Dai (Baidu), Hui Zhong (Baidu), Yunsheng Shi (Baidu), Zhengjie Huang (Baidu)
- Method: NOTE + Feature
- Short summary: We modify OTE into NOTE for better performance and use the post-smoothing technique to capture the graph structure as a complementary signal. Feature engineering further improves the results.
- Learn more: Technical report, code
- Test MRR: 0.9727
- Note: The solution explicitly makes use of the val/test tail candidate entities provided by the organizers. In practice, those candidates are not provided, and a model needs to rank among all entities (see the sketch below).
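To make the note concrete: with organizer-provided candidates, MRR is computed over a short list of tails per query rather than over every entity in the graph. A small sketch with random placeholder scores:

```python
import numpy as np

def mrr(scores, true_idx):
    # scores: (queries, candidates); true_idx: position of the correct tail.
    best = scores[np.arange(len(scores)), true_idx][:, None]
    ranks = (scores > best).sum(axis=1) + 1   # 1 + number of higher scores
    return (1.0 / ranks).mean()

rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 1001))       # per-candidate model scores
true_idx = rng.integers(0, 1001, size=4)
print(mrr(scores, true_idx))
```

Ranking against the full entity set instead would replace the candidate axis with all entities in the graph, a much harder problem.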
2nd place: OhMyGod (contact)
- Team members: Weihua Peng (Harbin Institute of Technology)
- Method: TransE, ComplEx, DistMult, and SimplE (9 ensemble)
- Short summary: (a) more powerful representation-vector learning; (b) exploiting the complementarity between different models; (c) candidate filtering based on the aggregated statistics of the validation/test tail candidates.
- Learn more: Technical report, code
- Test MRR: 0.9712
- Note: The solution explicitly makes use of the val/test tail candidate entities provided by the organizers. In practice, those candidates are not provided, and a model needs to rank among all entities.
3rd place: GraphMIRAcles (contact)
- Team members: Jianyu Cai (University of Science and Technology of China), Jiajun Chen (University of Science and Technology of China), Taoxing Pan (University of Science and Technology of China), Zhanqiu Zhang (University of Science and Technology of China), Jie Wang (University of Science and Technology of China)
- Method: ComplEx-CMRC + Rule + KD (15 ensemble)
- Short summary: Encoder: Concat-MLP with Residual Connection (CMRC). Decoder: ComplEx. Rule mining for data augmentation. Knowledge distillation to improve single models (a generic distillation sketch follows this entry). 15 models with different random seeds are ensembled.
- Learn more: Technical report, code
- Test MRR: 0.9707
- Note: The solution explicitly makes use of the val/test tail candidate entities provided by the organizers. In practice, those candidates are not provided, and a model needs to rank among all entities.
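The distillation step mentioned in the summary above follows the standard teacher-student pattern; a generic PyTorch sketch (the CMRC encoder and ComplEx decoder are not reproduced, and the 0.5 loss weight is an assumption):

```python
import torch
import torch.nn.functional as F

teacher_scores = torch.randn(64, 1001)           # averaged ensemble scores
student_scores = torch.randn(64, 1001, requires_grad=True)
labels = torch.randint(0, 1001, (64,))           # index of the true tail

hard = F.cross_entropy(student_scores, labels)   # fit the hard labels
soft = F.kl_div(F.log_softmax(student_scores, dim=-1),
                F.softmax(teacher_scores, dim=-1),
                reduction="batchmean")           # match the teacher
loss = hard + 0.5 * soft
loss.backward()
```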
Runner-ups
4th place: littleant (contact)
- Team members: Shuo Yang (Ant Group), Daixin Wang (Ant Group), Dingyuan Zhu (Ant Group), Yakun Wang (Ant Group), Borui Ye (Ant Group)
- Method: AntLinkPred
- Short summary: 1. Heuristic feature extraction. 2. Decoder selection (TransE, AutoSF, PairRE, RotatE, etc.) and model ensembling. 3. Model fine-tuning with low-confidence samples. 4. Generating a smaller candidate list with a recall model. 5. Producing the final results with a re-ranking model (steps 4 and 5 are sketched below).
- Learn more: Technical report, code
- Test MRR: 0.9511
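Steps 4 and 5 above follow the common recall-then-rerank pattern: a cheap scorer shortlists candidates, and an expensive scorer re-scores only the shortlist. A minimal sketch with placeholder scorers (both function bodies are hypothetical stand-ins):

```python
import numpy as np

def recall_model(query, num_entities):       # cheap scorer (stand-in)
    return np.random.default_rng(query).standard_normal(num_entities)

def rerank_model(query, candidates):         # expensive scorer (stand-in)
    return np.random.default_rng(query + 1).standard_normal(len(candidates))

num_entities, k = 100_000, 50
coarse = recall_model(0, num_entities)
shortlist = np.argpartition(-coarse, k)[:k]  # top-k by the cheap score
fine = rerank_model(0, shortlist)
predicted_tail = shortlist[np.argmax(fine)]
```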
Awardees of PCQM4M-LSC Track (Leaderboard)
Winners
1st place: MachineLearning (contact)
- Team members: Chengxuan Ying (Dalian University of Technology), Mingqi Yang (Dalian University of Technology), Shengjie Luo (Peking University), Tianle Cai (Princeton University), Guolin Ke (MSRA), Di He (MSRA), Shuxin Zheng (MSRA), Chenglin Wu (Xiamen University), Yuxin Wang (Dalian University of Technology), Yanming Shen (Dalian University of Technology)
- Method: Graphormer (10 ensemble) + ExpC (8 ensemble)
- Short summary: We adopt Graphormer and ExpC as our basic models. We train each model with 8-fold cross-validation, and additionally train two Graphormer models on the union of the training and validation sets with different random seeds. For the final submission, we use a naive ensemble of these 18 models, taking the average of their outputs (sketched below).
- Learn more: Technical report, code
- Test MAE: 0.1200
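The "naive ensemble" above is plain output averaging, which reduces the variance of roughly independent per-model errors; a one-step sketch with dummy predictions:

```python
import numpy as np

# 18 models x 1000 molecules of dummy HOMO-LUMO gap predictions (made up).
model_outputs = np.random.default_rng(0).normal(5.0, 0.1, size=(18, 1000))
ensemble_prediction = model_outputs.mean(axis=0)   # submission values
```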
2nd place: SuperHelix (contact)
- Team members: Shanzhuo Zhang (Baidu), Lihang Liu (Baidu), Sheng Gao (Baidu), Donglong He (Baidu), Weibin Li (Baidu), Zhengjie Huang (Baidu), Weiyue Su (Baidu), Wenjin Wang (Baidu)
- Method: LiteGEM
- Short summary: Deep graph neural network with self-supervised tasks on topology and geometry information. 73 models with different tasks and hyper-parameters are ensembled.
- Learn more: Technical report, code
- Test MAE: 0.1204
3rd place: Quantum (contact)
- Team members: Petar Velickovic (DeepMind), Peter Battaglia (DeepMind), Jonathan Godwin (DeepMind), Alvaro Sanchez (DeepMind), David Budden (DeepMind), Shantanu Thakoor (DeepMind), Jacklynn Stott (DeepMind), Ravichandra Addanki (DeepMind), Sibon Li (DeepMind), Andreea Deac (DeepMind)
- Method: Very Deep GN Ensemble + Conformers + Noisy Nodes
- Short summary: A combination of a 32-layer deep Graph Network over RDKit conformer features and a 50-layer deep Graph Network for molecules whose conformers cannot be computed. Denoising regularisation with Noisy Nodes was applied (sketched below). 20 models with different initialisations and validation splits are ensembled.
- Learn more: Technical report, code
- Test MAE: 0.1205
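A rough PyTorch sketch of the Noisy Nodes regulariser mentioned above (illustrative only: the real models are 32/50-layer Graph Networks, and the noise scale and loss weight here are assumptions):

```python
import torch

node_feat = torch.randn(100, 16)        # clean node features of one molecule
noisy = node_feat + 0.1 * torch.randn_like(node_feat)

encoder = torch.nn.Linear(16, 32)       # stand-in for a deep Graph Network
main_head = torch.nn.Linear(32, 1)      # property prediction head
denoise_head = torch.nn.Linear(32, 16)  # auxiliary node-denoising head

h = encoder(noisy)
target = torch.tensor(5.0)              # dummy regression target
loss_main = (main_head(h).mean() - target) ** 2
loss_denoise = ((denoise_head(h) - node_feat) ** 2).mean()
loss = loss_main + 0.1 * loss_denoise   # denoising regularises deep stacks
loss.backward()
```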
Runner-ups
4th place: DIVE@TAMU (contact)
- Team members: Meng Liu (Texas A&M University), Cong Fu (Texas A&M University), Xuan Zhang (Texas A&M University), Limei Wang (Texas A&M University), Yaochen Xie (Texas A&M University), Hao Yuan (Texas A&M University), Youzhi Luo (Texas A&M University), Zhao Xu (Texas A&M University), Shenglong Xu (Texas A&M University), Shuiwang Ji (Texas A&M University)
- Method: 2D Deep GNNs + 3D low-cost conformer GNNs
- Short summary: (1) 2D Deeper GNN with larger receptive fields (over 20 layers) and more expressivity over 2D molecular graphs. (2) 3D GNN over low-cost conformer sets, which can be obtained by RDKit with an affordable budget.
- Learn more: Technical report, code
- Test MAE: 0.1235
5th place: no_free_lunch (contact)
- Team members: Xiaochuan Zou (Ant Group), Ruofan Wu (Ant Group), Baokun Wang (Ant Group), Sheng Tian (Ant Group), Yifei Hu (Ant Group), Liang Zhu (Ant Group), Peixuan Chen (Ant Group), Mengjiao Zhang (Ant Group), Yuhao Zhang (Ant Group)
- Method: Ensemble MolNet
- Short summary: Transformer with graph-structure features. 6 models with different initializations and numbers of training epochs are ensembled.
- Learn more: Technical report, code
- Test MAE: 0.1244
6th place: GNNLearner (contact)
- Team members: Yingce Xia (Microsoft Research Asia), Lijun Wu (Microsoft Research Asia), Shufang Xie (Microsoft Research Asia), Jinhua Zhu (University of Science and Technology of China), Yang Fan (University of Science and Technology of China), Yutai Hou (Harbin Institute of Technology), Tao Qin (Microsoft Research Asia)
- Method: Transformers (standard x4, two-branch x8) + GIN (x5)
- Short summary: (1) A Transformer with two branches, one for regression and the other for classification, where the two branches learn from each other; (2) a standard Transformer; (3) GIN. Models with different initializations are aggregated.
- Learn more: Technical report, code
- Test MAE: 0.1253
Final leaderboard for MAG240M-LSC
Classification accuracy. The higher, the better.
Rank | Team | Test Accuracy |
---|---|---|
1 | BD-PGL | 0.7549 |
2 | Academic | 0.7519 |
3 | Synerise AI | 0.7460 |
4 | Topology_mag | 0.7447 |
5 | passages | 0.7381 |
6 | DeeperBiggerBetter | 0.7353 |
7 | paper2paper | 0.7292 |
8 | NTT DOCOMO LABS | 0.7290 |
9 | GraphAI | 0.7278 |
10 | MC-AiDA | 0.7274 |
11 | yangxc | 0.7273 |
12 | AntAntGraph | 0.7272 |
13 | IF-bigdata | 0.7271 |
14 | antkg | 0.7260 |
15 | LEEXY | 0.7254 |
16 | PKUDB3 | 0.7250 |
17 | PM-TEAM | 0.7239 |
18 | Euler | 0.7233 |
19 | UVA-PolyU-MAG | 0.7218 |
20 | hust-tigergraph | 0.7196 |
21 | nvsac | 0.7165 |
22 | Abearcomes | 0.7161 |
23 | MSRA DKI | 0.7118 |
24 | Ikigai | 0.7113 |
25 | Graph@PKU | 0.7076 |
26 | Fireside Coding Club | 0.7035 |
27 | bjtuxingyuan | 0.7030 |
28 | mtMAG | 0.7027 |
29 | CogDL | 0.7018 |
30 | VCGroup | 0.7001 |
31 | UCLA-Graph | 0.6988 |
32 | University of Macau | 0.6946 |
33 | FreedomDancer | 0.6925 |
34 | ALGRUC | 0.6921 |
35 | jianmohuo | 0.6905 |
36 | zjuainet | 0.6897 |
37 | fanfanman | 0.6821 |
38 | GoBruins | 0.6517 |
39 | HUNGraphs | 0.6517 |
40 | Four Colors | 0.6466 |
41 | Oxtest | 0.6408 |
42 | TPA FDU | 0.6400 |
43 | dascim | 0.5898 |
44 | Rotopia | 0.5783 |
45 | Binary Bird | 0.0218 |
46 | BU-LISP | 0.0009 |
Final leaderboard for WikiKG90M-LSC
Mean Reciprocal Rank (MRR). The higher, the better.
Rank | Team | Test MRR |
---|---|---|
1 | BD-PGL | 0.9727 |
2 | OhMyGod | 0.9712 |
3 | GraphMIRAcles | 0.9707 |
4 | littleant | 0.9511 |
5 | vcdbro | 0.9489 |
6 | JohnZheng | 0.9465 |
7 | VCGroup | 0.9432 |
8 | USTC-HNT-1 | 0.9324 |
9 | USTC-HNT | 0.9311 |
10 | CogDL | 0.9275 |
11 | RelAix | 0.9184 |
12 | Neural Bellman-Ford Networks | 0.9178 |
13 | mtWIKIKG | 0.9081 |
14 | cciiplab | 0.9058 |
15 | Heads and Tails | 0.9047 |
16 | Hello KG | 0.8945 |
17 | NTT DOCOMO INC | 0.8935 |
18 | 迷路的小叮当 | 0.8761 |
19 | HET-TigerGraph | 0.8720 |
20 | antkg | 0.8712 |
21 | AJ Team | 0.8688 |
22 | UVA-PolyU-Wiki | 0.8653 |
23 | TransCendence | 0.8252 |
24 | NiuBiP | 0.7425 |
25 | XJTUGNN | 0.6597 |
26 | WashuLink | 0.4791 |
27 | BU-LISP | 0.0629 |
28 | SUFE-OXAI | 0.0029 |
Final leaderboard for PCQM4M-LSC
Mean Absolute Error (MAE). The lower, the better.
Rank | Team | Test MAE |
---|---|---|
1 | MachineLearning | 0.1200 |
2 | SuperHelix | 0.1204 |
3 | Quantum | 0.1205 |
4 | DIVE@TAMU | 0.1235 |
5 | no_free_lunch | 0.1244 |
6 | GNNLearner | 0.1253 |
7 | RelAix | 0.1273 |
8 | VCGroup | 0.1281 |
9 | pauli | 0.1293 |
10 | mtPCQM | 0.1298 |
11 | NTT DOCOMO LABS | 0.1298 |
12 | Schrodinger | 0.1305 |
13 | Ant-AGL-Chem | 0.1306 |
14 | AI Winter is Coming | 0.1324 |
15 | HUNGraphs | 0.1327 |
16 | PreferredSmile | 0.1328 |
17 | ADVERSARIES | 0.1328 |
18 | MoleculeHunter | 0.1338 |
19 | KAICD | 0.1344 |
20 | CogDL | 0.1346 |
21 | MLCollective | 0.1358 |
22 | Team IC1101 | 0.1398 |
23 | DeepBlueAI | 0.1414 |
24 | So Vegetable | 0.1415 |
25 | USTC-DLC | 0.1416 |
26 | Autobot | 0.1418 |
27 | DeeperBiggerBetter | 0.1420 |
28 | Topology_pcq | 0.1421 |
29 | The_Sky_Is_Blue | 0.1423 |
30 | DeepBlueAI | 0.1429 |
31 | The Graphinators | 0.1432 |
32 | JustDoIt | 0.1434 |
33 | CUNY KDD Cup | 0.1435 |
34 | THUMLP | 0.1443 |
35 | Danhuangpai | 0.1447 |
36 | The Long and Winding Node | 0.1457 |
37 | dminers | 0.1467 |
38 | RiseLab | 0.1469 |
39 | FudanGWX | 0.1501 |
40 | USTC-MO | 0.1536 |
41 | braino | 0.1537 |
42 | IITD-GPU-GO-BRRR | 0.1544 |
43 | USTC-MO-1 | 0.1568 |
44 | Celestial Being | 0.1589 |
45 | kojimar | 0.1590 |
46 | BUPTTDCS-TG | 0.1606 |
47 | CIML at UniPI | 0.2105 |
48 | yfishlab | 0.2202 |
49 | GraLITIS | 0.9393 |
Initial leaderboard for MAG240M-LSC
Classification accuracy. The higher, the better.
Rank | Team | Test Accuracy (subset) |
---|---|---|
1 | Synerise AI | 0.7454 |
2 | BD-PGL | 0.7339 |
3 | antkg | 0.7241 |
4 | DeeperBiggerBetter | 0.7132 |
5 | passages | 0.7113 |
6 | Academic | 0.7082 |
7 | PKUDB3 | 0.7067 |
8 | Graph@PKU | 0.7049 |
9 | the-stone-story | 0.7045 |
10 | NTT DOCOMO LABS | 0.7020 |
11 | Abearcomes | 0.7008 |
12 | winone | 0.7003 |
13 | Ikigai | 0.6984 |
14 | AntAntGraph | 0.6975 |
15 | MSRA DKI | 0.6974 |
16 | hust-tigergraph | 0.6967 |
17 | mtMAG | 0.6958 |
18 | VCGroup | 0.6936 |
19 | Topology_mag | 0.6925 |
20 | luke28 | 0.6922 |
21 | MindGMP | 0.6920 |
22 | PM-TEAM | 0.6918 |
23 | LEEXY | 0.6913 |
24 | Susanna | 0.6901 |
25 | HJYgotoPLAY | 0.6892 |
26 | CogDL | 0.6884 |
27 | IF-bigdata | 0.6877 |
28 | Fightfightfight | 0.6875 |
29 | bjtuxingyuan | 0.6871 |
30 | KAICD | 0.6868 |
31 | GraphAI | 0.6864 |
31 | overwatch | 0.6864 |
32 | TheAIDivision | 0.6849 |
33 | fanfanman | 0.6811 |
34 | zjuainet | 0.6798 |
35 | UVA-PolyU-MAG | 0.6782 |
36 | no_free_lunch | 0.6781 |
37 | UCLA-Graph | 0.6771 |
38 | Fireside Coding Club | 0.6751 |
39 | nvsac | 0.6729 |
40 | PAHT-AI | 0.6631 |
41 | ALGRUC | 0.6629 |
42 | yangxc | 0.6557 |
43 | Binary Bird | 0.6504 |
44 | Mojito | 0.6366 |
45 | ANTKCC | 0.5304 |
46 | XJTUGNN | 0.5264 |
47 | Four Colors | 0.5241 |
48 | UPSIDE-DOWN | 0.5233 |
49 | MC-AiDA | 0.5210 |
50 | DeepBlueTechnology | 0.4653 |
51 | BU-LISP | 0.3920 |
52 | dascim | 0.0582 |
53 | paper2paper | 0.0213 |
54 | Rookie | 0.0202 |
55 | lleb-mag | 0.0067 |
Initial leaderboard for WikiKG90M-LSC
Mean Reciprocal Rank (MRR). The higher, the better.
Rank | Team | Test MRR (subset) |
---|---|---|
1 | JohnZheng | 0.9405 |
2 | OhMyGod | 0.9401 |
3 | vcdbro | 0.9291 |
4 | littleant | 0.9261 |
5 | GraphMIRAcles | 0.9223 |
6 | sanwenaoteman | 0.9088 |
7 | esther | 0.9087 |
8 | mtWIKIKG | 0.9035 |
9 | RelAix | 0.9027 |
10 | yeyeye | 0.8967 |
11 | cciiplab | 0.8944 |
12 | 迷路的小叮当 | 0.8938 |
13 | GNNIAUN | 0.8877 |
14 | BD-PGL | 0.8858 |
15 | Neural Bellman-Ford Networks | 0.8792 |
16 | VCGroup | 0.8722 |
17 | CogDL | 0.8705 |
18 | TheAIDivision | 0.8690 |
19 | AJ Team | 0.8681 |
20 | GreatTeam | 0.8675 |
21 | GalaxyX | 0.8647 |
22 | Synerise AI | 0.8606 |
23 | abcdef | 0.8513 |
24 | UVA-PolyU-Wiki | 0.8498 |
25 | TransCendence | 0.8154 |
26 | spoer | 0.8131 |
27 | 迪迦奥特曼 | 0.7847 |
28 | MorningStar | 0.7775 |
29 | USTC-HNT | 0.7356 |
30 | Knowledgeable | 0.7347 |
31 | HET-TigerGraph | 0.7344 |
32 | BU-LISP | 0.7280 |
33 | NTT DOCOMO INC | 0.7199 |
34 | The A-Team | 0.6708 |
35 | Heads and Tails | 0.4346 |
36 | XJTUGNN | 0.0825 |
37 | antkg | 0.0104 |
38 | no_free_lunch | 0.0030 |
39 | fastretrieve.ai | 0.0030 |
Initial leaderboard for PCQM4M-LSC
Mean Absolute Error (MAE). The lower, the better.
Rank | Team | Test MAE (subset) |
---|---|---|
1 | SuperHelix | 0.1294 |
2 | MachineLearning | 0.1328 |
3 | Autobot | 0.1338 |
4 | Quantum | 0.1352 |
5 | no_free_lunch | 0.1356 |
6 | DIVE@TAMU | 0.1363 |
7 | RelAix | 0.1366 |
8 | NTT DOCOMO LABS | 0.1369 |
9 | mtPCQM | 0.1370 |
10 | Ant-AGL-Chem | 0.1394 |
11 | the-stone-story | 0.1403 |
12 | Team IC1101 | 0.1406 |
13 | overwatch | 0.1408 |
14 | TongJing | 0.1413 |
15 | VCGroup | 0.1416 |
16 | Schrodinger | 0.1425 |
17 | CUNY KDD Cup | 0.1443 |
18 | MLCollective | 0.1456 |
19 | GNNLearner | 0.1474 |
20 | So Vegetable | 0.1486 |
21 | The_Sky_Is_Blue | 0.1492 |
22 | yangxc | 0.1497 |
23 | Topology_pcq | 0.1498 |
24 | WBMSDU | 0.1504 |
25 | CogDL | 0.1510 |
26 | braino | 0.1511 |
27 | kojimar | 0.1565 |
28 | HUNGraphs | 0.1572 |
29 | PreferredSmile | 0.1614 |
30 | pauli | 0.1673 |
31 | Tardigrades | 0.1717 |
32 | DBSISDM | 0.1755 |
33 | AI Winter is Coming | 0.1793 |
34 | USTC-MO | 0.1803 |
35 | IITD-GPU-GO-BRRR | 0.1826 |
36 | NO HUMO NO LUMO BRO | 0.1880 |
37 | USTC-MO-1 | 0.1927 |
38 | DeepBlueAI | 0.2090 |
39 | MoleculeHunter | 0.2171 |
40 | Family Business | 0.2215 |
41 | USTC-DLC | 0.2223 |
42 | dminers | 0.2382 |
43 | yfishlab | 0.2865 |
44 | KAICD | 0.3466 |
45 | DeepBlueAI | 0.3694 |
46 | GLab-graph | 0.6404 |
47 | shadoks | 1.1089 |
48 | GraLITIS | 1.3002 |
49 | FocusMind | 5.0008 |