KDD Cup 2021 has been concluded. This is an archive page.
The updated page is here. Competition results are announced here.

Learn about competition results and winning solutions


Awardees of MAG240M-LSC Track (Leaderboard)

Winners
1st place: BD-PGL (contact)
  • Team members: Yunsheng Shi (Baidu), Zhengjie Huang (Baidu), Weibin Li (Baidu), Weiyue Su (Baidu), Shikun Feng (Baidu)
  • Method: R-UNIMP
  • Short summary: We adopt the recently proposed UniMP technique, which incorporates feature and label propagation at both training and inference time and yields significant improvements across several node classification tasks, and modify it into R-UniMP for the heterogeneous graph, with “R” standing for “Relational”. We also provide a detailed recap of our key strategies and valuable findings during the entire competition. 30 models with different initializations are ensembled.
  • Learn more: Technical report, code
  • Test accuracy: 0.7549
2nd place: Academic (contact)
  • Team members: Petar Velickovic (DeepMind), Peter Battaglia (DeepMind), Jonathan Godwin (DeepMind), Alvaro Sanchez (DeepMind), David Budden (DeepMind), Shantanu Thakoor (DeepMind), Jacklynn Stott (DeepMind), Ravichandra Addanki (DeepMind), Thomas Keck (DeepMind), Andreea Deac (DeepMind)
  • Method: MPNN Ensemble with BGRL fine-tuning
  • Short summary: MPNN over subsampled patches, fine-tuned with the BGRL (bootstrapped graph latents) self-supervised objective. 20 models with different initialisations and validation splits are ensembled.
  • Learn more: Technical report, code
  • Test accuracy: 0.7519
3rd place: Synerise AI (contact)
  • Team members: Michal Daniluk (Synerise), Jacek Dabrowski (Synerise), Konrad Goluchowski (Synerise), Barbara Rychalska (Warsaw University of Technology/Synerise)
  • Method: Cleora + EMDE
  • Short summary: We tackle the task with an efficient model based on our previously introduced algorithms, EMDE and Cleora, on top of a simple feed-forward neural network. We use EMDE to represent nodes in the form of sketches – structures representing local similarity, which additionally allow for easy accumulation of multiple object values. We use Cleora for label propagation, i.e. representing nodes with sets of labels observed in the training data. To achieve maximal performance, we train 60 independent ensemble models.
  • Learn more: Technical report, code
  • Test accuracy: 0.7460
Runners-up
4th place: Topology_mag (contact)
  • Team members: Qiuying Peng (OPPO Research), Wencai Cao (OPPO Research), Zheng Pan (OPPO Research)
  • Method: MPLP + finetune (40 ensemble)
  • Short summary: Metapath-based Label Propagation with fine-tuning on the latest year's samples. Models from 5-fold cross-validation with 8 random seeds are ensembled.
  • Learn more: Technical report, code
  • Test accuracy: 0.7447
5th place: passages (contact)
  • Team members: Bole Ai (Nanjing University), Xiang Long (Beijing University of Posts and Telecommunications), Kaiyuan Li (Beijing University of Posts and Telecommunications), Quan Lin (Huazhong University of Science and Technology), Xiaofan Liu (Beijing University of Posts and Telecommunications), Pengfei Wang (Beijing University of Posts and Telecommunications), Mingdao Wang (Beijing University of Posts and Telecommunications), Zhichao Feng (Beijing University of Posts and Telecommunications), Kun Zhao (Nanjing University)
  • Method: SGC + R-GAT + Finetune
  • Short summary: Our method can be largely summarized in two main stages: a pretraining stage designed to explore heterogeneous academic networks for better node embeddings, and a transfer learning stage used to alleviate differences in label distributions and node representations between the training and test sets.
  • Learn more: Technical report, code
  • Test accuracy: 0.7381
6th place: DeeperBiggerBetter (contact)
  • Team members: Guohao Li (KAUST), Hesham Mostafa (Intel Corporation), Jesus Alejandro Zarzar Torano (KAUST), Sami Abu-El-Haija (USC), Marcel Nassar (Intel Labs), Daniel Cummings (Intel Corporation), Sohil Shah (Intel Corporation), Matthias Mueller (Intel Labs), Bernard Ghanem (KAUST)
  • Method: GNN180M
  • Short summary: We train two R-GAT models, one with 2 layers and another with 3 layers, for a total of 180M parameters. We utilize author labels as extra regularization, conduct multiple inference passes with proportional neighborhood sizes, aggregate their results by ensembling, and then apply a label smoothing trick on the model's predictions with author labels for post-processing.
  • Learn more: Technical report, code
  • Test accuracy: 0.7353
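
Several of the entries above combine many independently trained models by averaging their predicted class probabilities (30, 20, and 60 models for the top three teams, respectively). The snippet below is a minimal sketch of that averaging step together with the classification-accuracy metric used on this track; all array names and sizes are illustrative assumptions, not any team's actual pipeline.

```python
import numpy as np

# Illustrative sizes only: n_models independent runs, each producing class
# probabilities for the same set of test papers.
rng = np.random.default_rng(0)
n_models, n_papers, n_classes = 30, 1000, 150
probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_papers))
labels = rng.integers(0, n_classes, size=n_papers)

# Simple ensembling: average the per-model probabilities, then take the argmax.
avg_probs = probs.mean(axis=0)       # shape (n_papers, n_classes)
pred = avg_probs.argmax(axis=1)

# Classification accuracy, the MAG240M-LSC metric (higher is better).
accuracy = (pred == labels).mean()
print(f"ensemble accuracy: {accuracy:.4f}")
```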

Awardees of WikiKG90M-LSC Track (Leaderboard)

Winners
1st place: BD-PGL (contact)
  • Team members: Weiyue Su (Baidu), Shikun Feng (Baidu), Zeyang Fang (Baidu), Huijuan Wang (Baidu), Siming Dai (Baidu), Hui Zhong (Baidu), Yunsheng Shi (Baidu), Zhengjie Huang (Baidu)
  • Method: NOTE + Feature
  • Short summary: We modify OTE into NOTE for better performance and use a post-smoothing technique to capture the graph structure as supplementation. Feature engineering further improves the results.
  • Learn more: Technical report, code
  • Test MRR: 0.9727
  • Note: The solution explicitly makes use of the val/test tail candidate entities provided by the organizer. In practice, those candidates are not provided, and a model needs to rank among all entities.
2nd place: OhMyGod (contact)
  • Team members: Weihua Peng (Harbin Institute of Technology)
  • Method: TransE, ComplEx, DistMult, and SimplE (9 ensemble)
  • Short summary: a) more powerful representation vector learning, b) the complementarity between different models, c) candidate filtering based on the aggregated statistics of the validation/test tail candidates.
  • Learn more: Technical report, code
  • Test MRR: 0.9712
  • Note: The solution explicitly makes use of the val/test tail candidate entities provided by the organizer. In practice, those candidates are not provided, and a model needs to rank among all entities.
3rd place: GraphMIRAcles (contact)
  • Team members: Jianyu Cai (University of Science and Technology of China), Jiajun Chen (University of Science and Technology of China), Taoxing Pan (University of Science and Technology of China), Zhanqiu Zhang (University of Science and Technology of China), Jie Wang (University of Science and Technology of China)
  • Method: ComplEx-CMRC + Rule + KD (15 ensemble)
  • Short summary: Encoder: Concat-MLP with Residual Connection (CMRC). Decoder: ComplEx. Rule mining for data augmentation. Knowledge distillation to improve single models. 15 models with different random seeds are ensembled.
  • Learn more: Technical report, code
  • Test MRR: 0.9707
  • Note: The solution explicitly makes use of the val/test tail candidate entities provided by the organizer. In practice, those candidates are not provided, and a model needs to rank among all entities.
Runners-up
4th place: littleant (contact)
  • Team members: Shuo Yang (Ant Group), Daixin Wang (Ant Group), Dingyuan Zhu (Ant Group), Yakun Wang (Ant Group), Borui Ye (Ant Group)
  • Method: AntLinkPred
  • Short summary: 1. Heuristic feature extraction. 2. Decoder (TransE, AutoSF, PairRE, RotatE, etc.) selection and model ensemble. 3. Model fine-tuning with low-confidence samples. 4. Generate a smaller candidate list with a recall model. 5. Generate final results with a re-ranking model.
  • Learn more: Technical report, code
  • Test MRR: 0.9511
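
The notes above mention that these solutions rank the true tail among the val/test tail candidates and are scored by Mean Reciprocal Rank (MRR). The sketch below shows how MRR is computed in that candidate-ranking setup, using random placeholder scores and sizes; in the competition, the scores would come from a knowledge-graph embedding model such as TransE or ComplEx.

```python
import numpy as np

# Placeholder data: for each (head, relation) query the model scores a list
# of provided tail candidates, one of which is the true tail.
rng = np.random.default_rng(0)
n_queries, n_candidates = 500, 1001
scores = rng.normal(size=(n_queries, n_candidates))
true_idx = rng.integers(0, n_candidates, size=n_queries)

# Rank of the true tail among the candidates (1 = best); a candidate ranks
# ahead only if it is scored strictly higher.
true_scores = scores[np.arange(n_queries), true_idx]
ranks = 1 + (scores > true_scores[:, None]).sum(axis=1)

# Mean Reciprocal Rank, the WikiKG90M-LSC metric (higher is better).
mrr = (1.0 / ranks).mean()
print(f"MRR: {mrr:.4f}")
```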

Awardees of PCQM4M-LSC Track (Leaderboard)

Winners
1st place: MachineLearning (contact)
  • Team members: Chengxuan Ying (Dalian University of Technology), Mingqi Yang (Dalian University of Technology), Shengjie Luo (Peking University), Tianle Cai (Princeton University), Guolin Ke (MSRA), Di He (MSRA), Shuxin Zheng (MSRA), Chenglin Wu (Xiamen University), Yuxin Wang (Dalian University of Technology), Yanming Shen (Dalian University of Technology)
  • Method: Graphormer (10 ensemble) + ExpC (8 ensemble)
  • Short summary: We adopt Graphormer and ExpC as our basic models. We train each model with 8-fold cross-validation, and additionally train two Graphormer models on the union of the training and validation sets with different random seeds. For the final submission, we use a naive ensemble of these 18 models by taking the average of their outputs.
  • Learn more: Technical report, code
  • Test MAE: 0.1200
2nd place: SuperHelix (contact)
  • Team members: Shanzhuo Zhang (Baidu), Lihang Liu (Baidu), Sheng Gao (Baidu), Donglong He (Baidu), Weibin Li (Baidu), Zhengjie Huang (Baidu), Weiyue Su (Baidu), Wenjin Wang (Baidu)
  • Method: LiteGEM
  • Short summary: Deep graph neural network with self-supervised tasks on topology and geometry information. 73 models with different tasks and hyper-parameters are ensembled.
  • Learn more: Technical report, code
  • Test MAE: 0.1204
3rd place: Quantum (contact)
  • Team members: Petar Velickovic (DeepMind), Peter Battaglia (DeepMind), Jonathan Godwin (DeepMind), Alvaro Sanchez (DeepMind), David Budden (DeepMind), Shantanu Thakoor (DeepMind), Jacklynn Stott (DeepMind), Ravichandra Addanki (DeepMind), Sibon Li (DeepMind), Andreea Deac (DeepMind)
  • Method: Very Deep GN Ensemble + Conformers + Noisy Nodes
  • Short summary: A combination of a 32-layer deep Graph Network over RDKit conformer features and a 50-layer deep Graph Network for molecules whose conformers cannot be computed. Denoising regularisation with Noisy Nodes was applied. 20 models with different initialisations and validation splits are ensembled.
  • Learn more: Technical report, code
  • Test MAE: 0.1205
Runners-up
4th place: DIVE@TAMU (contact)
  • Team members: Meng Liu (Texas A&M University), Cong Fu (Texas A&M University), Xuan Zhang (Texas A&M University), Limei Wang (Texas A&M University), Yaochen Xie (Texas A&M University), Hao Yuan (Texas A&M University), Youzhi Luo (Texas A&M University), Zhao Xu (Texas A&M University), Shenglong Xu (Texas A&M University), Shuiwang Ji (Texas A&M University)
  • Method: 2D Deep GNNs + 3D low-cost conformer GNNs
  • Short summary: (1) A deeper 2D GNN with larger receptive fields (over 20 layers) and greater expressivity over 2D molecular graphs. (2) A 3D GNN over low-cost conformer sets, which can be obtained with RDKit at an affordable budget.
  • Learn more: Technical report, code
  • Test MAE: 0.1235
5th place: no_free_lunch (contact)
  • Team members: Xiaochuan Zou (AntGroup), Ruofan Wu (AntGroup), Baokun Wang (AntGroup), Sheng Tian (AntGroup), Yifei Hu (AntGroup), Liang Zhu (AntGroup), Peixuan Chen (AntGroup), Mengjiao Zhang (AntGroup), Yuhao Zhang (AntGroup)
  • Method: Ensemble MolNet
  • Short summary: Transformer with graph structure features. 6 models with different initializations and training epochs are ensembled.
  • Learn more: Technical report, code
  • Test MAE: 0.1244
6th place: GNNLearner (contact)
  • Team members: Yingce Xia (Microsoft Research Asia), Lijun Wu (Microsoft Research Asia), Shufang Xie (Microsoft Research Asia), Jinhua Zhu (University of Science and Technology of China), Yang Fan (University of Science and Technology of China), Yutai Hou (Harbin Institute of Technology), Tao Qin (Microsoft Research Asia)
  • Method: Transformers (standard x4, two-branch x8) + GIN (x5)
  • Short summary: (1) Transformer with two branches, one for regression and the other for classification; the two branches learn from each other. (2) Standard Transformer. (3) GIN. Models with different initializations are aggregated.
  • Learn more: Technical report, code
  • Test MAE: 0.1253
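
Most of the entries above ensemble independently trained regression models and are evaluated by Mean Absolute Error on the predicted HOMO-LUMO gap. The sketch below illustrates simple prediction averaging (as the 1st-place summary describes) and the MAE metric, using random placeholder values rather than any team's predictions.

```python
import numpy as np

# Placeholder predictions from an 18-model ensemble for 1000 molecules.
rng = np.random.default_rng(0)
n_models, n_molecules = 18, 1000
preds = rng.normal(loc=5.0, scale=0.5, size=(n_models, n_molecules))
targets = rng.normal(loc=5.0, scale=0.5, size=n_molecules)

# Naive ensembling: take the mean of the per-model outputs.
ensemble_pred = preds.mean(axis=0)

# Mean Absolute Error, the PCQM4M-LSC metric (lower is better).
mae = np.abs(ensemble_pred - targets).mean()
print(f"ensemble MAE: {mae:.4f}")
```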

Final leaderboard for MAG240M-LSC

Classification accuracy. The higher, the better.
Rank Team Test Accuracy
1 BD-PGL 0.7549
2 Academic 0.7519
3 Synerise AI 0.7460
4 Topology_mag 0.7447
5 passages 0.7381
6 DeeperBiggerBetter 0.7353
7 paper2paper 0.7292
8 NTT DOCOMO LABS 0.7290
9 GraphAI 0.7278
10 MC-AiDA 0.7274
11 yangxc 0.7273
12 AntAntGraph 0.7272
13 IF-bigdata 0.7271
14 antkg 0.7260
15 LEEXY 0.7254
16 PKUDB3 0.7250
17 PM-TEAM 0.7239
18 Euler 0.7233
19 UVA-PolyU-MAG 0.7218
20 hust-tigergraph 0.7196
21 nvsac 0.7165
22 Abearcomes 0.7161
23 MSRA DKI 0.7118
24 Ikigai 0.7113
25 Graph@PKU 0.7076
26 Fireside Coding Club 0.7035
27 bjtuxingyuan 0.7030
28 mtMAG 0.7027
29 CogDL 0.7018
30 VCGroup 0.7001
31 UCLA-Graph 0.6988
32 University of Macau 0.6946
33 FreedomDancer 0.6925
34 ALGRUC 0.6921
35 jianmohuo 0.6905
36 zjuainet 0.6897
37 fanfanman 0.6821
38 GoBruins 0.6517
39 HUNGraphs 0.6517
40 Four Colors 0.6466
41 Oxtest 0.6408
42 TPA FDU 0.6400
43 dascim 0.5898
44 Rotopia 0.5783
45 Binary Bird 0.0218
46 BU-LISP 0.0009

Final leaderboard for WikiKG90M-LSC

Mean Reciprocal Rank (MRR). The higher, the better.
Rank Team Test MRR
1 BD-PGL 0.9727
2 OhMyGod 0.9712
3 GraphMIRAcles 0.9707
4 littleant 0.9511
5 vcdbro 0.9489
6 JohnZheng 0.9465
7 VCGroup 0.9432
8 USTC-HNT-1 0.9324
9 USTC-HNT 0.9311
10 CogDL 0.9275
11 RelAix 0.9184
12 Neural Bellman-Ford Networks 0.9178
13 mtWIKIKG 0.9081
14 cciiplab 0.9058
15 Heads and Tails 0.9047
16 Hello KG 0.8945
17 NTT DOCOMO INC 0.8935
18 迷路的小叮当 0.8761
19 HET-TigerGraph 0.8720
20 antkg 0.8712
21 AJ Team 0.8688
22 UVA-PolyU-Wiki 0.8653
23 TransCendence 0.8252
24 NiuBiP 0.7425
25 XJTUGNN 0.6597
26 WashuLink 0.4791
27 BU-LISP 0.0629
28 SUFE-OXAI 0.0029

Final leaderboard for PCQM4M-LSC

Mean Absolute Error (MAE). The lower, the better.
Rank Team Test MAE
1 MachineLearning 0.1200
2 SuperHelix 0.1204
3 Quantum 0.1205
4 DIVE@TAMU 0.1235
5 no_free_lunch 0.1244
6 GNNLearner 0.1253
7 RelAix 0.1273
8 VCGroup 0.1281
9 pauli 0.1293
10 mtPCQM 0.1298
11 NTT DOCOMO LABS 0.1298
12 Schrodinger 0.1305
13 Ant-AGL-Chem 0.1306
14 AI Winter is Coming 0.1324
15 HUNGraphs 0.1327
16 PreferredSmile 0.1328
17 ADVERSARIES 0.1328
18 MoleculeHunter 0.1338
19 KAICD 0.1344
20 CogDL 0.1346
21 MLCollective 0.1358
22 Team IC1101 0.1398
23 DeepBlueAI 0.1414
24 So Vegetable 0.1415
25 USTC-DLC 0.1416
26 Autobot 0.1418
27 DeeperBiggerBetter 0.1420
28 Topology_pcq 0.1421
29 The_Sky_Is_Blue 0.1423
30 DeepBlueAI 0.1429
31 The Graphinators 0.1432
32 JustDoIt 0.1434
33 CUNY KDD Cup 0.1435
34 THUMLP 0.1443
35 Danhuangpai 0.1447
36 The Long and Winding Node 0.1457
37 dminers 0.1467
38 RiseLab 0.1469
39 FudanGWX 0.1501
40 USTC-MO 0.1536
41 braino 0.1537
42 IITD-GPU-GO-BRRR 0.1544
43 USTC-MO-1 0.1568
44 Celestial Being 0.1589
45 kojimar 0.1590
46 BUPTTDCS-TG 0.1606
47 CIML at UniPI 0.2105
48 yfishlab 0.2202
49 GraLITIS 0.9393

Initial leaderboard for MAG240M-LSC

Classification accuracy. The higher, the better.
Rank Team Test Accuracy (subset)
1 Synerise AI 0.7454
2 BD-PGL 0.7339
3 antkg 0.7241
4 DeeperBiggerBetter 0.7132
5 passages 0.7113
6 Academic 0.7082
7 PKUDB3 0.7067
8 Graph@PKU 0.7049
9 the-stone-story 0.7045
10 NTT DOCOMO LABS 0.7020
11 Abearcomes 0.7008
12 winone 0.7003
13 Ikigai 0.6984
14 AntAntGraph 0.6975
15 MSRA DKI 0.6974
16 hust-tigergraph 0.6967
17 mtMAG 0.6958
18 VCGroup 0.6936
19 Topology_mag 0.6925
20 luke28 0.6922
21 MindGMP 0.6920
22 PM-TEAM 0.6918
23 LEEXY 0.6913
24 Susanna 0.6901
25 HJYgotoPLAY 0.6892
26 CogDL 0.6884
27 IF-bigdata 0.6877
28 Fightfightfight 0.6875
29 bjtuxingyuan 0.6871
30 KAICD 0.6868
31 GraphAI 0.6864
31 overwatch 0.6864
32 TheAIDivision 0.6849
33 fanfanman 0.6811
34 zjuainet 0.6798
35 UVA-PolyU-MAG 0.6782
36 no_free_lunch 0.6781
37 UCLA-Graph 0.6771
38 Fireside Coding Club 0.6751
39 nvsac 0.6729
40 PAHT-AI 0.6631
41 ALGRUC 0.6629
42 yangxc 0.6557
43 Binary Bird 0.6504
44 Mojito 0.6366
45 ANTKCC 0.5304
46 XJTUGNN 0.5264
47 Four Colors 0.5241
48 UPSIDE-DOWN 0.5233
49 MC-AiDA 0.5210
50 DeepBlueTechnology 0.4653
51 BU-LISP 0.3920
52 dascim 0.0582
53 paper2paper 0.0213
54 Rookie 0.0202
55 lleb-mag 0.0067

Initial leaderboard for WikiKG90M-LSC

Mean Reciprocal Rank (MRR). The higher, the better.
Rank Team Test MRR (subset)
1 JohnZheng 0.9405
2 OhMyGod 0.9401
3 vcdbro 0.9291
4 littleant 0.9261
5 GraphMIRAcles 0.9223
6 sanwenaoteman 0.9088
7 esther 0.9087
8 mtWIKIKG 0.9035
9 RelAix 0.9027
10 yeyeye 0.8967
11 cciiplab 0.8944
12 迷路的小叮当 0.8938
13 GNNIAUN 0.8877
14 BD-PGL 0.8858
15 Neural Bellman-Ford Networks 0.8792
16 VCGroup 0.8722
17 CogDL 0.8705
18 TheAIDivision 0.8690
19 AJ Team 0.8681
20 GreatTeam 0.8675
21 GalaxyX 0.8647
22 Synerise AI 0.8606
23 abcdef 0.8513
24 UVA-PolyU-Wiki 0.8498
25 TransCendence 0.8154
26 spoer 0.8131
27 迪迦奥特曼 0.7847
28 MorningStar 0.7775
29 USTC-HNT 0.7356
30 Knowledgeable 0.7347
31 HET-TigerGraph 0.7344
32 BU-LISP 0.7280
33 NTT DOCOMO INC 0.7199
34 The A-Team 0.6708
35 Heads and Tails 0.4346
36 XJTUGNN 0.0825
37 antkg 0.0104
38 no_free_lunch 0.0030
39 fastretrieve.ai 0.0030

Initial leaderboard for PCQM4M-LSC

Mean Absolute Error (MAE). The lower, the better.
Rank Team Test MAE (subset)
1 SuperHelix 0.1294
2 MachineLearning 0.1328
3 Autobot 0.1338
4 Quantum 0.1352
5 no_free_lunch 0.1356
6 DIVE@TAMU 0.1363
7 RelAix 0.1366
8 NTT DOCOMO LABS 0.1369
9 mtPCQM 0.1370
10 Ant-AGL-Chem 0.1394
11 the-stone-story 0.1403
12 Team IC1101 0.1406
13 overwatch 0.1408
14 TongJing 0.1413
15 VCGroup 0.1416
16 Schrodinger 0.1425
17 CUNY KDD Cup 0.1443
18 MLCollective 0.1456
19 GNNLearner 0.1474
20 So Vegetable 0.1486
21 The_Sky_Is_Blue 0.1492
22 yangxc 0.1497
23 Topology_pcq 0.1498
24 WBMSDU 0.1504
25 CogDL 0.1510
26 braino 0.1511
27 kojimar 0.1565
28 HUNGraphs 0.1572
29 PreferredSmile 0.1614
30 pauli 0.1673
31 Tardigrades 0.1717
32 DBSISDM 0.1755
33 AI Winter is Coming 0.1793
34 USTC-MO 0.1803
35 IITD-GPU-GO-BRRR 0.1826
36 NO HUMO NO LUMO BRO 0.1880
37 USTC-MO-1 0.1927
38 DeepBlueAI 0.2090
39 MoleculeHunter 0.2171
40 Family Business 0.2215
41 USTC-DLC 0.2223
42 dminers 0.2382
43 yfishlab 0.2865
44 KAICD 0.3466
45 DeepBlueAI 0.3694
46 GLab-graph 0.6404
47 shadoks 1.1089
48 GraLITIS 1.3002
49 FocusMind 5.0008