Project II
This is a 2-member project assignment. Each group is supposed to work on the steps together, including the writeup of the report. Please submit one set of your report and related files per group on Blackboard.
1 The Default Project
All undergraduate students should work on this project. Ph.D. and Master's students have two other options:
1. Proposing your own research project.
2. Working with Prof. Shu on a promising research project.
You still need to form a 2-member group for the above options. You can find the instructions in Section 2 and Section 3.
1.1 Task: Fake News Classification
Social media has become one of the major resources for people to obtain news and information. For example, it is found that social media now outperforms television as the major news source. However, because it is cheap to provide news online and much faster and easier to disseminate through social media, large volumes of fake news or misinformation are produced online for a variety of purposes, such as financial and political gain. The extensive spread of fake news/misinformation can have a serious negative impact on individuals and society: (i) breaking the authenticity balance of the news ecosystem; (ii) intentionally persuading consumers to accept biased or false beliefs; and (iii) changing the way people interpret and respond to real news and information. Therefore, it is important to detect fake news and misinformation in social media.
We formally define the task as follows. Given the title of a fake news article A and the title of an incoming news article B, participants are asked to classify B into one of three categories:
• agreed: B talks about the same fake news as A.
• disagreed: B refutes the fake news in A.
• unrelated: B is unrelated to A.
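One possible baseline for this three-way classification is sketched below. It assumes scikit-learn, TF-IDF features, and logistic regression, none of which are prescribed by the assignment, and the helper names (`join_pair`, `train_baseline`, `predict`) are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# The three categories defined in the task description.
LABELS = ("agreed", "disagreed", "unrelated")


def join_pair(title_a, title_b):
    """Merge both titles into one string so a text model sees the pair."""
    return f"{title_a} [SEP] {title_b}"


def train_baseline(pairs, labels):
    """Fit a TF-IDF + logistic regression pipeline on (title A, title B) pairs.

    This is an illustrative baseline, not the method required by the project.
    """
    texts = [join_pair(a, b) for a, b in pairs]
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


def predict(model, pairs):
    """Return one predicted label string per (title A, title B) pair."""
    texts = [join_pair(a, b) for a, b in pairs]
    return [str(label) for label in model.predict(texts)]
```

Concatenating the two titles is the simplest way to feed a pair to a bag-of-words model. Features that compare the titles directly, or a pretrained sentence-pair model, are alternatives worth evaluating on the validation data.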
1.2 File Descriptions
In the attached folder, you are provided with 3 CSV files:
• train.csv: Training data
• test.csv: Test data
• sample_submission.csv: Expected submission format
The training data includes the “label” of each news pair, while the test data does not. Validation data can be split off from train.csv. Use the training data to train a classifier and evaluate your model’s performance on the validation data. Then use the trained model to predict labels for the test data. Your output file should follow the format of “sample_submission.csv”, with your predictions in the “label” column. The columns in the train and test data are as follows:
• id: the id of each news pair.
• tid1: the id of fake news title 1.
• tid2: the id of news title 2.
• title1_en: the fake news title 1 in English.
• title2_en: the news title 2 in English.
• label: indicates the relation between the news pair: agreed/disagreed/unrelated.
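The workflow described above (split off validation data, evaluate, write predictions) could be sketched as follows using only the Python standard library. The 80/20 split, the stratification by label, and the `id,label` header of the output file are assumptions; check the header against the provided sample submission file. All function names are hypothetical.

```python
import csv
import random
from collections import defaultdict


def read_rows(path):
    """Load a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def stratified_split(rows, val_fraction=0.2, seed=0):
    """Split rows into (train, val), keeping label proportions similar."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def write_submission(path, ids, predictions):
    """Write one (id, predicted label) row per test pair.

    The header below is an assumption about the sample submission format.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        writer.writerows(zip(ids, predictions))
```

Stratifying the split matters when the three classes are imbalanced, because a purely random split can leave a rare class underrepresented in the validation set. Plain accuracy is only one possible metric; per-class precision and recall may be more informative in that case.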
1.3 Submission
Submit the result file (named “submission.csv”), source code, presentation slides, and report in one .zip file named LASTNAME1_LASTNAME2_PJ2 (replace LASTNAME1 and LASTNAME2 with the last name of each member). The submitted results should be reproducible with the submitted code and data. Do not change the file names, as your submitted .csv file will be processed by an automatic program. The report should be at least 2 pages long and should describe the data pre-processing, the model, and the validation results. Include a “Reference” section and cite all the papers, tutorials, packages, software, and libraries you used for your program.
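Packaging the deliverables can be scripted so that the archive name and the required result file are checked before upload. The sketch below is one way to do it with the standard library; the function name `build_archive`, the underscore separator in the archive name, and the flat layout inside the archive are assumptions, not requirements stated in the assignment.

```python
import zipfile
from pathlib import Path

# The result file name required by the submission instructions.
REQUIRED_RESULT_FILE = "submission.csv"


def build_archive(lastname1, lastname2, files, out_dir=".", sep="_"):
    """Zip the given files into LASTNAME1<sep>LASTNAME2<sep>PJ2.zip.

    Raises ValueError if submission.csv is not among the files, and
    FileNotFoundError if any listed file does not exist.
    """
    paths = [Path(p) for p in files]
    if REQUIRED_RESULT_FILE not in [p.name for p in paths]:
        raise ValueError(f"{REQUIRED_RESULT_FILE} must be included")
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing files: {missing}")
    archive = Path(out_dir) / f"{lastname1}{sep}{lastname2}{sep}PJ2.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Store each file under its bare name (flat layout, an assumption).
            zf.write(p, arcname=p.name)
    return archive
```

A call such as `build_archive("DOE", "ROE", ["submission.csv", "report.pdf", "slides.pdf", "main.py"])` would produce one archive containing the four files, with hypothetical last names and file names.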
2 Proposing Your Own Project
3 Working on Cutting-Edge Research Projects
Thesis-track MS students and PhD students who want to work closely with Prof. Shu on promising research projects can choose from the following projects and contact him directly:
• Explainable graph neural networks. Machine learning (ML) has achieved great success with the development of deep neural networks. However, conventional deep models are often treated as black boxes whose inner mechanisms lack transparency. This leaves users with little understanding of the rationale behind predictions and makes it hard to gain their trust. As a result, research on interpretable ML methods is attracting increasing attention. One branch of explainable ML designs explanation models for graph neural networks (GNNs). We would like to reproduce some published GNN explanation models and then design more effective ones.
• Evidence-enhanced disinformation detection. External knowledge bases (KBs) such as Wikipedia contain a large number of high-quality structured subject-predicate-object triplets and unstructured entity descriptions, which could serve as evidence for detecting fake news. We need to crawl evidence data from the Web to support fake news detection.