Homework Project #1
Readings:
● Intro & Chapter 1: Weapons of Math Destruction (What is a Model?)
● M. Kosinski, D. Stillwell, T. Graepel, “Private traits and attributes are predictable from digital records of human behavior,” Proceedings of the National Academy of Sciences Apr
In this basic assignment, you’ll begin to discover how data from a user’s social media profile is used by various organizations by examining your own data profile on social media. You may use any social media platform, as long as you can complete Steps 1-7 and extract the kind of data that companies can use to target you. Facebook is used as the example in the steps below.
● Step 1: Research how to download a copy of your personal data from your selected social media platform.
o For Facebook: Information on how to download a copy of your data can be found at: https://www.facebook.com/help/1701730696756992
o For Facebook: This information is found in the categories listed below on Facebook’s “Download Your Information” page (two formats are available: html and json).
1. “Ads Information”
The downloaded data includes the files: advertisers_you've_interacted_with.html (ads that you have interacted with) and advertisers_using_your_activity_or_information.html (advertisers that are using your information).
2. “Other Logged Information”
The downloaded data includes the file: ads_interests.html (keywords used to target you).
● Step 3: Based on the data associated with your targeted information, categorize the data into at least 5 and at most 10 categories.
o For Facebook: I’ve selected the targeted advertiser list advertisers_who_uploaded_a_contact_list_with_your_information.html as an illustrative example in the next set of steps.
● Step 4: Create a data flow graph (e.g. using http://sankeymatic.com/build/) that associates your categories with three types of data buckets: Relevant, Not Relevant, Way Off. Feel free to be creative in the naming and interpretation of your buckets, but you will need to define all three data buckets.
● Step 5: Compute basic statistical measures on the data (for each data bucket): Count, Accuracy (= %Relevant), and Rubbish (= %Way Off). Identify which category was the most accurate and which was the least accurate.
● Step 6: Identify which data items could be associated with a regulated domain in law as defined in the lectures (Credit, Education, Employment, Housing and ‘Public Accommodation’). For each of these regulated domains, list how many data items fall within it and provide a sample of the associated data items.
● Step 7: Turn in a report documenting your findings, including social media platform, number of data items, number of categories/name of categories, data buckets identified, script/code (to create data flow graphic), data flow graphic, statistical measures, regulated domain/data item list. The report should be submitted in JDF format. Reports that are not neat and well organized will receive up to a 10-point deduction. The file name for submission is GTuserName_Assignment_1, for example, Joyner03_Assignment_1. Below is an example report (not in JDF format) associated with my advertiser data for reference:
Prof. Ayanna Howard
Social Media Platform: Facebook
Number of Advertisers: 1700
Categories Identified (5):
Car Companies (e.g. International Autos Mercedes Benz)
Shopping (e.g. Tiffany & Co.)
Interest Groups (e.g. AARP)
Entertainment (e.g. Applebee's Grill & Bar)
Data Buckets: Yes, No, U Got to be Kidding
My script on sankeymatic.com:
FB Advertisers [680] Car Companies
FB Advertisers [340] Social Impact
FB Advertisers [170] Shopping
FB Advertisers [340] Interest Groups
FB Advertisers [170] Entertainment
Car Companies [85] Yes
Car Companies [595] No
Social Impact [340] Yes
Shopping [84] No
Shopping [86] U Got to be Kidding
Interest Groups [170] Yes
Interest Groups [170] No
Entertainment [85] Yes
Entertainment [85] No
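The flow lines above can also be generated from a table of counts instead of being typed by hand. The following is a minimal Python sketch, not part of the assignment requirements: the counts are copied from the example script, while the `counts` dictionary and the `sankey_lines` helper are hypothetical names introduced here for illustration. It emits lines in the same "Source [amount] Target" form used in the script above.

```python
# Counts copied from the example script above: category -> data bucket -> count.
counts = {
    "Car Companies": {"Yes": 85, "No": 595},
    "Social Impact": {"Yes": 340},
    "Shopping": {"No": 84, "U Got to be Kidding": 86},
    "Interest Groups": {"Yes": 170, "No": 170},
    "Entertainment": {"Yes": 85, "No": 85},
}


def sankey_lines(source, counts):
    """Return flow lines in 'Source [amount] Target' form (hypothetical helper)."""
    lines = []
    # First layer: the source node feeds each category with that category's total.
    for category, buckets in counts.items():
        lines.append(f"{source} [{sum(buckets.values())}] {category}")
    # Second layer: each category feeds its data buckets.
    for category, buckets in counts.items():
        for bucket, n in buckets.items():
            lines.append(f"{category} [{n}] {bucket}")
    return lines


lines = sankey_lines("FB Advertisers", counts)
print("\n".join(lines))

# Sanity check: the category totals should add up to the number of advertisers.
grand_total = sum(sum(buckets.values()) for buckets in counts.values())
print("Total advertisers:", grand_total)
```

Deriving the first-layer amounts from the bucket counts keeps the two layers of the graph balanced, so a category's inflow cannot drift away from the sum of its outflows.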
My data flow graphic:
Table: Summary Statistics (Note: Partial Example for One Category)
Category    Data Bucket            Count    Accuracy    Rubbish
Shopping    U Got to be Kidding    86
            No                     84
            Yes                    0
            Total Count            170      0%          51%
My most accurate category: Social Impact
My least accurate category (i.e. rubbish): Shopping
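The statistics in Step 5 can be computed directly from the bucket counts. Below is a minimal Python sketch under a few assumptions: the counts are taken from the example script, "Yes" is treated as the Relevant bucket and "U Got to be Kidding" as the Way Off bucket, and the names `counts` and `bucket_stats` are hypothetical. "Least accurate" is read here as lowest Accuracy; ranking by highest Rubbish would be another reasonable reading.

```python
# Counts from the example report: category -> data bucket -> count.
counts = {
    "Car Companies": {"Yes": 85, "No": 595},
    "Social Impact": {"Yes": 340},
    "Shopping": {"No": 84, "U Got to be Kidding": 86},
    "Interest Groups": {"Yes": 170, "No": 170},
    "Entertainment": {"Yes": 85, "No": 85},
}

RELEVANT = "Yes"                  # assumed to play the role of the Relevant bucket
WAY_OFF = "U Got to be Kidding"   # assumed to play the role of the Way Off bucket


def bucket_stats(buckets):
    """Return (total count, accuracy %, rubbish %) for one category."""
    total = sum(buckets.values())
    accuracy = 100 * buckets.get(RELEVANT, 0) / total
    rubbish = 100 * buckets.get(WAY_OFF, 0) / total
    return total, accuracy, rubbish


stats = {category: bucket_stats(buckets) for category, buckets in counts.items()}
for category, (total, accuracy, rubbish) in stats.items():
    print(f"{category}: count={total}, accuracy={accuracy:.0f}%, rubbish={rubbish:.0f}%")

# Rank categories by Accuracy to find the most and least accurate ones.
most_accurate = max(stats, key=lambda c: stats[c][1])
least_accurate = min(stats, key=lambda c: stats[c][1])
print("Most accurate:", most_accurate)
print("Least accurate:", least_accurate)
```

For Shopping this gives 0/170 = 0% Accuracy and 86/170, which rounds to 51% Rubbish, in line with the partial table above.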
Table: Regulated Domain Information
Regulated Domain    Number of Items    Advertiser Sample
Credit              230                Alliant Credit Union, Anchor Capital
Employment          0
Housing             2                  Ashton Woods Homes, Echo Fine Properties
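A first pass at Step 6 can be automated with simple keyword matching before reviewing the results by hand. The sketch below is only an illustration: the keyword lists in `DOMAIN_KEYWORDS` and the `tag_domains` helper are hypothetical and are not the definitions from the lectures, and the sample names are taken from the example report. Keyword matching will miss and mislabel advertisers, so the output should be treated as a starting point for manual checking.

```python
from collections import defaultdict

# Hypothetical keyword hints per regulated domain (assumptions for illustration only).
DOMAIN_KEYWORDS = {
    "Credit": ["credit union", "capital", "bank", "loan"],
    "Education": ["university", "college", "academy"],
    "Employment": ["staffing", "recruit", "careers"],
    "Housing": ["homes", "properties", "realty", "apartments"],
}


def tag_domains(advertisers):
    """Group advertiser names by the first regulated domain whose keyword matches."""
    tagged = defaultdict(list)
    for name in advertisers:
        lowered = name.lower()
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                tagged[domain].append(name)
                break  # stop at the first matching domain
    return tagged


# Sample advertiser names from the example report.
sample = [
    "Alliant Credit Union",
    "Anchor Capital",
    "Ashton Woods Homes",
    "Echo Fine Properties",
    "Tiffany & Co.",
]

tagged = tag_domains(sample)
for domain in DOMAIN_KEYWORDS:
    names = tagged.get(domain, [])
    print(f"{domain}: {len(names)} {names}")
```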