FE-582 Assignment 3

The data provided in the files contains several quantitative and categorical variables associated with each ticker. Please select a subset of 100 tickers from each file and use data for a single year (e.g., 2013). Use a small number of quantitative variables (10 or 12) out of the ~76 columns available (for example: After Tax ROE, Cash Ratio, Current Ratio, Operating Margin, Pre-Tax Margin, Pre-Tax ROE, Profit Margin, Quick Ratio, Total Assets, Total Liabilities, Earnings Per Share, etc.). The categorical variables available are GICS Sector, GICS Sub Industry, and possibly HQ Address (although this field is sparse for the selected 100-ticker subset).

Next, apply several distance and similarity functions to find the extreme distance and similarity values among the pairs of tickers you chose. For each of the following cases, define the function that computes the required quantity, calculate its value for every ticker pair, rank the pairs by that distance or similarity value, and report the top 10 and bottom 10 values:
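
Every case below follows the same compute, rank, and report loop, so one shared helper can be reused. The sketch below assumes each ticker's selected features are held as a tuple (one row per ticker, in the same order as the ticker list); `rank_pairs` is a hypothetical helper name, not part of any library. With 100 tickers it scores 100 x 99 / 2 = 4950 pairs.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[float, str, str]


def rank_pairs(
    tickers: Sequence[str],
    rows: Sequence[Sequence],
    func: Callable[[Sequence, Sequence], float],
    k: int = 10,
) -> Tuple[List[Triple], List[Triple]]:
    """Score every unordered ticker pair with func and return (top_k, bottom_k).

    top_k is sorted from largest to smallest value, bottom_k from smallest
    to largest. For a distance the top pairs are the most dissimilar; for a
    similarity they are the most alike.
    """
    scored = [
        (func(rows[i], rows[j]), tickers[i], tickers[j])
        for i, j in combinations(range(len(tickers)), 2)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    # Reversing the descending list gives the smallest values first.
    return scored[:k], scored[::-1][:k]
```

For example, `rank_pairs(tickers, rows, l1)` with any two-argument distance function `l1` returns the two lists to report for that case.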

a)      Lp-norm for p = 1

b)      Lp-norm for p = 2

c)      Lp-norm for p = 3

d)      Lp-norm for p = 10

e)      Minkowski distance (assign different weights to the feature components in the Lp-norm based on your assessment of the importance of each feature)

f)       Match-Based Similarity Computation (use a small number of equi-depth buckets, e.g., 3)

g)      Mahalanobis distance

h)      Similarity: overlap measure

i)        Similarity: inverse frequency

j)        Similarity: Goodall

k)      Overall similarity between tickers using mixed-type data (choose a λ value for the calculation)

l)        Overall normalized similarity between tickers using mixed-type data (choose a λ value for the calculation)
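
Cases a) to e) and g) act on the numeric features only. Below is a minimal NumPy sketch, assuming the selected features form a tickers x features array. `lp_distance` implements (sum_i |x_i - y_i|^p)^(1/p), `weighted_minkowski` adds a weight a_i per feature (the weights are yours to choose), and `mahalanobis` uses the inverse covariance of the whole sample. The pseudo-inverse is a precaution in case the covariance matrix is singular or close to it. Standardizing columns first with `zscore` is an assumption rather than a stated requirement, but without it Total Assets and Total Liabilities would dominate the unweighted norms. All function names are hypothetical.

```python
import numpy as np


def lp_distance(x, y, p):
    """Lp-norm of x - y: (sum_i |x_i - y_i|^p)^(1/p); p = 1, 2, 3, 10 for a)-d)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


def weighted_minkowski(x, y, weights, p=2):
    """(sum_i a_i |x_i - y_i|^p)^(1/p) with a chosen weight a_i per feature (case e)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * diff ** p) ** (1.0 / p))


def inverse_covariance(data):
    """Pseudo-inverse of the sample covariance; rows are tickers, columns features."""
    cov = np.cov(np.asarray(data, dtype=float), rowvar=False)
    return np.linalg.pinv(cov)


def mahalanobis(x, y, inv_cov):
    """sqrt((x - y) S^-1 (x - y)^T), S being the covariance of all tickers (case g)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ inv_cov @ d))


def zscore(data):
    """Standardize each column to mean 0, std 1 (assumes no constant columns)."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)
```

Compute the inverse covariance once for the whole 100-ticker sample and reuse it for every pair; recomputing it per pair would not change the result.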

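For case f), one common formulation (assumed here, since the assignment does not spell it out) splits each feature into k equi-depth buckets over all selected tickers and lets a feature contribute only when both tickers fall in the same bucket: Sim(X, Y) = [sum over matching features i of (1 - |x_i - y_i| / (m_i - n_i))^p]^(1/p), where m_i and n_i are the upper and lower bounds of the shared bucket. The helper names and the default p = 2 are illustrative.

```python
from bisect import bisect_right


def equi_depth_edges(values, k=3):
    """k + 1 bucket boundaries so each bucket holds roughly len(values) / k points."""
    s = sorted(values)
    n = len(s)
    inner = [s[(n * j) // k] for j in range(1, k)]
    return [s[0]] + inner + [s[-1]]


def bucket_of(v, edges):
    """Index of the bucket containing v; the maximum value lands in the last bucket."""
    return min(bisect_right(edges[1:-1], v), len(edges) - 2)


def match_similarity(x, y, edges_per_feature, p=2):
    """Sum closeness^p over features where x and y share a bucket, then take the p-th root."""
    total = 0.0
    for xi, yi, edges in zip(x, y, edges_per_feature):
        b = bucket_of(xi, edges)
        if b != bucket_of(yi, edges):
            continue  # feature is ignored when the tickers land in different buckets
        width = edges[b + 1] - edges[b]
        # A zero-width bucket can only hold identical values, so treat it as a full match.
        closeness = 1.0 if width == 0 else 1.0 - abs(xi - yi) / width
        total += closeness ** p
    return total ** (1.0 / p)
```

The bucket edges should be computed once per feature from all 100 tickers, then passed in as `edges_per_feature`.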

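Cases h) to l) can be sketched on the categorical columns (GICS Sector, GICS Sub Industry, and optionally HQ Address) using standard per-attribute definitions, which we assume match the course's: overlap scores 1 for each matching attribute; inverse frequency scores 1/p_k(x)^2 and Goodall scores 1 - p_k(x)^2 for a match on value x, where p_k(x) is the fraction of the selected tickers that take value x in attribute k. The mixed similarity is λ·NumSim + (1 - λ)·CatSim, and the normalized variant divides each part by the standard deviation (sigma_num, sigma_cat) of that part over all ticker pairs. Turning a numeric distance d into a similarity via 1/(1 + d) is our assumption; any decreasing transform could be used. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import pstdev


def value_frequencies(records):
    """p_k(v): fraction of records whose k-th categorical attribute equals v."""
    n = len(records)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*records)]


def overlap_sim(x, y):
    """Case h): number of categorical attributes on which x and y agree."""
    return float(sum(1 for a, b in zip(x, y) if a == b))


def inverse_freq_sim(x, y, freqs):
    """Case i): each match on value a scores 1 / p_k(a)^2, so rare values weigh more."""
    return sum(1.0 / freqs[k][a] ** 2 for k, (a, b) in enumerate(zip(x, y)) if a == b)


def goodall_sim(x, y, freqs):
    """Case j): each match on value a scores 1 - p_k(a)^2, so rare values weigh more."""
    return sum(1.0 - freqs[k][a] ** 2 for k, (a, b) in enumerate(zip(x, y)) if a == b)


def distance_to_sim(d):
    """Assumed conversion of a numeric distance into a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


def pair_std(rows, func):
    """Population std of func over all unordered pairs of rows (for sigma_num, sigma_cat)."""
    return pstdev([func(rows[i], rows[j]) for i, j in combinations(range(len(rows)), 2)])


def mixed_sim(num_sim, cat_sim, lam=0.5):
    """Case k): lam * NumSim + (1 - lam) * CatSim."""
    return lam * num_sim + (1 - lam) * cat_sim


def normalized_mixed_sim(num_sim, cat_sim, sigma_num, sigma_cat, lam=0.5):
    """Case l): each component scaled by its standard deviation over all pairs."""
    return lam * num_sim / sigma_num + (1 - lam) * cat_sim / sigma_cat
```

The value frequencies and both standard deviations should be computed over the same 100-ticker subset whose pairs are being ranked.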