AccelerateAI - Data Science Global Bootcamp  - Assignment 02  - Solved

                       
Data Normalization & Probability: 

Instructions: There are 6 questions. Q1 through Q3 cover Encoding/Normalization/Scaling and Q4 through Q6 cover Probability.

 

Encoding/Normalization/Scaling 

 

Q1: Refer to the dataset on GitHub: https://github.com/Accelerate-AI/Data-Science-GlobalBootcamp/blob/main/01%20Python/gapminder.csv

 

The “Gapminder” dataset contains health, life expectancy and GDP information for multiple countries, categorized by “Region”. Apply both encoding techniques (one-hot encoding and label encoding) to the categorical column “Region” and provide your solution as a Jupyter notebook file. Explain which encoding technique you should use and why.
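As a starting point, the two encodings named in Q1 can be sketched in plain Python. This is only a minimal illustration, not the notebook solution: the region labels below are toy values rather than rows from gapminder.csv, and the helper names `label_encode` and `one_hot_encode` are hypothetical.

```python
# Toy region labels for illustration only; NOT rows from gapminder.csv.
regions = ["Asia", "Europe", "Africa", "Asia", "Europe"]


def label_encode(values):
    """Map each distinct category to an integer, using sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Build one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


labels, mapping = label_encode(regions)
one_hot, categories = one_hot_encode(regions)

print(mapping)     # category -> integer code
print(labels)      # one integer per row
print(categories)  # column order of the indicator matrix
print(one_hot)     # one indicator row per input row
```

Label encoding assigns integers that imply an order (here alphabetical), while one-hot encoding adds one column per category and implies no order. Since "Region" is a nominal category, that difference is the point the question asks you to discuss.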

 

For Q2 and Q3, use the following description: we refer to the free “Wine” dataset hosted on the UCI Machine Learning Repository. A copy is available on GitHub: https://github.com/Accelerate-AI/Data-Science-GlobalBootcamp/blob/main/01%20Python/wine_data_UCI.csv

The Wine dataset consists of 3 different classes/qualities, where each row corresponds to a particular wine sample. The "quality" column indicates the class/quality of the wine, represented as (1, 2, 3), and the rest of the columns correspond to 13 different attributes (features).

Consider the features - the wine quality i.e. "quality", "Alcohol" (percent/volume) and "Malicacid" (g/l).

Q2: Do you think feature scaling is required? Explain why or why not.

Q3: If you feel feature scaling is required, perform Standardization and Normalization and provide your results. What difference do you observe between these two methods?
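The two scaling methods in Q3 can be sketched as follows. This is a minimal illustration under stated assumptions: the numbers are made-up stand-ins, not values from wine_data_UCI.csv, "Normalization" is taken to mean min-max scaling to [0, 1], and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Illustrative values only; NOT taken from wine_data_UCI.csv.
alcohol = [12.0, 13.0, 14.0]    # stand-in for "Alcohol" (percent/volume)
malic_acid = [1.0, 2.0, 6.0]    # stand-in for "Malicacid" (g/l)


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


for name, column in [("alcohol", alcohol), ("malic_acid", malic_acid)]:
    print(name, "standardized:", standardize(column))
    print(name, "normalized:  ", min_max_normalize(column))
```

Standardized values are centred on 0 with unit standard deviation and are not bounded, whereas min-max normalized values always lie in [0, 1] and are determined by the column's extremes.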

 

Provide your solution as a Jupyter notebook file wherever applicable.

  

 



Probability 

 

Q4: Facebook has a content team that labels pieces of content on the platform as spam or not spam. 90% of the raters are diligent and label content correctly 95% of the time. The remaining 10% are non-diligent raters and label 50% of the content incorrectly. Assume the pieces of content are labeled independently of one another, for every rater. Given that a piece has been rated as non-spam, what is the probability that it is actually non-spam?
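Q4 is a Bayes' rule question, and one possible setup is sketched below. The question does not state what fraction of content is actually non-spam, so the sketch takes that prior as an explicit parameter. It also assumes each rater's accuracy is the same for spam and non-spam content. Both are assumptions, and the function name is hypothetical.

```python
def posterior_non_spam(prior_non_spam):
    """P(actually non-spam | rated non-spam) via Bayes' rule.

    prior_non_spam is an ASSUMED input: the question does not give it.
    Rater accuracy is ASSUMED identical for spam and non-spam content.
    """
    # Accuracy of a randomly chosen rater (law of total probability).
    p_correct = 0.90 * 0.95 + 0.10 * 0.50
    p_wrong = 1.0 - p_correct

    # Rated non-spam happens when non-spam is labeled correctly
    # or when spam is labeled incorrectly.
    true_negative = prior_non_spam * p_correct
    false_negative = (1.0 - prior_non_spam) * p_wrong
    return true_negative / (true_negative + false_negative)


# Example with an assumed 50/50 prior; other priors give other answers.
print(posterior_non_spam(0.5))
```

With a 50/50 prior the result reduces to the average rater accuracy, 0.9 × 0.95 + 0.1 × 0.5 = 0.905. A different prior changes the answer, so state the prior you use in your solution.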

 

Q5: If the probability of seeing at least one car on the highway in 30 minutes is 0.95, what is the probability of seeing at least one car on the highway in 10 minutes? (Assume a constant default probability.)
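One way to approach Q5 is to split the 30 minutes into three 10-minute intervals. The sketch below assumes the intervals are independent and share the same probability p of seeing at least one car, which is one reading of "constant default probability". Seeing no car in 30 minutes then has probability (1 - p)^3 = 1 - 0.95.

```python
p_30_min = 0.95  # given: P(at least one car in 30 minutes)

# ASSUMPTION: three independent 10-minute intervals with equal probability.
# P(no car in 30 min) = P(no car in 10 min) ** 3
p_no_car_10_min = (1.0 - p_30_min) ** (1.0 / 3.0)
p_10_min = 1.0 - p_no_car_10_min

print(round(p_10_min, 4))
```

Under these assumptions the 10-minute probability comes out at roughly 0.63.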

 

Q6: A machine produces items of which 1% at random are defective. How many items can be packed in a box while keeping the chance of one or more defectives in the box to be no more than 0.5? What are the expected value and standard deviation of the number of defectives in a box of that size? 
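Q6 can be treated as a binomial problem, sketched below under the assumption that items are defective independently with probability 0.01. The chance of no defective in a box of n items is 0.99^n, so the condition is 1 - 0.99^n <= 0.5. The variable names are illustrative.

```python
import math

p_defective = 0.01   # given: 1% of items are defective
max_risk = 0.5       # given: P(one or more defectives) must not exceed this

# ASSUMPTION: defects are independent, so the count in a box is Binomial(n, p).
# Need (1 - p) ** n >= 1 - max_risk, i.e. n <= log(1 - max_risk) / log(1 - p).
box_size = math.floor(math.log(1.0 - max_risk) / math.log(1.0 - p_defective))

# Binomial mean and standard deviation for a box of that size.
expected_defectives = box_size * p_defective
std_defectives = math.sqrt(box_size * p_defective * (1.0 - p_defective))

print(box_size, expected_defectives, round(std_defectives, 4))
```

Under the independence assumption this gives a box of 68 items, with an expected 0.68 defectives and a standard deviation of about 0.82.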

  
