MDS - Manufacturing Data Science - Assignment 4

Please solve the following questions and justify your answers. Show all of your analysis results, including equations/calculations or Python code, in your report.

 

1. (30%) Statistical Process Control (Exercise 15.8.5)

Exercise 15.8.5 in Montgomery and Runger (Applied Statistics and Probability for Engineers, 7th edition, 2018)

Heart rate (in counts/minute) is measured every 30 minutes. The results of 20 consecutive measurements are as follows.

 

Sample No.   Heart Rate   Sample No.   Heart Rate
1            68           11           79
2            71           12           79
3            67           13           78
4            69           14           78
5            71           15           78
6            70           16           79
7            69           17           79
8            67           18           82
9            70           19           82
10           70           20           81
 

Use 𝜇 = 70 and 𝜎 = 3.  

(a)   (5%) Construct an EWMA control chart with 𝜆 = 0.1. Use 𝐿 = 2.81. Does the process appear to be in control?

(b)   (5%) Construct an EWMA control chart with 𝜆 = 0.5. Use 𝐿 = 3.07. Compare your results to those in part (a).

(c)   (5%) If the heart rate mean shifts to 76, approximate the ARLs for the charts in parts (a) and (b).

(d)   (5%) What is the probability that the control chart detects a shift to 76 on the first sample following the shift?

(e)   (10%) What is the probability that the control chart does not detect a shift to 76 on the first sample following the shift, but does detect it on the second sample?
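
As a starting point for parts (a) and (b), the EWMA statistic z_i = 𝜆 x_i + (1 − 𝜆) z_(i−1) and its control limits can be computed in a few lines. The sketch below is only one possible setup: it assumes z_0 = 𝜇 and uses the time-varying limits 𝜇 ± 𝐿 𝜎 sqrt(𝜆/(2 − 𝜆) · (1 − (1 − 𝜆)^(2i))). The function names ewma_chart and first_signal are made up for this illustration.

```python
import math

# The 20 heart-rate readings from the table above, in sample order.
heart_rate = [68, 71, 67, 69, 71, 70, 69, 67, 70, 70,
              79, 79, 78, 78, 78, 79, 79, 82, 82, 81]


def ewma_chart(x, mu0, sigma, lam, L):
    """Return (z, lcl, ucl) lists for an EWMA chart with time-varying limits."""
    z, lcl, ucl = [], [], []
    prev = mu0  # assumption: the EWMA starts at the target mean, z_0 = mu0
    for i, xi in enumerate(x, start=1):
        prev = lam * xi + (1 - lam) * prev
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * i))
        )
        z.append(prev)
        lcl.append(mu0 - half_width)
        ucl.append(mu0 + half_width)
    return z, lcl, ucl


def first_signal(z, lcl, ucl):
    """1-based sample number of the first point outside the limits, or None."""
    for i, (zi, lo, hi) in enumerate(zip(z, lcl, ucl), start=1):
        if zi < lo or zi > hi:
            return i
    return None


for lam, L in [(0.1, 2.81), (0.5, 3.07)]:
    z, lcl, ucl = ewma_chart(heart_rate, mu0=70, sigma=3, lam=lam, L=L)
    print(f"lambda={lam}: first out-of-control sample = {first_signal(z, lcl, ucl)}")
```

Plotting z together with lcl and ucl against the sample number gives the control chart itself. Replacing the time-varying term by its limit (dropping the factor in (1 − 𝜆)^(2i)) gives the steady-state limits, if those are preferred.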

 

2. (30%) Prognostics and Health Management (PHM)

This dataset was used for the prognostics challenge competition at the International Conference on Prognostics and Health Management (PHM2008).  

  Data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine – i.e., the data can be considered to be from a fleet of engines of the same type. There are 218 engines. Each engine starts with different degrees of initial wear and manufacturing variation which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data are contaminated with sensor noise.

  The engine is operating normally at the start of each time series, and starts to degrade at some point during the series. In the training set, the degradation grows in magnitude until a predefined threshold is reached, beyond which it is not preferable to operate the engine. In the test set, the time series ends some time prior to complete degradation. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate properly.

 

  The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle; each column is a different variable. The columns correspond to:

1)          unit number

2)          time, in cycles

3)          operational setting 1

4)          operational setting 2

5)          operational setting 3

6)          sensor measurement  1

7)          sensor measurement  2

...

26) sensor measurement  21
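
The column layout above can be read with pandas as sketched below. The column names (unit, cycle, setting_1, sensor_1, ...) and the helper load_phm are labels invented for this illustration, since the file itself has no header row. The two demo rows are made up so that the sketch runs without the download.

```python
import io

import pandas as pd

# Hypothetical column labels: 2 id columns + 3 settings + 21 sensors = 26.
columns = (
    ["unit", "cycle"]
    + [f"setting_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)


def load_phm(source):
    """Read a space-separated, header-less file into a 26-column DataFrame."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=columns)


# Made-up stand-in for train.txt: two cycles of one engine, dummy readings.
row_1 = "1 1 " + " ".join(["0.5"] * 24)
row_2 = "1 2 " + " ".join(["0.5"] * 24)
demo = load_phm(io.StringIO(row_1 + "\n" + row_2 + "\n"))
print(demo.shape)
```

For the real data, pass the path of the extracted file instead of the StringIO object, e.g. load_phm("train.txt").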

 

  Users are expected to train their algorithms using the data in the file named train.txt. You must then evaluate the RUL prediction performance on the data provided in the file test.txt. You may download the dataset here: https://ti.arc.nasa.gov/c/13/. For model evaluation, the final score is a weighted sum of RUL errors. The scoring function is asymmetric and penalizes late predictions more than early predictions. (Please download the dataset from the link and see the attached documentation for details.)
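
To make the asymmetry concrete, here is a minimal sketch of the score in the form usually quoted for the PHM08 challenge: with d = predicted RUL − true RUL, each engine contributes exp(−d/13) − 1 when d < 0 (early) and exp(d/10) − 1 when d ≥ 0 (late). The constants 13 and 10 and the function name phm08_score are assumptions of this sketch and should be checked against the documentation shipped with the dataset.

```python
import math


def phm08_score(predicted, actual, a_early=13.0, a_late=10.0):
    """Sum of asymmetric exponential penalties; lower is better, 0 is perfect."""
    total = 0.0
    for p, a in zip(predicted, actual):
        d = p - a  # positive d means the prediction is late (RUL overestimated)
        if d < 0:
            total += math.exp(-d / a_early) - 1.0
        else:
            total += math.exp(d / a_late) - 1.0
    return total


# Same absolute error of 10 cycles, once late and once early.
print(phm08_score([110], [100]), phm08_score([90], [100]))
```

A late prediction is costlier because an engine that fails before its predicted end of life is worse than one that is retired too soon.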

 

Answer all of the following questions for unit number 1 (i.e., engine #1) ONLY, using the TRAINING dataset.

              

(a)   (5%) RUL Calculation: calculate the remaining useful life (RUL) for each engine (i.e., each unit number). Hint: subtract the current time (in cycles) from the engine's maximum time and store the result in a new RUL column. This is the “labelling” task for supervised learning.

(b)   (5%) Variation Analysis: calculate the coefficient of variation (the ratio of the standard deviation to the mean) of each sensor. Which sensors show the maximum and minimum coefficient of variation, respectively? What insight can you provide? (Hint: variance implies information content.)

(c)   (5%) Feature Engineering: feature engineering is used to derive more features for prediction.

In the time domain, calculate the “moving” average, variance, and peak value (max value) of each sensor, using a predetermined time-window length (e.g., for unit number 1 with a window of 11 cycles: calculate over (0 to 10), (1 to 11), (2 to 12), …). You do not need to handle the end of the time series, where there are not enough cycles left to fill the window.

(d)   (5%) Feature Selection 1: apply the same moving average to the RUL column, then calculate the correlation coefficient between each generated feature (i.e., average, variance, and max value) and the RUL column. Identify the top 10 features with the highest absolute correlation with RUL.

(e)   (5%) Feature Selection 2: use a random forest to identify the top 10 most important features (i.e., average, variance, and max value) with respect to the RUL column. Draw a line plot with the cycle on the x axis and these 10 important features on the y axis.

(f)    (5%, free points) Please feel free to read the two solutions at the following links when you have time. You do not need to do any work for question (f).

Python solution: https://github.com/mustafashabbir10/Prognosis/blob/master/Prognosis.ipynb 

R solution: http://mkalikatzarakis.eu/wp-content/uploads/2018/12/code.html 
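
Parts (a) to (d) above can be sketched in pandas as follows. This is an illustration under stated assumptions, not a reference solution: the data frame is synthetic (two engines and two sensors, s1 and s2, invented here so that the code runs without the download), and pandas' rolling windows are trailing, so each value is indexed by the last cycle of its 11-cycle window rather than the first.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.txt (NOT the real data).
rng = np.random.default_rng(0)
frames = []
for unit, life in [(1, 40), (2, 30)]:
    cycle = np.arange(1, life + 1)
    frames.append(pd.DataFrame({
        "unit": unit,
        "cycle": cycle,
        "s1": 500 + 0.5 * cycle + rng.normal(0, 1, life),  # drifts with wear
        "s2": 20 + rng.normal(0, 0.01, life),              # nearly constant
    }))
df = pd.concat(frames, ignore_index=True)

# (a) RUL label: last observed cycle of each engine minus the current cycle.
df["RUL"] = df.groupby("unit")["cycle"].transform("max") - df["cycle"]

# Restrict to engine #1, as the questions require.
e1 = df[df["unit"] == 1].reset_index(drop=True)
sensors = ["s1", "s2"]

# (b) Coefficient of variation = standard deviation / mean, per sensor.
cv = e1[sensors].std() / e1[sensors].mean()

# (c) Moving mean / variance / max over an 11-cycle window.
window = 11
roll = e1[sensors].rolling(window)
features = pd.concat(
    [
        roll.mean().add_suffix("_mean"),
        roll.var().add_suffix("_var"),
        roll.max().add_suffix("_max"),
    ],
    axis=1,
)
features["RUL_mean"] = e1["RUL"].rolling(window).mean()
features = features.dropna()  # drop the rows whose window is incomplete

# (d) Rank features by absolute Pearson correlation with the smoothed RUL.
corr = features.drop(columns="RUL_mean").corrwith(features["RUL_mean"])
top = corr.abs().sort_values(ascending=False).head(10)

print(cv)
print(top)
```

On the real data, sensors would be the 21 sensor columns. A sensor whose coefficient of variation is close to zero hardly changes over the engine's life and is unlikely to help predict RUL.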

 

 

3. Programming Questions (35%) 

Please use Python to answer the following questions. Provide your code and justify your answers. Show all your work in detail, including the specific algorithm and parameter design. Hand in TWO files, one for Tabu Search and one for the Genetic Algorithm. The results should include the optimal solution (i.e., job sequence), the optimal objective function (i.e., fitness) value, the running time, and the number of tardy jobs. For the parameter settings (e.g., tabu size, crossover rate, mutation rate), carry out a simple trial-and-error study or a design of experiments for sensitivity analysis.

Single-Machine Scheduling Problem

Solve the following single-machine total weighted tardiness problem. The objective is to minimize the total weighted tardiness.
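
For a given job sequence, job j completes at time C_j (the sum of the processing times up to and including j), its tardiness is T_j = max(0, C_j − d_j), and the objective is the sum of w_j T_j. A minimal evaluator is sketched below on a made-up three-job instance (not the assignment data). The function name evaluate and the 0-based job indices are choices of this sketch.

```python
# Hypothetical instance for illustration only: 3 jobs, indexed 0..2.
processing = [3, 2, 4]
due = [4, 2, 6]
weight = [2, 1, 3]


def evaluate(sequence):
    """Return (total weighted tardiness, number of tardy jobs) for a sequence."""
    clock = 0
    total = 0
    n_tardy = 0
    for j in sequence:
        clock += processing[j]             # completion time C_j
        tardiness = max(0, clock - due[j])  # T_j = max(0, C_j - d_j)
        total += weight[j] * tardiness
        n_tardy += tardiness > 0
    return total, n_tardy


print(evaluate([0, 1, 2]))
print(evaluate([1, 0, 2]))
```

Both the fitness value and the number of tardy jobs requested in the report come out of the same pass over the sequence.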

Job   Processing Time   Due Date   Weight
1     10                50         10
2     10                38         ?
3     13                49         ?
4     ?                 12         ?
5     ?                 20         10
6     ?                 105        ?
7     ?                 73         ?
8     15                45         10
9     ?                 ?          ?
10    ?                 64         ?
11    ?                 15         ?
12    ?                 ?          10
13    15                92         10
14    ?                 43         ?
15    11                78         ?
16    ?                 21         10
17    ?                 15         ?
18    14                50         ?
19    18                150        ?
20    ?                 99         ?

(a)    (5%, free points) Learn about the Genetic Algorithm (GA) from the internet videos https://www.youtube.com/watch?v=kHyNqSnzP8Y or https://www.youtube.com/watch?v=Fdk7ZKJHFcI

(b)   (15%) Develop Tabu Search (TS) algorithm to solve the problem. Show your design and the “result”.

(c)    (15%) Develop Genetic Algorithm (GA) to solve the problem. Show your design and the “result”.

(d)   (5%) Please compare Tabu Search and GA. You may try different parameters in the algorithms you developed to see how the results change (i.e., sensitivity analysis). What “insight” or interesting things did you find?
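
One possible skeleton for parts (b) and (c) is sketched below. It is an illustration, not a prescribed design: the five-job instance is made up, the tabu search uses a pairwise-swap neighbourhood with a simple aspiration rule, and the GA uses order crossover, swap mutation, binary tournament selection, and two elites. All names and default parameter values are assumptions to be tuned in the sensitivity analysis.

```python
import random

# Hypothetical instance (NOT the assignment data): (processing, due, weight).
JOBS = [(4, 6, 3), (3, 5, 2), (6, 20, 1), (2, 4, 5), (5, 12, 2)]


def twt(seq):
    """Total weighted tardiness of a sequence of job indices."""
    clock = total = 0
    for j in seq:
        p, d, w = JOBS[j]
        clock += p
        total += w * max(0, clock - d)
    return total


def tabu_search(n_iter=200, tabu_size=5, seed=0):
    rng = random.Random(seed)
    current = rng.sample(range(len(JOBS)), len(JOBS))
    best, best_cost = current[:], twt(current)
    tabu = []  # most recent swaps, stored as unordered job pairs
    for _ in range(n_iter):
        candidates = []
        for i in range(len(current) - 1):
            for k in range(i + 1, len(current)):
                neighbor = current[:]
                neighbor[i], neighbor[k] = neighbor[k], neighbor[i]
                move = frozenset((current[i], current[k]))
                cost = twt(neighbor)
                # Aspiration: a tabu move is allowed if it beats the best so far.
                if move not in tabu or cost < best_cost:
                    candidates.append((cost, neighbor, move))
        if not candidates:
            break
        cost, current, move = min(candidates, key=lambda c: c[0])
        tabu.append(move)
        if len(tabu) > tabu_size:
            tabu.pop(0)
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost


def order_crossover(p1, p2, rng):
    """OX: keep a slice of p1, fill the remaining slots in p2's order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    fill = (j for j in p2 if j not in kept)
    return [c if c is not None else next(fill) for c in child]


def genetic_algorithm(pop_size=20, generations=100, cx_rate=0.8,
                      mut_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(JOBS)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=twt)
        nxt = [ind[:] for ind in pop[:2]]  # elitism: keep the two best
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 2), key=twt)  # binary tournament
            p2 = min(rng.sample(pop, 2), key=twt)
            child = order_crossover(p1, p2, rng) if rng.random() < cx_rate else p1[:]
            if rng.random() < mut_rate:
                i, k = rng.sample(range(n), 2)
                child[i], child[k] = child[k], child[i]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=twt)
    return best, twt(best)


print("Tabu:", tabu_search())
print("GA:  ", genetic_algorithm())
```

Wrapping each call in time.perf_counter() gives the running time, and rerunning with different values of tabu_size, cx_rate, and mut_rate gives the raw material for the comparison in part (d).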
