EE2703 - Week 3 - Fitting Data to Models

This week’s Python assignment will focus on the following topics:

• Reading data from files and parsing them

• Analysing the data to extract information

• Studying the effect of noise on the fitting process

• Plotting graphs

Plotting graphs
You can use matplotlib to plot sophisticated graphs in Python.

Here is a simple program to plot $J_0(x)$ for $0 < x < 10$ (type it in and see):

In [48]: from pylab import *; from scipy.special import *

In [49]: x=arange(0,10,.1)

In [50]: y=jv(0,x)

In [51]: plot(x,y)

In [52]: show()

The import keyword imports a module, as already discussed. The "pylab" module is a super module that imports everything needed to make Python behave like Matlab.

The actual code is four lines. The first defines the x values, the second computes the Bessel function, the third plots the curve, and the last displays the graphic. The plot command has the following syntax:

plot(x1,y1,style1,x2,y2,style2,...)

which lets you plot multiple curves in the same figure.
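For instance, here is an illustrative sketch (assuming the imports from the session above) plotting two Bessel functions with different line styles in one call:

x = arange(0, 10, .1)
plot(x, jv(0, x), 'b-', x, jv(1, x), 'r--')  # two curves, two styles, one call
legend([r'$J_0(x)$', r'$J_1(x)$'])
show()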

The Assignment
1. The site has a Python script generate_data.py. Download the script and run it to generate a set of data. The data is written out to a file named fitting.dat.

# script to generate data files for the least squares assignment
from pylab import *
import scipy.special as sp

N = 101  # no of data points
k = 9    # no of sets of data with varying noise

# generate the data points and add noise
t = linspace(0, 10, N)                      # t vector
y = 1.05 * sp.jn(2, t) - 0.105 * t          # f(t) vector
Y = meshgrid(y, ones(k), indexing='ij')[0]  # make k copies
scl = logspace(-1, -3, k)                   # noise stdev
n = dot(randn(N, k), diag(scl))             # generate k noise vectors
yy = Y + n                                  # add noise to signal

# shadow plot
plot(t, yy)
xlabel(r'$t$', size=20)
ylabel(r'$f(t)+n$', size=20)
title(r'Plot of the data to be fitted')
grid(True)
savetxt("fitting.dat", c_[t, yy])  # write out matrix to file
show()


2. Load fitting.dat (look up loadtxt). The data consists of 10 columns. The first column is time, while the remaining columns are data. Extract these.

3. The data columns correspond to the function

$$f(t) = 1.05\,J_2(t) - 0.105\,t + n(t)$$

with different amounts of noise added. What is noise? For us here, it is random fluctuations in the value due to many small random effects. The noise is given to be normally distributed, i.e., its probability distribution is given by

$$\Pr\{n(t)\mid\sigma\} = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{n(t)^2}{2\sigma^2}\right)$$

with σ given by sigma=logspace(-1,-3,9)

Plot the curves in Figure 0 and add labels to indicate the amount of noise in each. (Look up plot and legend.)
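A minimal sketch of steps 2 and 3, assuming fitting.dat is in the working directory (variable names here are illustrative):

from pylab import *

data = loadtxt("fitting.dat")      # 101 rows, 10 columns
t = data[:, 0]                     # first column: time
f = data[:, 1:]                    # remaining nine columns: noisy data
sigma = logspace(-1, -3, 9)        # noise stdev of each column
figure(0)
for i in range(9):
    plot(t, f[:, i], label=r'$\sigma=%.3f$' % sigma[i])
xlabel(r'$t$')
ylabel(r'$f(t)+n$')
legend()
show()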

4. We want to fit a function to this data. Which function? The function has the same general shape as the data but with unknown coefficients:

$$g(t; A, B) = A\,J_2(t) + B\,t$$

Create a Python function g(t, A, B) that computes g(t; A, B) for given A and B. Plot it in Figure 0 for A = 1.05, B = −0.105 (label this as the true value).
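A minimal sketch of such a function (sp as in the generator script, t from the loaded data above):

import scipy.special as sp

def g(t, A, B):
    # the model to fit: A*J2(t) + B*t
    return A * sp.jn(2, t) + B * t

figure(0)
plot(t, g(t, 1.05, -0.105), 'k', label='true value')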

5. Generate a plot of the first column of data with error bars. Plot every 5th data item to make the plot readable. Also plot the exact curve to see how much the data diverges.

This is not difficult. If you know the standard deviation of your data and you have the data itself, you can plot the error bars with red dots using

errorbar(t,data,stdev,fmt='ro')

Here, t and data contain the data, while stdev contains $\sigma_n$ for the noise. In order to show every fifth data point, you can instead use

errorbar(t[::5],data[::5],stdev,fmt='ro')

After plotting the errorbars, plot the exact curve using the function written in part 4, and annotate the graph.
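Putting this together, a sketch reusing t, f, sigma, and g from the steps above:

figure(1)
data = f[:, 0]                    # first (noisiest) data column
errorbar(t[::5], data[::5], sigma[0], fmt='ro', label='noisy data')
plot(t, g(t, 1.05, -0.105), 'k', label='exact curve')
xlabel(r'$t$')
legend()
show()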

6. For our problem, the values of t are discrete and known (from the data file). Obtain g(t, A, B) as a column vector by creating a matrix equation:

$$g(t; A, B) = \begin{pmatrix} J_2(t_1) & t_1 \\ \vdots & \vdots \\ J_2(t_m) & t_m \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = M \cdot p$$

Construct M, then generate the vector $M \cdot p_0$ (where $p_0$ holds the true values $A_0 = 1.05$, $B_0 = -0.105$) and verify that it is equal to $g(t, A_0, B_0)$.

How will you confirm that two vectors are equal?

Note: To construct a matrix out of column vectors, create the column vectors first and then use c_[...]. For instance, if you have two vectors x and y, use M=c_[x,y]
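A sketch of this step; allclose (or checking that the norm of the difference is tiny) is one way to confirm that two floating-point vectors are equal:

M = c_[sp.jn(2, t), t]            # 101 x 2 matrix built from column vectors
p0 = array([1.05, -0.105])        # the true (A0, B0)
print(allclose(dot(M, p0), g(t, 1.05, -0.105)))  # True if the two vectors match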

7. For A = 0, 0.1, ..., 2 and B = −0.2, −0.19, ..., 0, and for the data given in columns 1 and 2 of the file, compute

$$\varepsilon_{ij} = \frac{1}{101}\sum_{k=0}^{100}\left(f_k - g(t_k, A_i, B_j)\right)^2$$

This is known as the "mean squared error" between the data ($f_k$) and the assumed model. Use the first column of data as $f_k$ for this part.
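A direct (unoptimised) sketch of this computation, continuing with f, t, and g from above:

A = arange(0, 2.1, 0.1)           # 21 values of A
B = arange(-0.2, 0.01, 0.01)      # 21 values of B
eps = zeros((len(A), len(B)))
fk = f[:, 0]                      # first data column
for i in range(len(A)):
    for j in range(len(B)):
        eps[i, j] = mean((fk - g(t, A[i], B[j])) ** 2)  # 1/101 times the sum of squares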

8. Plot a contour plot of $\varepsilon_{ij}$ and see its structure. Does it have a minimum? Does it have several?
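A contour sketch using the grid computed above:

figure(2)
cs = contour(A, B, eps.T, 20)     # transpose so A runs along the x axis
clabel(cs, inline=True)           # mark the level values
xlabel(r'$A$')
ylabel(r'$B$')
show()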

9. Use the Python function lstsq from scipy.linalg to obtain the best estimate of A and B. The array you created in part 6 is what you need; it is sent to the least squares program.
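In outline (M from part 6):

from scipy.linalg import lstsq

p_est, resid, rank, sig = lstsq(M, f[:, 0])  # least squares fit to column 1
print(p_est)                                 # estimated (A, B)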

10. Repeat this with the different columns (i.e., columns 1 and i). Each column contains the same function, with a different amount of noise added as mentioned above. Plot the error in the estimate of A and B for the different data columns versus the noise σ. Is the error in the estimate growing linearly with the noise?
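One possible sketch (Aerr and Berr are illustrative names):

Aerr = zeros(9)
Berr = zeros(9)
for i in range(9):
    p_est = lstsq(M, f[:, i])[0]
    Aerr[i] = abs(p_est[0] - 1.05)    # error in the estimate of A
    Berr[i] = abs(p_est[1] + 0.105)   # error in the estimate of B
figure(3)
plot(sigma, Aerr, 'ro--', label='Aerr')
plot(sigma, Berr, 'go--', label='Berr')
xlabel(r'$\sigma_n$')
legend()
show()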

11. Replot the above curves using loglog. Is the error varying linearly? What does this mean?
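The replot is a one-line change (sketch, reusing Aerr and Berr from above):

figure(4)
loglog(sigma, Aerr, 'ro', label='Aerr')
loglog(sigma, Berr, 'go', label='Berr')
xlabel(r'$\sigma_n$')
legend()
show()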

Linear Fitting to Data
Perhaps the most common engineering use of a computer is the modelling of real data. That is to say, some device, say a tachometer on an induction motor, a temperature sensor of an oven, or the current through a photodiode, provides us with real-time data. This data is usually digitised very early on in the acquisition process to preserve the information, leaving us with time sequences

$$x_i = x(t_i), \quad i = 1, \ldots, M$$

If the device has been well engineered, and if the sensor is to be useful in diagnosing and controlling the device, we must also have a model for the acquired data:

$$f(t; p_1, \ldots, p_N)$$

For example, our model could be

$$f(t; p_1, p_2) = p_1 + p_2\sin(\pi t^2)$$

Our task is to accurately predict $p_1, \ldots, p_N$ given the real-time data. This would allow us to say things like, "Third harmonic content in the load is reaching dangerous levels".

The general problem of the estimation of model parameters is going to be tackled in many later courses.

Here we will tackle a very simple version of the problem. Suppose my model is “linear in the parameters”, i.e.,

$$f(t; p_1, \ldots, p_N) = \sum_{i=1}^{N} p_i F_i(t)$$

where $F_i(t)$ are arbitrary functions of time. Our example above fits this model:

$p_1, p_2$ : parameters to be fitted
$F_1(t)$ : $1$
$F_2(t)$ : $\sin(\pi t^2)$

For each measurement, we obtain an equation:

$$\begin{aligned}
F_1(t_1)\,p_1 + \cdots + F_N(t_1)\,p_N &= x_1 \\
F_1(t_2)\,p_1 + \cdots + F_N(t_2)\,p_N &= x_2 \\
&\ \ \vdots \\
F_1(t_M)\,p_1 + \cdots + F_N(t_M)\,p_N &= x_M
\end{aligned}$$

Clearly the general problem reduces to the inversion of a matrix problem:

$$\begin{pmatrix}
F_1(t_1) & F_2(t_1) & \cdots & F_N(t_1) \\
F_1(t_2) & F_2(t_2) & \cdots & F_N(t_2) \\
\vdots & \vdots & & \vdots \\
F_1(t_M) & F_2(t_M) & \cdots & F_N(t_M)
\end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_M \end{pmatrix}$$

However, the number of parameters, N, is usually far less than the number of observations, M. In the absence of measurement errors and noise, any non-singular N ×N submatrix can be used to determine the coefficients pi. When noise is present, what can we do? The matrix equation now becomes

$$F \cdot \vec{p} = \vec{x} = \vec{x}_0 + \vec{n}$$

where $\vec{n}$ is the added noise. $\vec{x}_0$ is the (unknown) ideal measurement, while $\vec{x}$ is the actual noisy measurement we make. We have to assume something about the noise, and we assume that it is as likely to be positive as negative (zero mean) and that it has a standard deviation $\sigma_n$. Clearly the above equation cannot be exactly satisfied in the presence of noise, since the rank of F is N whereas the number of observations is M.

We also make a very important assumption, namely that the noise added to each observation xi, namely ni, is “independent” of the noise added to any other observation. Let us see where this gets us.

We wish to get the "best" guess for $\vec{p}$. For us, this means that we need to minimize the L2 norm of the error. The error is given by

$$\vec{\varepsilon} = F \cdot \vec{p} - \vec{x}$$

The norm of the error is given by

$$|\vec{\varepsilon}|^2 = \sum_i \varepsilon_i\,\varepsilon_i$$

This norm can be written in matrix form as

$$(F \cdot \vec{p} - \vec{x})^T \cdot (F \cdot \vec{p} - \vec{x})$$

which is what must be minimized. Writing out the terms

$$\vec{p}^{\,T} F^T F\,\vec{p} + \vec{x}^T\vec{x} - \vec{p}^{\,T} F^T \vec{x} - \vec{x}^T F\,\vec{p}$$

Suppose the minimum is reached at some $\vec{p}_0$. Then near it, the above norm should be greater than that minimum. If we plotted the error, we would expect the surface plot to look like a cup. The gradient of the error at the minimum should therefore be zero. Let us take the gradient of the expression for the norm. We write $F^T F = M$, and write out the error in "Einstein notation":

$$\text{error} = p_i M_{ij} p_j + x_j x_j - p_i F^T_{ij} x_j - x_i F_{ij} p_j$$

Here we have suppressed the summation signs over i and j: if an index repeats in an expression, it is assumed to be summed over. Differentiating with respect to $p_k$, and using $\partial p_i/\partial p_k = \delta_{ik}$, we get

$$\frac{\partial\,\text{error}}{\partial p_k} = \delta_{ki} M_{ij} p_j + p_i M_{ij} \delta_{jk} - \delta_{ki} F^T_{ij} x_j - x_i F_{ij} \delta_{jk} = 0$$

$$= M_{kj} p_j + p_i M_{ik} - F^T_{kj} x_j - x_i F_{ik} = \sum_j \left(M_{kj} + M_{jk}\right) p_j - 2\sum_j F^T_{kj} x_j$$

Now the matrix M is symmetric (just take the transpose and see). So the equation finally becomes, written as a vector expression,

$$\nabla\,\text{error}(\vec{p}) = 2\left(F^T F\,\vec{p} - F^T \vec{x}\right) = 0, \quad\text{i.e.,}\quad F^T F\,\vec{p} = F^T \vec{x}$$

This result is very famous. It is so commonly used that scientific packages provide the operation as a built-in command. In Python, it is a library function called lstsq:

from scipy.linalg import lstsq
p, resid, rank, sig = lstsq(F, x)

where p returns the best fit, and sig, resid and rank return information about the process. In Scilab or Matlab, it takes the form:

p0 = F\x;

Here F is the coefficient matrix and x is the vector of observations. When we write this, Scilab is actually calculating

p0=inv(F’*F)*F’*x;
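A quick numpy check (an illustrative sketch, with a made-up F and x for the example model above) that lstsq and this normal-equation formula agree:

import numpy as np
from scipy.linalg import lstsq

t = np.linspace(0, 10, 50)
F = np.c_[np.ones(50), np.sin(np.pi * t**2)]   # columns F1 = 1, F2 = sin(pi t^2)
x = 2.0 + 0.5 * np.sin(np.pi * t**2)           # noiseless "measurements"
p_lstsq = lstsq(F, x)[0]
p_normal = np.linalg.inv(F.T @ F) @ F.T @ x    # the formula Scilab evaluates
print(np.allclose(p_lstsq, p_normal))          # True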

What would happen if the inverse did not exist? This can happen if the number of observations is too few, or if the vectors $F_i(t_j)$ (i.e., the columns of F) are linearly dependent. Both situations are the user's fault. The model should have linearly independent terms (else just merge them together), and there should be enough measurements.

The p0 obtained from the above formula is a prediction of the exact parameters. How accurate can we expect p0 to be? If x were x0, we would recover the exact answer, limited only by computer accuracy. But when noise is present, the estimate is approximate: the more noise, the more approximate.

Where did we use all those properties of the noise? We assumed that the noise in different measurements was the same when we defined the error norm. Suppose some measurements are more noisy. Then we should give less importance to those measurements, i.e., weight them less. That would change the formula we just derived. If the noise samples were not independent, we would need equations to explain just how they depend on each other. That too changes the formula. Of course, if the model is more complicated, things get really difficult. For example, so simple a problem as estimating the frequency of the following model

$$y = A\sin\omega t + B\cos\omega t + n$$

is an extremely nontrivial problem. That is because it is no longer a “linear” estimation problem. Such problems are discussed in advanced courses like Estimation Theory. The simpler problems of correlated, non-uniform noise will be discussed in Digital Communication since that theory is needed for a cell phone to estimate the signal sent by the tower.

In this assignment (and in this course), we assume independent, uniform error that is normally distributed. For that noise, as mentioned above, Python already provides the answer:

p = lstsq(F, x)[0]  # the first element of the returned tuple is the best fit

Even with all these simplifications, the problem can become ill posed. Let us look at the solution again:

$$\vec{p} = \left(F^T F\right)^{-1} F^T \vec{x}$$

Note that we have to invert $F^T F$. This is the coefficient matrix. For instance, if we were trying to estimate A and B of the following model:

$$y = A\sin\omega_0 t + B\cos\omega_0 t$$

(not the frequency, since that makes it a nonlinear problem), the matrix F becomes

$$F = \begin{pmatrix}
\sin\omega_0 t_1 & \cos\omega_0 t_1 \\
\vdots & \vdots \\
\sin\omega_0 t_n & \cos\omega_0 t_n
\end{pmatrix}$$

Hence, $F^T F$ is a 2 by 2 matrix with the following elements:

$$F^T F = \begin{pmatrix}
\sum_k \sin^2\omega_0 t_k & \sum_k \sin\omega_0 t_k\cos\omega_0 t_k \\
\sum_k \sin\omega_0 t_k\cos\omega_0 t_k & \sum_k \cos^2\omega_0 t_k
\end{pmatrix}$$

Whether this matrix is invertible depends only on the functions $\sin\omega_0 t$ and $\cos\omega_0 t$ and the times at which they are sampled. For instance, if $\omega_0 t_k = 2k\pi$, the matrix becomes

$$F^T F = \begin{pmatrix} 0 & 0 \\ 0 & n \end{pmatrix}$$

since $\sin\omega_0 t_k \equiv 0$ for all the measurement times. Clearly this is not an invertible matrix, even though the functions are independent. Sometimes the functions are "nearly" dependent. The inverse exists, but it cannot be accurately calculated. To characterise this, when an inverse is calculated, the "condition number" is also returned. A poor condition number means that the inversion is not to be depended on and the estimation is going to be poor.
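numpy exposes this as numpy.linalg.cond; a small sketch with nearly dependent columns:

import numpy as np

t = np.linspace(0, 10, 101)
F_good = np.c_[np.sin(t), np.cos(t)]                    # independent columns
F_bad = np.c_[np.sin(t), np.sin(t) + 1e-9 * np.cos(t)]  # nearly dependent columns
print(np.linalg.cond(F_good))   # modest: the inversion can be trusted
print(np.linalg.cond(F_bad))    # enormous: the estimate will be poor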

We will not use these ideas in this lab. Here we will do a very simple problem only, to study how the amount of noise affects the quality of the estimate. We will also study what happens if we use the wrong model.
