Galaxy Velocities
Our model of the Universe is that it is an infinite space filled with a uniform “gas” of galaxies. To a first approximation we can think of these galaxies as moving in random directions with each component of the velocity being drawn from a Gaussian distribution (surprise!) centered on zero (so each component has an equal chance of being positive or negative) with some standard deviation σv. Our goal in this project is to determine σv.
We measure galaxy velocities by measuring the shift in the spectral lines of a galaxy from the wavelengths we expect, called the redshift z, a dimensionless quantity that can be measured very accurately. The speed of light times this redshift, cz, gives us the velocity at which the galaxy appears to be moving away from or toward us, usually expressed in km/sec. However, the redshift is actually a combination of the Doppler shift, due to the galaxy’s motion, which is what we want, and a cosmological redshift due to the light having travelled through an expanding Universe. Hubble’s law tells us that the part of the velocity due to the cosmological redshift is given by H0r, where r is the distance in megaparsecs (Mpc) (an Mpc is about 3 million light years) and H0 = 70 km/sec/Mpc is Hubble’s constant. Thus if we can measure the redshift z and the distance r we can obtain the radial velocity via
v = cz − H0r (1)
Now, it turns out that measuring distance is difficult, and the uncertainty in distance measurements is 5-15%. For distant galaxies, where H0r ≫ v, this introduces crazy large uncertainties in the radial velocity. For example, suppose we measure the distance of a galaxy to be 100 Mpc with an uncertainty of 10 Mpc. Since r is multiplied by H0 in the formula, this translates into an uncertainty of 70 × 10 = 700 km/sec in the velocity. Given that velocities of galaxies are typically a few hundred km/sec, this means that the uncertainty in the velocity can be greater than the velocity itself! Thus galaxy velocity data is typically only useful when you have many measurements that you can combine to reduce the uncertainty.
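The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name radial_velocity and the value cz = 7200 km/sec are made up for illustration, and the redshift is assumed to be exact so that only the distance error contributes.

```python
H0 = 70.0  # Hubble's constant in km/sec/Mpc (value quoted in the text)

def radial_velocity(cz, r, sigma_r):
    """Return (v, sigma_meas) using Eq. (1), v = cz - H0*r.

    Assumes cz is known exactly, so the uncertainty in v comes only
    from the distance uncertainty sigma_r (in Mpc).
    """
    v = cz - H0 * r
    sigma_meas = H0 * sigma_r
    return v, sigma_meas

# cz = 7200 km/sec is a hypothetical value chosen for illustration;
# r = 100 Mpc and sigma_r = 10 Mpc are the numbers used in the text.
v, sigma_meas = radial_velocity(cz=7200.0, r=100.0, sigma_r=10.0)
print(f"v = {v:.0f} +/- {sigma_meas:.0f} km/sec")
```

With these numbers the velocity is 200 km/sec with an uncertainty of 700 km/sec, so a single measurement says very little on its own.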
Note that we can only measure the radial velocity of a galaxy, i.e. its motion toward or away from us. For galaxies in different directions this will be a different component of the velocity vector $\vec{v}$, but since we have good evidence that the Universe is isotropic (the same in all directions), we should be able to treat the radial component of velocity the same as if it were the x, y, or z component. In other words, we would expect the radial component of velocity to be drawn from a Gaussian distribution centered on zero with standard deviation σv.
On the course Canvas site you will find the files galvel.dat and grpvel.dat, which contain galaxy velocities and galaxy group velocities respectively. You will be using the 1st and 2nd columns of these files, which are the measured radial velocity of the ith galaxy, vi, and its measurement uncertainty σi, both in units of km/sec. With the sign convention of Eq. (1), galaxies with positive velocities are moving away from us and those with negative velocities are moving toward us. We can model each measured radial velocity as the sum of the actual velocity ui of the galaxy plus some noise, $v_i = u_i + \epsilon_i$, where the noise εi is drawn from a Gaussian distribution centered on zero with standard deviation σi and the actual velocity ui is drawn from a Gaussian distribution centered on zero with standard deviation σv. Since the variances of independent Gaussian variables add, the measured velocity vi is drawn from a Gaussian distribution centered on zero with standard deviation $\sqrt{\sigma_v^2 + \sigma_i^2}$.
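If the statement that the two spreads combine as the square root of σv² + σi² is not obvious, a quick simulation can make it plausible. The sketch below uses only the Python standard library; the values σv = 300 km/sec and σi = 400 km/sec are assumptions picked so that the expected answer is a round 500 km/sec, and every simulated galaxy is given the same σi for simplicity.

```python
import math
import random

random.seed(0)       # fixed seed so the run is repeatable
sigma_v = 300.0      # assumed true velocity dispersion, km/sec (illustrative)
sigma_i = 400.0      # assumed measurement uncertainty, km/sec (illustrative)
n = 100_000          # number of simulated galaxies

# Measured velocity = actual velocity + noise, both Gaussian and centered on zero
v = [random.gauss(0.0, sigma_v) + random.gauss(0.0, sigma_i) for _ in range(n)]

# Spread about the known mean of zero (no need to subtract a sample mean)
measured_spread = math.sqrt(sum(x * x for x in v) / n)
expected_spread = math.hypot(sigma_v, sigma_i)  # sqrt(sigma_v^2 + sigma_i^2)

print(f"simulated: {measured_spread:.1f} km/sec, expected: {expected_spread:.1f} km/sec")
```

The two numbers should agree to within about a percent for a sample of this size.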
The problem of estimating σv is thus very similar to the problem of estimating the standard deviation of a set of numbers, with the differences that 1) we already know that the “true” average is zero, and 2) we need to account for the measurement uncertainty for each velocity. You should be able to convince yourself that the likelihood function for σv is
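$$L(\sigma_v) = A \prod_i \frac{1}{\sqrt{\sigma_v^2 + \sigma_i^2}} \exp\left[-\frac{v_i^2}{2(\sigma_v^2 + \sigma_i^2)}\right] \qquad (2)$$
where we have absorbed all the constants into A. Unfortunately, it’s too messy to find the maximum likelihood by taking a derivative, so we will instead just plot the likelihood vs. σv and find the maximum likelihood value that way.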
One complication for a large data set is that the product in the formula above can get so small that the computer will round it to zero. One solution is to calculate the log likelihood instead,
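$$\ln L(\sigma_v) = C - \frac{1}{2} \sum_i \left[ \ln(\sigma_v^2 + \sigma_i^2) + \frac{v_i^2}{\sigma_v^2 + \sigma_i^2} \right] \qquad (3)$$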
where C is a constant. First calculate ln L with C = 0 over a range of σv values and find where the maximum occurs. This is your maximum likelihood value for σv. To plot the likelihood, choose C to be the negative of the maximum value of ln L, so that the maximum value of ln L becomes zero. Then you can plot the likelihood using this value of C and the likelihood at the peak will be 1. You can find the uncertainty in σv by fitting a Gaussian to the likelihood peak and determining the width of the peak from the standard deviation of the Gaussian.
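The procedure just described might be sketched as follows, using only the Python standard library. Several things here are assumptions rather than part of the project: the data are synthetic stand-ins (generated with a made-up true σv of 300 km/sec) instead of the contents of galvel.dat or grpvel.dat, the grid of trial values from 100 to 600 km/sec is arbitrary, and the Gaussian fit is replaced by a parabola through ln L at the peak and its two neighbours, which gives the same width only to the extent that the peak really is Gaussian.

```python
import math
import random

def log_likelihood(sigma_v, v, sig):
    """ln L with C = 0: -1/2 * sum[ ln(sigma_v^2 + sigma_i^2) + v_i^2/(sigma_v^2 + sigma_i^2) ]."""
    total = 0.0
    for vi, si in zip(v, sig):
        var = sigma_v**2 + si**2
        total += math.log(var) + vi**2 / var
    return -0.5 * total

# Synthetic stand-in data; replace v and sig with columns 1 and 2 of a data file.
random.seed(1)
true_sigma_v = 300.0                                   # made-up value, km/sec
sig = [random.uniform(100.0, 700.0) for _ in range(2000)]
v = [random.gauss(0.0, math.hypot(true_sigma_v, s)) for s in sig]

# Evaluate ln L on a grid of trial sigma_v values (range and step are arbitrary)
grid = [float(s) for s in range(100, 601, 2)]
lnL = [log_likelihood(s, v, sig) for s in grid]

# Locate the maximum and choose C so that the peak likelihood is 1
k = max(range(len(grid)), key=lambda j: lnL[j])
assert 0 < k < len(grid) - 1, "peak is at the edge of the grid; widen the grid"
C = -lnL[k]
likelihood = [math.exp(l + C) for l in lnL]            # safe: every exponent is <= 0

# Parabola through the peak and its neighbours (stand-in for a Gaussian fit)
h = grid[1] - grid[0]
second_diff = lnL[k - 1] - 2.0 * lnL[k] + lnL[k + 1]   # negative at a maximum
sigma_v_ml = grid[k] + 0.5 * h * (lnL[k - 1] - lnL[k + 1]) / second_diff
sigma_v_err = h / math.sqrt(-second_diff)              # width of the matching Gaussian

print(f"sigma_v = {sigma_v_ml:.1f} +/- {sigma_v_err:.1f} km/sec")
```

Subtracting the maximum of ln L before exponentiating is what keeps the product from underflowing to zero. A fit over a wider range of the peak would be less sensitive to numerical noise than the three-point parabola used here.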
Now, in principle groups of galaxies should have a smaller σv than individual galaxies since their velocities are essentially averages of the velocities of their constituent galaxies. From your estimates of σv for the two datasets, calculate a confidence level that the two datasets have a different σv.
Finally, we can ask the question of how good our model is at describing the data. From the discussion above, the quantities $v_i/\sqrt{\sigma_v^2 + \sigma_i^2}$ should be drawn from a Gaussian distribution centered on zero with a standard deviation of 1. Using your maximum likelihood value of σv, make a histogram of these values for both datasets. Plot on top of your histogram a Gaussian centered on zero with standard deviation of 1.
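One possible way to prepare this comparison is sketched below with the Python standard library. It computes the normalized velocities, bins them, and works out the number of values a unit Gaussian would put in each bin; the actual plotting is left to whatever plotting tool you use. The function names, the binning (16 bins from -4 to 4), and the synthetic stand-in data are all assumptions made for illustration.

```python
import math
import random

def normalized_velocities(v, sig, sigma_v):
    """Return v_i / sqrt(sigma_v^2 + sigma_i^2) for each galaxy."""
    return [vi / math.hypot(sigma_v, si) for vi, si in zip(v, sig)]

def unit_gaussian_cdf(x):
    """Cumulative distribution of a Gaussian with mean 0 and standard deviation 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def histogram_vs_gaussian(x, lo=-4.0, hi=4.0, nbins=16):
    """Return bin edges, observed counts, and counts expected from a unit Gaussian."""
    width = (hi - lo) / nbins
    edges = [lo + j * width for j in range(nbins + 1)]
    counts = [0] * nbins
    for xi in x:
        j = int((xi - lo) // width)
        if 0 <= j < nbins:          # values outside [lo, hi) are not counted
            counts[j] += 1
    expected = [len(x) * (unit_gaussian_cdf(edges[j + 1]) - unit_gaussian_cdf(edges[j]))
                for j in range(nbins)]
    return edges, counts, expected

# Synthetic stand-in data; sigma_v_ml would come from your likelihood analysis.
random.seed(2)
sigma_v_ml = 300.0                                     # made-up value, km/sec
sig = [random.uniform(100.0, 700.0) for _ in range(5000)]
v = [random.gauss(0.0, math.hypot(sigma_v_ml, s)) for s in sig]

x = normalized_velocities(v, sig, sigma_v_ml)
edges, counts, expected = histogram_vs_gaussian(x)
for j in range(len(counts)):
    print(f"[{edges[j]:+.1f}, {edges[j + 1]:+.1f})  observed {counts[j]:5d}  expected {expected[j]:8.1f}")
```

Comparing counts bin by bin, especially in the outermost bins, is a simple way to look for tails that are heavier than the Gaussian model predicts.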
Give your result for σv for each dataset together with its uncertainty. A good name for σv is the velocity dispersion, which here means the spread of velocity values. Discuss how confidently you can say that the σv values of the two datasets are different. To calculate this, find the standard deviation of the difference of the two values (for independent estimates, the two uncertainties added in quadrature) and determine how many standard deviations the difference is from zero. Discuss how well the Gaussian model describes the data, i.e. how well your histogram matches a Gaussian centered on zero with a standard deviation of 1. If it doesn’t match, discuss possible reasons for the failure of the model. For example, galaxies falling into clusters of galaxies will have unexpectedly large velocities. Do you see evidence for these non-Gaussian tails in your histogram?
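The comparison of the two dispersions can be sketched as below. The input numbers are placeholders, not results from the data files, and the function name difference_significance is made up. The sketch assumes the two estimates are independent with Gaussian uncertainties, and it quotes a two-sided confidence level; a one-sided level would be appropriate if you only test whether the group dispersion is smaller.

```python
import math

def difference_significance(s1, e1, s2, e2):
    """Compare two estimates s1 +/- e1 and s2 +/- e2.

    Assumes independent Gaussian uncertainties, so the uncertainty of the
    difference is e1 and e2 added in quadrature.
    """
    diff = s1 - s2
    err = math.hypot(e1, e2)
    n_sigma = abs(diff) / err
    # Two-sided probability that a Gaussian fluctuation is smaller than n_sigma
    confidence = math.erf(n_sigma / math.sqrt(2.0))
    return diff, err, n_sigma, confidence

# Placeholder values in km/sec, chosen only to make the arithmetic easy to follow
diff, err, n_sigma, confidence = difference_significance(300.0, 12.0, 250.0, 16.0)
print(f"difference = {diff:.0f} +/- {err:.0f} km/sec, "
      f"{n_sigma:.1f} sigma, confidence = {100.0 * confidence:.1f}%")
```

For these placeholder values the difference is 50 ± 20 km/sec, i.e. 2.5 standard deviations from zero.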