Before the lab you should re-read the relevant lecture slides and their accompanying examples.
Create a new directory for this lab called lab01, change to this directory, and fetch the provided code for this week by running these commands:
$ mkdir lab01
$ cd lab01
$ 2041 fetch lab01
Or, if you're not working on CSE, you can download the provided code as a zip file or a tar file.
There is a template file named dictionary_answers.txt which you must use to enter the answers for this exercise.
Download dictionary_answers.txt, or copy it to your CSE account using the following command:
$ cp -n /web/cs2041/20T2/activities/dictionary/dictionary_answers.txt .
The autotest scripts depend on the format of dictionary_answers.txt, so just add your answers and don't otherwise change the file. In other words, edit dictionary_answers.txt:
$ gedit dictionary_answers.txt &
On most Unix systems you will find one or more dictionaries containing many thousands of words, typically in the directory /usr/share/dict/.
We've created a dictionary named dictionary.txt for this lab exercise.
Download dictionary.txt, or copy it to your CSE account using the following command:
$ cp -n /web/cs2041/20T2/activities/dictionary/dictionary.txt .
$ ls -l
total 4
lrwxrwxrwx 1 cs2041 cs2041 dictionary.txt -> ../dictionary.txt
-rw-r--r-- 1 cs2041 cs2041 1072 May 26 10:36 dictionary_answers.txt
1. Write an egrep command that prints the words in dictionary.txt which contain the characters "lmn" consecutively.
Hint: it should print:
Selmner Selmner's almner almners calmness calmness's calmnesses
The COMP2041 class account contains a script named autotest that automatically runs one or more tests on your lab exercises. Once you have entered your answer for q1, you can check it like this:
$ 2041 autotest dictionary q1
Test q1 (egrep '^Q1 answer' dictionary_answers.txt|tail -1|sed 's/.*answer[: ]*//'|sh) - passed
1 tests passed 0 tests failed
Passing the autotest doesn't guarantee your answer is correct, of course, but it may find a mistake you've missed, so run autotest for each of the following questions once you've entered the answer in dictionary_answers.txt.
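To see what the pipeline echoed by autotest does, it can be tried by hand on a throwaway file. This is only a sketch: the file sample_answers.txt, its contents, and the helper name run_answer are made up for illustration, and the "Q1 answer:" line format is inferred from the pipeline rather than copied from the real template. grep -E is used here as the current spelling of egrep.

```shell
# Hypothetical helper: run the command stored on the last "Q<n> answer" line.
run_answer() {
    # $1 = question number, $2 = answers file
    grep -E "^Q$1 answer" "$2" |   # keep only the answer lines for this question
        tail -1 |                  # if there are several, use the last one
        sed 's/.*answer[: ]*//' |  # strip everything up to and including "answer: "
        sh                         # run what is left as a shell command
}

# A made-up answers file, not the real template.
cat >sample_answers.txt <<'eof'
Q1 answer: echo first attempt
Q1 answer: echo second attempt
eof

run_answer 1 sample_answers.txt
```

Because of the tail -1 step, only the last answer line for a question is run, so this prints "second attempt".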
2. Write a shell pipeline that prints the words that contain "zz" but do not end in apostrophe-s ('s).
Hint: it should print:
Abruzzi
Arezzini
Arezzo
Barozzi
Belshazzar
Brazzaville
Buzz
Buzzell
3. Write an egrep command that prints the words that contain four consecutive vowels.
Hint: it should print these words:
Aiea
Aiea's
Araguaia
Araguaia's
Douai
Douai's
Graeae
Graiae
4. Write an egrep command that prints the words which contain all 5 vowels "aeiou" in that order. The words may contain more than 5 vowels, but they must contain "aeiou" in that order.
Hint: it should print these words:
abstemious abstemiously abstemiousness abstemiousness's abstemiousnesses abstentious adenocarcinomatous adventitious
5. Write an egrep command that prints the words which contain the vowels "aeiou" in that order and no other vowels.
Hint: it should print these words:
abstemious abstemiously abstentious arsenious caesious facetious facetiously
When you think your program is working, you can use autotest to run some simple automated tests:
$ 2041 autotest dictionary
Autotest Results
96% of 607 students who have autotested dictionary_answers.txt so far, passed all autotest tests.
99% passed test q1 q2
100% passed test q3 q4
98% passed test q
There is a template file named parliament_answers.txt which you must use to enter the answers for this exercise.
Download parliament_answers.txt, or copy it to your CSE account using the following command:
$ cp -n /web/cs2041/20T2/activities/parliament/parliament_answers.txt .
The autotest scripts depend on the format of parliament_answers.txt, so just add your answers and don't otherwise change the file.
In this exercise you will analyze a file named parliament.txt containing a list of the members of the Australian House of Representatives (MPs).
Download parliament.txt, or copy it to your CSE account using the following command:
$ cp -n /web/cs2041/20T2/activities/parliament/parliament.txt .
1. Write an egrep command that will print all the lines in the file where the electorate begins with W.
Hint: it should print these lines:
Hon Scott Buchholz: Member for Wright, Queensland
Hon Tony Burke: Member for Watson, New South Wales
Mr Nick Champion: Member for Wakefield, South Australia
Mr Stephen Jones: Member for Whitlam, New South Wales
Mr Peter Khalil: Member for Wills, Victoria
Mr Llew O'Brien: Member for Wide Bay, Queensland
Dr Kerryn Phelps AM: Member for Wentworth, New South Wales
Ms Anne Stanley: Member for Werriwa, New South Wales
Ms Zali Steggall OAM: Member for Warringah, New South Wales
Hon Dan Tehan: Member for Wannon, Victoria
2. Write an egrep command that will print all the lines in the file where the MP's first name is Andrew.
Hint: it should print these lines:
Hon Andrew Gee: Member for Calare, New South Wales
Mr Andrew Giles: Member for Scullin, Victoria
Mr Andrew Hastie: Member for Canning, Western Australia
Mr Andrew Laming: Member for Bowman, Queensland
Hon Dr Andrew Leigh: Member for Fenner, Australian Capital Territory
Mr Andrew Wallace: Member for Fisher, Queensland
Mr Andrew Wilkie: Member for Denison, Tasmania
3. Write an egrep command that will print all the lines in the file where the MP's surname (last name) ends in the letters 'll'.
Hint: it should print these lines:
Mr Julian Hill: Member for Bruce, Victoria
Mr Brian Mitchell: Member for Lyons, Tasmania
Mr Rob Mitchell: Member for McEwen, Victoria
Ms Zali Steggall OAM: Member for Warringah, New South Wales
4. Write an egrep command that will print all the lines in the file where the MP's name and the electorate both end in the letter 'y'.
Hint: it should print these lines:
Mr Rowan Ramsey: Member for Grey, South Australia
5. Write an egrep command that will print all the lines in the file where either the MP's name or the electorate ends in the letter 'y'.
Hint: it should print these lines:
Dr Anne Aly: Member for Cowan, Western Australia
Hon Linda Burney: Member for Barton, New South Wales
Mr Pat Conroy: Member for Shortland, New South Wales
Mr Chris Crewther: Member for Dunkley, Victoria
Mr Milton Dick: Member for Oxley, Queensland
Hon Damian Drum: Member for Murray, Victoria
Ms Nicolle Flint: Member for Boothby, South Australia
Hon Ed Husic: Member for Chifley, New South Wales
Hon Bob Katter: Member for Kennedy, Queensland
Ms Ged Kearney: Member for Batman, Victoria
Mr Craig Kelly: Member for Hughes, New South Wales
Hon Dr Mike Kelly AM: Member for Eden-Monaro, New South Wales
Hon Michelle Landry: Member for Capricornia, Queensland
Hon Sussan Ley: Member for Farrer, New South Wales
Mrs Melissa McIntosh: Member for Lindsay, New South Wales
Hon Ben Morton: Member for Tangney, Western Australia
Mr Llew O'Brien: Member for Wide Bay, Queensland
Hon Tanya Plibersek: Member for Sydney, New South Wales
Mr Rowan Ramsey: Member for Grey, South Australia
Ms Michelle Rowland: Member for Greenway, New South Wales
Hon Tony Smith: Member for Casey, Victoria
6. Write an egrep command that will print all the lines in the file where any word in the MP's name or the electorate name ends in 'ng'.
Hint: it should print these lines:
Mr John Alexander OAM: Member for Bennelong, New South Wales
Hon Josh Frydenberg: Member for Kooyong, Victoria
Mr Luke Gosling OAM: Member for Solomon, Northern Territory
Mr Andrew Hastie: Member for Canning, Western Australia
Hon Catherine King: Member for Ballarat, Victoria
Ms Madeleine King: Member for Brand, Western Australia
Mr Andrew Laming: Member for Bowman, Queensland
Hon Bill Shorten: Member for Maribyrnong, Victoria
Mr Terry Young: Member for Longman, Queensland
7. Write an egrep command that will print all the lines in the file where the MP's surname (last name) both begins and ends with a vowel.
Hint: it should print these lines:
Hon Anthony Albanese: Member for Grayndler, New South Wales
Ms Cathy O'Toole: Member for Herbert, Queensland
8. Most electorates have names that are a single word, e.g. Warringah, Lyons & Grayndler. A few electorates have multiple-word names, for example, Kingsford Smith. Write an egrep command that will print all the lines in the file where the electorate name contains multiple words (separated by spaces or hyphens).
Hint: it should print these lines:
Hon Mark Butler: Member for Port Adelaide, South Australia
Hon Barnaby Joyce: Member for New England, New South Wales
Hon Dr Mike Kelly AM: Member for Eden-Monaro, New South Wales
Mr Llew O'Brien: Member for Wide Bay, Queensland
Hon Matt Thistlethwaite: Member for Kingsford Smith, New South Wales
Hon Jason Wood: Member for La Trobe, Victoria
Mr Trent Zimmerman: Member for North Sydney, New South Wales
9. Write a shell pipeline which prints the 8 Australian states & territories in order of the number of MPs they have. Each line should contain only the number of MPs, followed by the name of the state or territory. The lines should be ordered from fewest to most MPs.
Hint: check out the Unix filters cut, sort, uniq in the lecture notes.
Hint: it should print these lines:
1 Australian Capital Territory
2 Northern Territory
5 Tasmania
9 South Australia
15 Western Australia
27 Queensland
33 Victoria
45 New South Wales
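The filters named in the hint are often combined as "select a field, sort, count duplicates, sort by count". The sketch below shows that general pattern on made-up data; the file fruit.txt, its comma-separated layout, and the helper name count_field are assumptions for illustration only and have nothing to do with the layout of parliament.txt.

```shell
# Made-up data: one "name,colour" record per line.
cat >fruit.txt <<'eof'
apple,red
banana,yellow
cherry,red
lemon,yellow
plum,purple
strawberry,red
eof

# Hypothetical helper: count how often each value of a comma-separated field occurs.
count_field() {
    # $1 = field number, $2 = file
    cut -d',' -f"$1" "$2" |  # keep only the chosen field
        sort |               # uniq only merges adjacent duplicates, so sort first
        uniq -c |            # prefix each distinct value with its count
        sort -n              # order by count, smallest first
}

count_field 2 fruit.txt
```

This lists purple (1), then yellow (2), then red (3). Note that uniq -c pads each count with leading spaces, so a further step may be needed when output has to match an expected format exactly.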
10. Challenge: The most common first name for an MP is Andrew. Write a shell pipeline which prints the 2nd most common MP first name. It should print this first name and only this first name.
Hint: check out the Unix filters cut, sort, sed, head, tail & uniq in the lecture notes.
Hint: it should print this line:
Tony
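Picking a single line out of a stream is a common last step in pipelines like this one. As a generic sketch on made-up input (the helper names nth_line and nth_line_sed are hypothetical), two ways to do it with the filters from the hint are:

```shell
# Hypothetical helper: print only line N of standard input.
nth_line() {
    head -n "$1" | tail -n 1   # keep the first N lines, then the last of those
}

# The same selection written with sed: -n suppresses default output, "Np" prints line N.
nth_line_sed() {
    sed -n "${1}p"
}

printf '%s\n' alpha beta gamma delta | nth_line 3
printf '%s\n' alpha beta gamma delta | nth_line_sed 3
```

Both commands print gamma. They differ when the input has fewer than N lines: the head/tail version prints the last available line, while the sed version prints nothing.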
When you think your program is working, you can use autotest to run some simple automated tests:
$ 2041 autotest parliament
Autotest Results
83% of 592 students who have autotested parliament_answers.txt so far, passed all autotest tests.
99% passed test q1
85% passed test q10
100% passed test q2
99% passed test q3
100% passed test q4
98% passed test q5
99% passed test q6
98% passed test q7 q8
97% passed test q9
When you are finished working on this exercise, you must submit your work by running give:
There is a template file named ab_answers.txt which you must use to enter the answers for this exercise.
Download ab_answers.txt, or copy it to your CSE account using the following command:
$ cp -n /web/cs2041/20T2/activities/ab/ab_answers.txt .
Use egrep to test your answers to these questions.
Try to solve these questions using the standard regular expression language described in lectures.
1. Write an egrep command that prints the lines in a file named input.txt containing at least one A and at least one B. For example:
Matching                          Not Matching
Andrew's favourite Band is not    George is Brillant
ABBA                              Andrew
BA                                B
AB                                A
So to test with egrep you might do this:
$ cat >input.txt <<eof
Andrew's favourite Band is not
George is Brillant
ABBA
Andrew
AB
BA
A
B
eof
$ egrep 'REGEXP' input.txt
Andrew's favourite Band is not
ABBA
AB
BA
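Checking both columns by eye gets tedious, so one option is a small helper that reports every line a regex gets wrong. The sketch below is an assumption-laden illustration: the helper name check_regex and the two file names are made up, and the regex in the demo ('^[AB]+$', lines made only of A and B) is deliberately not an answer to any question here.

```shell
# Hypothetical helper: report lines a regex misses or wrongly matches.
check_regex() {
    # $1 = regex, $2 = file of lines that should match, $3 = file of lines that should not
    rc=0
    # grep -v prints the should-match lines the regex failed to match
    if grep -E -v -- "$1" "$2" | sed 's/^/MISSED: /' | grep .; then rc=1; fi
    # plain grep prints the should-not-match lines the regex matched anyway
    if grep -E -- "$1" "$3" | sed 's/^/WRONGLY MATCHED: /' | grep .; then rc=1; fi
    return $rc
}

# Made-up test data for the demo regex.
printf '%s\n' ABBA AB B >should_match.txt
printf '%s\n' Andrew 'A B' >should_not_match.txt

check_regex '^[AB]+$' should_match.txt should_not_match.txt && echo "all expectations met"
```

For one of the questions, put the Matching column in the first file, the Not Matching column in the second, and pass your own regex as the first argument.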
2. Write an egrep command that prints the lines in a file named input.txt containing only the characters A and B such that all pairs of adjacent A's occur before any pairs of adjacent B's. In other words, if there is a pair of B's on the line, there cannot be a pair of A's afterwards.
Matching                Not Matching
ABAABAABAABBBBABB       BBAA
ABBA                    ABBAA
ABAAAAAAAAAABBA         ABBABABABABAA
ABABABABA               ABBBAAA
A                       BBABABABABABABAA
3. Write an egrep command that prints the lines in a file named input.txt containing only the characters A and B such that the number of A's is divisible by 4.
Matching                Not Matching
AAAA                    AAAAA
BABABABAB               ABABBBBBBBBBBBBBBBAAA
AAAABBBBAAAA            AAAABBABBAAAA
BBBAABBBBBAABBBAAAA     BBBAABBABBBAABBBAAAA
4. Write an egrep command that prints the lines in a file named input.txt containing only the characters A and B such that there are exactly n A's followed by exactly n B's and no other characters.
Matching                Not Matching
AAABBB                  AAABB
AB                      BA
AABB                    AABBB
AAAABBBB                AAAABBBBA