CT5132/CT5148 Lab Week 9 Solved




Regular expressions and web scraping
In lectures we studied regular expressions and used regex101.com to test regexes interactively. Now let’s practice using them in Python.

 

Figure 1: On https://nationalservice.ichec.ie/login/login.php, there is a list of all the ICHEC projects, Classes A, B and C.

We can use Ctrl-A, Ctrl-C, Ctrl-V to put this data in a text file: data/ichec_projects_scrape.txt.

However, it is now unstructured plain text. Let’s use regular expressions to extract the project codes. Each code is like ngcom018c or ulphy033a.

1.   import re

2.   Read the data: s = open("../data/ichec_projects_scrape.txt").read()

3.   Write a pattern p to match codes (maybe test on regex101.com)

4.   Call a Python re function to find all the project codes.

5.   Notice that the codes seem to have a specific encoding: ngcom018c is NUI Galway, Computer Science, project 018, Class C; ulphy033a is University of Limerick, Physics, project 033, Class A. Use grouping ( ) to extract the four individual parts of each code. Using this, how many Class-C Computer Science projects are there across all universities?

6.   Write a new pattern to match only NUI Galway projects, and test it.

(Solutions: code/count_ichec_projects.py.)
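As a rough illustration of steps 3 to 6, here is a minimal sketch that runs on a small made-up sample rather than the real scrape. The sample lines, and the assumption that every code is exactly two letters (institution), three letters (subject), three digits and a class letter a to c, are ours; check them against the actual data before relying on the pattern.

```python
import re

# Hypothetical sample standing in for data/ichec_projects_scrape.txt
sample = """
ngcom018c  Some project title   Class C
ulphy033a  Another title        Class A
tccom005c  Third title          Class C
ngphy002b  Fourth title         Class B
"""

# Assumed format: institution (2 letters), subject (3 letters),
# number (3 digits), class (a, b or c). One group per part.
p = r"\b([a-z]{2})([a-z]{3})(\d{3})([abc])\b"

# With several groups in the pattern, findall returns a list of tuples
codes = re.findall(p, sample)

# Count Class-C Computer Science projects across all institutions
n_class_c_com = sum(1 for inst, subj, num, cls in codes
                    if subj == "com" and cls == "c")

# Pattern restricted to NUI Galway (codes starting with "ng"); no groups,
# so findall returns the whole matched strings
p_ng = r"\bng[a-z]{3}\d{3}[abc]\b"
ng_codes = re.findall(p_ng, sample)

print(codes)
print(n_class_c_com)
print(ng_codes)
```

On the real file, replace sample with the string s read in step 2. If some codes turn out not to fit the assumed shape, loosen the pattern and test it again on regex101.com.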

Generative art using grammars
We already have the following code which will generate an image given a string (the string representing an arithmetic expression). Notice here we are using x[0] and x[1] to represent the two axes (not x and y as in the notebook).

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

n = 200
xs = np.linspace(0, 1, n)
ys = np.linspace(0, 1, n)
x = np.meshgrid(xs, ys) # x contains x[0] and x[1]

ps = "np.sin(40 * x[0]) * np.sin(30 * (x[1]+0.5)) * x[0] * x[1]"
p = eval("lambda x: " + ps)

plt.imshow(p(x))
plt.axis('off')
plt.show()
 

7.     Change ps to make cooler/more complex images.

We also have the following code which will derive a new string we can use instead of ps:

from grammar import Grammar # assume we are in code/ directory
fname = "arithmetic.bnf"
g = Grammar(file_name=fname)
ps = g.derive_string()
print(ps)

8.     Use this to generate several images. If you sometimes see the error TypeError: Invalid shape () for image data, that’s probably because the grammar generated a string like 0, i.e. a constant. There are ways to work around this, but we can just ignore it and generate a new one.

9.     If you like, put everything in a convenient function or in a loop to make the process of trying new ones quicker.

10.   Change arithmetic.bnf to allow some cooler/more complex images. Post your best images on the Discussion Board.

Optional ideas: try different colour maps (see matplotlib.cm), or create polar coordinate variables (r,θ).
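One possible way to package these ideas is sketched below. The function names evaluate and show, and the variable names r and theta, are our own choices and are not part of the lab code. The polar variables are measured from the centre of the image, which is an arbitrary choice. A constant expression such as 0 is broadcast to a full grid, so it no longer triggers the TypeError mentioned in step 8.

```python
import numpy as np

def evaluate(ps, n=200):
    """Evaluate the expression string ps on an n-by-n grid and return an (n, n) array."""
    xs = np.linspace(0, 1, n)
    ys = np.linspace(0, 1, n)
    x = np.meshgrid(xs, ys)  # x[0] and x[1], as in the lab code
    # Hypothetical extra variables: polar coordinates about the image centre
    r = np.hypot(x[0] - 0.5, x[1] - 0.5)
    theta = np.arctan2(x[1] - 0.5, x[0] - 0.5)
    # eval should only ever be used on strings we trust
    p = eval("lambda x, r, theta: " + ps, {"np": np})
    img = p(x, r, theta)
    # A constant expression yields a scalar; broadcast it to the grid shape
    return np.broadcast_to(np.asarray(img, dtype=float), (n, n))

def show(img, cmap="viridis"):
    """Display img with the given colour map name."""
    import matplotlib.pyplot as plt  # imported here so evaluate() needs only NumPy
    plt.imshow(img, cmap=cmap)
    plt.axis("off")
    plt.show()

img = evaluate("np.sin(20 * r) * np.cos(5 * theta)")
flat = evaluate("0")  # constant expression, still a full image
print(img.shape, flat.shape)
```

For example, show(img, cmap="magma") displays the result, and calling show(evaluate(g.derive_string())) inside a loop makes it quick to try several derived strings. Strings derived from the grammar will only use r and theta if arithmetic.bnf is extended to produce them.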

11.   Optional. Take a look at derive_string(), defined in grammar.py, to see the implementation of the simple algorithm that we defined in lectures for deriving a string from a grammar.
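For orientation before reading grammar.py, here is a minimal sketch of one common way to derive a string from a grammar: start from the start symbol and replace each non-terminal with a randomly chosen production until only terminals remain. This is not the code in grammar.py, and it may differ in detail from the algorithm given in lectures. The toy grammar, the function name derive and the depth limit are all assumptions made for illustration.

```python
import random

# Hypothetical toy grammar; the real arithmetic.bnf may look quite different.
# Each non-terminal maps to a list of productions (lists of symbols).
GRAMMAR = {
    "<expr>": [["(", "<expr>", " ", "<op>", " ", "<expr>", ")"],
               ["<var>"],
               ["<const>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x[0]"], ["x[1]"]],
    "<const>": [["0.5"], ["2.0"]],
}

def derive(symbol="<expr>", rng=random, depth=0, max_depth=6):
    """Recursively expand symbol into a string of terminals."""
    if symbol not in GRAMMAR:
        return symbol  # terminals are emitted unchanged
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, avoid self-recursive productions so that
        # the derivation is guaranteed to terminate
        options = [o for o in options if symbol not in o] or options
    production = rng.choice(options)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in production)

s = derive(rng=random.Random(0))  # seeded generator for a repeatable string
print(s)
```

The depth limit is one simple way to stop a recursive rule such as <expr> from expanding forever; compare it with whatever derive_string() does to keep derivations finite.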
