Complete the associated exercises before attempting the assignment
Aim
This assignment and its associated exercises provide experience in the use of libraries and more complex program build processes.
Objectives
On completion of this assignment and its associated exercises, you will be able to:
• Explain the differences between static and dynamic linking
• Link code that you have written with supplied libraries.
• Create makefiles
Task domain for example application – generating a graphic CAPTCHA
You will often have encountered CAPTCHAs when using the web. They are intended to keep automated bots, spiders, scanners, scripts and the like away from features of websites. The web server displays a puzzle that the user must solve before they can advance to the controlled web resource. The puzzle is displayed in a web page, and the user’s “solution” is sent back to the server for checking; only if the solution is correct may the user advance to the controlled web resource.
“A CAPTCHA (an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used in computing to determine whether or not the user is human.
The term was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper of Carnegie Mellon University and John Langford of IBM. The most common type of CAPTCHA was first invented by Mark D. Lillibridge, Martin Abadi, Krishna Bharat and Andrei Z. Broder. This form of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen.” (So says Wikipedia.)
Wikipedia adds: “This user identification procedure has received many criticisms, especially from disabled people, but also from other people who feel that their everyday work is slowed down by distorted words that are illegible even for users with no disabilities at all.”
Apart from being a pain for humans to solve, these distorted letter patterns are increasingly vulnerable.
“However, our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy. Thus distorted text, on its own, is no longer a dependable test.” http://googleonlinesecurity.blogspot.com.au/2014/12/are-you-robot-introducing-no-captcha.html
A number of organisations have created alternative “picture recognition” based CAPTCHAs. For example, Microsoft created the Asirra CAPTCHA:
"Asirra is easy for users; it can be solved by humans 99.6% of the time in under 30 seconds.
Anecdotally, users seemed to find the experience of using Asirra much more enjoyable than a text-based CAPTCHA."
There are a number of similar projects –
Confidentcaptcha.com
Piccaptcha.com
Google’s own version:
While currently more secure than corrupted text, these picture based approaches are still vulnerable.
That collection of 2 million dog and cat photos is still only 2 million images: attackers have effectively unlimited time and can, it seems, map the whole space of picture files onto dog | cat. Other attacks may also be possible -
https://www.linkedin.com/pulse/20140417144957-237781962-cracking-ms-asirra-captchas-withgoogle- repost-from-blog .
And these picture based approaches are an excellent area for PhD students to create thesis projects for recognising images - http://epub.uni-regensburg.de/16872/1/trustbus_1.pdf .
So what you will be doing in this exercise is creating a component for a slightly more challenging variant!
This variant works as follows:
• For every CAPTCHA test the web server generates a complex, effectively unique image in which a set of sub-images of a given type is embedded. It records the positions of these sub-images for use when verifying the user’s response. (This version uses a fixed number of sub-images, but could easily be generalised.) It sends this generated image to the client’s browser.
• The page sent to the browser incorporates the image and some JavaScript code. The CAPTCHA test requires the user to click on the embedded sub-images; the positions of the clicks are captured by JavaScript and sent in a verification request using AJAX.
• The server receives the user input and checks that the user clicked within the areas of the sub-images. If the user input is valid, the server creates session data “not a bot” that will allow the user to reach controlled web resources.
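The server-side check in the last step could be sketched in C as below. This is only an illustration: the Rect and Click types, the function names, and the acceptance policy (one click per target, every click inside some target, every target clicked) are assumptions, not part of the assignment specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; the assignment does not prescribe a layout. */
typedef struct { int x, y, w, h; } Rect;   /* top-left corner plus size */
typedef struct { int x, y; } Click;

/* True when the click lies inside the rectangle
   (right and bottom edges are exclusive). */
static bool click_in_rect(Click c, Rect r)
{
    return c.x >= r.x && c.x < r.x + r.w &&
           c.y >= r.y && c.y < r.y + r.h;
}

/* Assumed policy: same number of clicks as targets, every click inside
   some target, and every target hit by some click. */
bool verify_clicks(const Rect *targets, size_t ntargets,
                   const Click *clicks, size_t nclicks)
{
    assert(targets != NULL && clicks != NULL);
    if (nclicks != ntargets)
        return false;
    for (size_t c = 0; c < nclicks; c++) {      /* no stray clicks */
        bool hit = false;
        for (size_t t = 0; t < ntargets && !hit; t++)
            hit = click_in_rect(clicks[c], targets[t]);
        if (!hit)
            return false;
    }
    for (size_t t = 0; t < ntargets; t++) {     /* no missed targets */
        bool hit = false;
        for (size_t c = 0; c < nclicks && !hit; c++)
            hit = click_in_rect(clicks[c], targets[t]);
        if (!hit)
            return false;
    }
    return true;
}
```

The order of the clicks does not matter under this policy; a stricter or more forgiving rule could be substituted without changing the stored rectangle data.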
As shown in the following examples, the overall image is composed of a background photo (or abstract patterned image) and a large number of embedded, partially transparent sub-images. These sub-images are taken from several different collections. One group of sub-images constitutes the target for the user; the group is identified to the user by a separate sub-image of the same type.
The code that you write for this exercise is C code that first creates an image collection and then generates HTML pages with images, along with the files containing the data that define the positions of the embedded sub-images that the user is to identify. (Your HTML pages don’t include the JavaScript that would be required; this code would be added via an HTML <script> link.)
Task
The application
You are to build the application firstly as a NetBeans project, and then when it works you are to create a standalone version with your own makefile.
Another menu-select program!
This version of the program is simply an exercise and incorporates both the code to build up the image collection and the code to generate puzzles. (A realistic implementation would split these aspects into different applications.)
Example Use
Generate a puzzle:
• This version of the program is to generate a log that specifies the image selected as a background, the types of sub-image to embed (at least 5 different types should be used in each generated puzzle image), the specific sub-images selected (at least 3 from each different image type), and also identifies the sub-images that the user must select.
• In this case, the background bkgd6.jpg was used (backgrounds can be jpg, but .png must be used for sub-images as these require transparency data).
• The target sets were Butterfly, Aircraft, Steam-engine, Statue, and Car; 3 pictures were picked from each set. “Statue” was the set selected, and an additional statue sub-image is used. (Target sets are picked randomly from the set of all possible target types. Image files are picked randomly from the set associated with the chosen type.)
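The random choice of distinct target types, and of distinct image files within each type, might be sketched as follows. The function name pick_distinct is hypothetical, and the partial Fisher-Yates shuffle is simply one reasonable technique; the assignment does not say how the selection should be done.

```c
#include <assert.h>
#include <stdlib.h>

/* Pick k distinct indices from 0..n-1 into out[] using a partial
   Fisher-Yates shuffle. Returns 0 on success, -1 if k > n or if
   memory cannot be allocated. Seed the generator with srand() first. */
int pick_distinct(int n, int k, int *out)
{
    assert(out != NULL && n >= 0 && k >= 0);
    if (k > n)
        return -1;
    if (k == 0)
        return 0;
    int *pool = malloc((size_t)n * sizeof *pool);
    if (pool == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        pool[i] = i;
    for (int i = 0; i < k; i++) {
        /* rand() % m has a slight bias; acceptable for a sketch */
        int j = i + rand() % (n - i);
        int tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        out[i] = pool[i];
    }
    free(pool);
    return 0;
}
```

It would be called once with the number of tags and k = 5, then once per chosen tag with the size of that tag's set. If the sets live in Redis, the SRANDMEMBER command with a positive count is a server-side alternative that also returns distinct members.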
This generated the HTML page, along with a file holding the coordinates of the bounding rectangles for the target sub-images, something similar to the following:
The generated HTML file contains the base-64 encoded version of the images:
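A minimal sketch of the base-64 step is given here, assuming the encoded image bytes are already in memory (gd's gdImageJpegPtr() and gdImagePngPtr() can supply them). The function name base64_encode is hypothetical; the encoding itself is standard base-64 with '=' padding.

```c
#include <assert.h>
#include <stdlib.h>

/* Encode len bytes from src as standard base-64 with '=' padding.
   Returns a malloc'd NUL-terminated string (caller frees), or NULL
   if memory cannot be allocated. */
char *base64_encode(const unsigned char *src, size_t len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(src != NULL || len == 0);
    char *out = malloc(4 * ((len + 2) / 3) + 1);
    if (out == NULL)
        return NULL;
    char *p = out;
    size_t i = 0;
    for (; i + 2 < len; i += 3) {               /* full 3-byte groups */
        unsigned v = ((unsigned)src[i] << 16) |
                     ((unsigned)src[i + 1] << 8) | src[i + 2];
        *p++ = tbl[(v >> 18) & 63];
        *p++ = tbl[(v >> 12) & 63];
        *p++ = tbl[(v >> 6) & 63];
        *p++ = tbl[v & 63];
    }
    if (i < len) {                              /* 1 or 2 bytes left over */
        unsigned v = (unsigned)src[i] << 16;
        if (i + 1 < len)
            v |= (unsigned)src[i + 1] << 8;
        *p++ = tbl[(v >> 18) & 63];
        *p++ = tbl[(v >> 12) & 63];
        *p++ = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        *p++ = '=';
    }
    *p = '\0';
    return out;
}
```

One common way to embed the result, assumed here rather than taken from the assignment, is a data URI such as <img src="data:image/jpeg;base64,...">.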
Add another background image:
The program prompts for the filename, reads in the image file, scales the image to a standard width (about 700px), and saves the background as a .jpg file in a “backgrounds” directory. The Redis database is also updated. The Redis database has a counter for background images (used to generate a name for the background image file in the backgrounds directory), and a set “bkgrdimgs” whose members are the names of the files with backgrounds.
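The arithmetic behind these two steps might look like the sketch below. The helper names are hypothetical, the width of exactly 700 is an assumption ("about 700px" in the text), and the file-name pattern is inferred from the example name bkgd6.jpg used elsewhere in this handout.

```c
#include <assert.h>
#include <stdio.h>

#define BKGD_WIDTH 700   /* assumed exact value for "about 700px" */

/* Height that preserves the aspect ratio when an image of
   src_w x src_h is scaled to width new_w (rounded to nearest). */
int scaled_height(int src_w, int src_h, int new_w)
{
    assert(src_w > 0 && src_h > 0 && new_w > 0);
    long h = ((long)src_h * new_w + src_w / 2) / src_w;
    return h > 0 ? (int)h : 1;
}

/* Build a name such as "backgrounds/bkgd10.jpg" from the counter value.
   Returns 0 on success, -1 if the buffer is too small. */
int background_path(char *buf, size_t buflen, long counter)
{
    assert(buf != NULL);
    int n = snprintf(buf, buflen, "backgrounds/bkgd%ld.jpg", counter);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

The computed size would then be passed to a gd resampling call such as gdImageCopyResampled(). The Redis INCR command returns the incremented counter value, which makes it a natural source for the number in the file name.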
The scaled picture of Venice became the 10th image in the backgrounds collection. (I used photos as backgrounds, but in practice it might be better to use abstract images composed of multiple overlaid figures and lines in many different colours.)
Add another “target”:
In my implementation, sub-images are referred to as “targets”. The program is given the name of an image file with the additional sub-image. The image is read. It is then scaled to a fixed 100px width.
It is then made partially transparent by adjusting the alpha values for each pixel; the transparency makes the targets merge into the background rather than existing as recognizably distinct areas.
The scaled, partially transparent image is then saved to a file in a “targets” directory. Here, .png format must be used so as to preserve alpha channel data.
The program also gets the user to assign a “tag” for the image. This tag is used to group similar images.
As targets are added, records in the Redis database are updated. There is a target counter; this is used to generate unique names for the .png files created in the ‘targets’ directory. There is a set that contains the names for all distinct tags – “Fish”, “Car”, “Locomotive”, “Statue”, “Butterfly”, …
Each tag is associated with a set – the identifiers of the target files given that tag.
The following view of the Redis database shows that there were ten backgrounds and 68 different target images. The “tags” set contains the names of the different target groups. Each group has a set of associated images; the contents of the set “Cat” are shown. The puzzleid gets incorporated in the generated HTML page and is also used to name the file that is used to hold the bounding rectangles of the sub-image targets.
Partial transparency
The target images have alpha channel transparency values defined.
The transparency is handled somewhat naively by adjusting each pixel in the image. (There may be a better way to do this, but the gd documentation was not helpful). The outer region is made more transparent than the centre:
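One possible per-pixel rule is sketched below; it is not the original listing. It assumes gd's 7-bit alpha scale (0 is opaque, 127 is fully transparent) and a linear ramp from centre to edge. The constants ALPHA_CENTRE and ALPHA_EDGE and both function names are hypothetical choices for illustration.

```c
#include <assert.h>

#define ALPHA_CENTRE 40    /* hypothetical: fairly opaque in the middle */
#define ALPHA_EDGE   110   /* hypothetical: nearly transparent at the rim */

/* Distance of pos from the middle of 0..size-1, scaled to 0.0 .. 1.0. */
static double abs_offset(int pos, int size)
{
    if (size < 2)
        return 0.0;
    double d = 2.0 * pos / (size - 1) - 1.0;
    return d < 0.0 ? -d : d;
}

/* Alpha (gd scale, 0 opaque .. 127 transparent) for pixel (x, y) of a
   w x h image: ALPHA_CENTRE in the middle, rising linearly to
   ALPHA_EDGE at the border, in rectangular bands. */
int ramp_alpha(int x, int y, int w, int h)
{
    assert(w > 0 && h > 0 && x >= 0 && x < w && y >= 0 && y < h);
    double dx = abs_offset(x, w);
    double dy = abs_offset(y, h);
    double d = dx > dy ? dx : dy;
    return ALPHA_CENTRE + (int)(d * (ALPHA_EDGE - ALPHA_CENTRE) + 0.5);
}

/* Keep pixels that were already more transparent than the ramp value. */
int combine_alpha(int existing, int ramp)
{
    return existing > ramp ? existing : ramp;
}
```

With gd, the result would typically be applied by reading each pixel of a true-colour image, rebuilding its colour with gdTrueColorAlpha(), and writing it back with gdImageSetPixel(), with alpha blending turned off and gdImageSaveAlpha() enabled before the .png file is written.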
The code is along the following lines –
Embedding images in background
Load the sub-image from the targets directory, then use gdImageCopy() to place it at a random position within the main puzzleImage.
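Choosing the random position could be sketched as follows. The BoundingBox type and both function names are hypothetical. The overlap test is an extra that the assignment does not ask for; it might be used to keep target sub-images from covering one another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record of where a sub-image was placed. */
typedef struct { int x, y, w, h; } BoundingBox;

/* Choose a random top-left corner so that a sub_w x sub_h image lies
   wholly inside a bg_w x bg_h background. Fills *out and returns 0,
   or returns -1 if the sub-image cannot fit. */
int random_placement(int bg_w, int bg_h, int sub_w, int sub_h,
                     BoundingBox *out)
{
    assert(out != NULL);
    if (sub_w <= 0 || sub_h <= 0 || sub_w > bg_w || sub_h > bg_h)
        return -1;
    out->x = rand() % (bg_w - sub_w + 1);
    out->y = rand() % (bg_h - sub_h + 1);
    out->w = sub_w;
    out->h = sub_h;
    return 0;
}

/* True when the two boxes share at least one pixel. */
bool boxes_overlap(BoundingBox a, BoundingBox b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}
```

The chosen box supplies the destination coordinates for gdImageCopy(puzzleImage, subImage, box.x, box.y, 0, 0, box.w, box.h), where subImage stands for the loaded target, and the same box is what gets written to the bounding-rectangle file for target sub-images.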
Creating directories
Structure of my version of the program
Number of images
You will need a minimum of 3 different background images, and five different target sets (tag entries). For each target set, you will need at least five scaled, partially transparent images. Your program should refuse to generate HTML pages if there are too few images available.
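The refusal check could be sketched like this. The function name is hypothetical, and the rule is one reading of the requirement: at least 3 backgrounds, and at least five tags that each hold at least five images.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MIN_BACKGROUNDS = 3, MIN_TAGS = 5, MIN_IMAGES_PER_TAG = 5 };

/* tag_sizes[i] is the number of target images filed under tag i.
   Returns true only if there are enough backgrounds and enough
   sufficiently populated tags to generate a puzzle. */
bool enough_images(long nbackgrounds, const long *tag_sizes, size_t ntags)
{
    assert(tag_sizes != NULL || ntags == 0);
    if (nbackgrounds < MIN_BACKGROUNDS)
        return false;
    size_t usable = 0;
    for (size_t i = 0; i < ntags; i++)
        if (tag_sizes[i] >= MIN_IMAGES_PER_TAG)
            usable++;
    return usable >= MIN_TAGS;
}
```

If the collections are held in Redis sets as described earlier, the SCARD command returns the size of each set and can supply these counts.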
Submission
Prepare your report and convert to PDF format as the file A4.pdf.
Submission is done electronically via a program called turnin that runs on “banshee” – the main CS undergraduate machine. You must first transfer your A4.pdf file to your home directory on banshee (this is different from the home directory that you access on the Linux machines). You can transfer the file using an SSH file-transfer program. The Ubuntu OS allows you to open a file-browser connected to your banshee home directory – and you can simply drag your A4.pdf file across using the visual file browser.
For CSCI131, assignments are submitted electronically via the turnin system. For this assignment you submit your assignment via the command:
turnin -c csci131 -a 4 A4.pdf
Late submissions would be submitted as:
turnin -c csci131 -a 4late A4.pdf
The program turnin only works when you are logged in to the main banshee undergraduate server machine. From an Ubuntu workstation in the lab, you must open a terminal session on the local machine, and then log in to banshee via ssh and run the turnin program.
Marking
The assignment is worth 6 marks total.
• Appearance and structure of report: 1 mark
• Evidence for correct operation: 1 mark
• Code and explanations of your implementation (and also your makefile for your standalone version): 4 marks total