Starting from:

$20.99

CSCI544- Applied Natural Language Processing: Homework 1 Solved

Overview
Person names in the English language typically consist of one or more forenames followed by one or more surnames

(optionally preceded by zero or more titles and followed by zero or more suffixes). This situation can create ambiguity, as it is often unclear whether a particular name is a forename or a surname. For example, given the sequence Imogen and Andrew Lloyd Webber, it is not possible to tell what the full name of Imogen is, since that would depend on whether Lloyd is part of Andrew’s forename or surname (as it turns out, it is a surname: Imogen Lloyd Webber is the daughter of Andrew Lloyd Webber). This exercise explores ways of dealing with this kind of ambiguity.

You will write a program that takes a string representing the names of two persons (joined by and), and tries to predict the full name of the first person in the string. To develop your program, you will be given a set of names with correct solutions: these are not names of real people – rather, they have been constructed based on lists of common forenames and surnames. The names before the and are the first person’s forenames, any titles they may have, and possibly surnames; the names after the and are the second person’s full name. For each entry, your program will output its best guess as to the first person’s full name. The assignment will be graded based on accuracy, that is the number of names predicted correctly on an unseen dataset constructed the same way.

Data
A set of development and test data is available as a compressed ZIP archive on Blackboard. The uncompressed archive contains the following files:

 A test file with a list of conjoined names.

A key file with the same list of conjoined names, and the correct full name of the first person for each example.

Readme and license files, which will not be used for the exercise, but may contain useful information.

The submission script will run your program on the test file and compare the output to the key file. The grading script will do the same, but on a different pair of test and key files which you have not seen before.

Program
You will write a program called full-name-predictor.py in Python 3 (Python 2 has been deprecated), which will take the path to the test file as a command-line argument. Your program will be invoked in the following way:

> python full-name-predictor.py /path/to/test/data

The program will read the test data, and write its answers to a file called full-name-output.csv. The output file must be in the same format of the key file.

More products