Starting from:

$25

CSC4760 - Big Data Programming - Assignment 3  - Solved

Setting up Spark and running the WordCount example

This assignment aims at letting you learn how to setup Spark on your KVM. After the installation of Spark, you need to run the WordCount (Python version) example on your KVM.

Source Code and Datasets: 

The Python source code is given in the file “WordCount.py”. You need to run it on two datasets:

1) test.txt                   (display the top-5 most frequent words) 2) peterpan.txt         (display the top-30 most frequent words)

The example commands are as follows.

$ spark-submit  WordCount.py  /home/rob/Assignment3/test.txt  5 $ spark-submit  WordCount.py  /home/rob/Assignment3/peterpan.txt  30

Report: 

Please write a report to explain the key steps. Please take the screenshots of the outputs in the terminal for “test.txt” and “peterpan.txt” respectively. Please put them in the report and explain the outputs briefly. You may include the following key steps.

1)  Setup Spark in KVM by yourself.

2)  Download the “WordCount.py” file and two input data files from iCollege.

3)  Open a terminal, and run the “WordCount.py” file on “test.txt” and “peterpan.txt” respectively. You need to explain the commands and the outputs.


More products