
CSI301 Lab 3: Parsing Tweets with the String Class

Twitter and similar micro-blogging platforms allow users to broadcast short messages to a potentially large audience. With Twitter, the messages are called “tweets” and consist of 140 characters or fewer. Tweets are often sent from mobile phones, and tweeters can choose whether their tweets are public or not.

The platforms allow people to communicate in ways that were not possible even a few years ago, and they have already had a profound effect on society. They even affect how society reacts to natural and manmade disasters: ordinary citizens can now broadcast timely, location-specific information that is of vital importance to both emergency management agencies and members of the public.

Sifting through the tweets for useful information is difficult, however. As a result, there have been proposals to manually add structure to tweets sent during disasters. One proposal is called Tweak the Tweet (TtT).[1] In TtT, information is marked using a “hashtag” such as #loc (for location):

MT @carlseelye: #lovelandfire #Wind just switched and now the smoke is thick around our house #loc 40.352,-105.2045

Adding the structure makes it far easier for a computer to classify the tweets. 

 

In this lab, you will use methods of the String class to process messages similar to TtT tweets (the data is derived from real TtT data but has been altered to suit the lab). You will use substring and other methods to pull information out of the text, manipulate it, and print it to the screen.

Lab Objectives
By the end of the lab, you should:

•        understand how String constants and String objects are represented in Java;

•        be able to declare, initialize, and assign variables of type String;

•        perform string processing using the methods of the String class;

•        concatenate strings (and other values) together using the '+' operator.

 

 

Prerequisites
The lab deals with material from Chapter 2 (specifically, the discussions of the String class). It assumes you know how to define simple classes in Java, and that you can declare and assign values to variables.

Exercise – Parsing Tweets
The class ParseTheTweet will read in a single tweet from the keyboard, and so when writing your program, you will need to use the Scanner class in addition to the String class.  As usual, almost everything you write will go in the main method of the ParseTheTweet class. 

 

The tweets processed by the class all encode the following information using so-called “hashtags”:

the report type (#typ); some further detail (#det); a location (#loc) such as a street address; and latitude (#lat) and longitude (#lng). The type indicates the meaning of the tweet (for example, whether it is a request for help or a report of factual information), and the report detail provides additional information. Each of the hashtags is followed by some value (such as an actual latitude or longitude value).

 

When writing your code, you can assume that all of the tweets to process have the following format: 

 

#typ value; #det value; #loc value; #lat value; #lng value;

 

That is, they consist of a series of hashtags, each followed by a value, and each value followed by a semicolon (;). Nine sample tweets are provided at the end of this document. You should test your code on them.

 

Instructions  
1.      At the top of your source file, include a comment stating your Java class name, your name, the date, the program purpose, and containing the statement of academic honesty as you did in previous labs. 

2.      Use the Scanner class (as discussed in lecture) to read in a tweet entered by the user and store it in a String variable named tweet.

▪ Hint: Using the Scanner class involves 1) importing a package, 2) creating a Scanner object, and 3) using the appropriate method to read in a whole line of text.
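The three parts of the hint can be sketched as follows. This is only an illustration, not the required solution: the class name ReadTweetSketch, the helper readTweet, and the prompt text are made up for this example, and your own code would normally go in the main method of ParseTheTweet.

```java
import java.util.Scanner;   // 1) import the package that contains Scanner

class ReadTweetSketch {
    // 3) nextLine() reads one whole line of text, spaces included.
    static String readTweet(Scanner keyboard) {
        return keyboard.nextLine();
    }

    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);   // 2) create a Scanner object
        System.out.print("Enter a tweet: ");
        if (keyboard.hasNextLine()) {                // guard against empty input
            String tweet = readTweet(keyboard);
            System.out.println("Tweet read: " + tweet);
        }
    }
}
```

Note that next() would stop at the first space, which is why a whole-line method is needed for a tweet.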

3.      You will be splitting up (parsing) the information in the tweet into the 5 different types of information specified by the hashtags (type, location, detail, latitude, & longitude). Declare variables to store each of these pieces of information. Be sure to give these variables the appropriate data types and meaningful names.

4.      Now you actually need to divide the information in the tweet into the separate substrings of information. Declare 2 variables (start and finish) to hold the indices of where each substring starts and finishes. 

▪ Hint: Each substring starts with a hashtag and finishes with a semicolon. (Is there a String class method that you can use to get the index of “#typ” or a ‘;’?)

 

#typ value; #det value; #loc value; #lat value; #lng value;
^         ^
start     finish

 

5.      Once start and finish have values, we want to discard the #tag and extract only the value. We know that the ‘;’ is where the value finishes, but we still need to find the index where the actual value begins. Hint: our start variable currently holds the index of the “#”, and we know that every hashtag identifier has the same format: a “#”, three letters, and a space. Can we do simple math here to figure out the starting position of our value?
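One way to approach steps 4 and 5 is sketched below, using indexOf to locate the “#” and the ‘;’. The class name, the constant TAG_LENGTH, the helper names, and the sample tweet are all invented for illustration (the tweet is not one of the lab's sample tweets).

```java
class FindValueBounds {
    // Every tag is '#', three letters, and one space: 5 characters in total.
    static final int TAG_LENGTH = 5;

    // Index of the first character of the value that follows the first tag.
    static int valueStart(String tweet) {
        int start = tweet.indexOf("#");   // index of the '#' that opens the tag
        return start + TAG_LENGTH;
    }

    // Index of the semicolon that closes the first value.
    static int valueFinish(String tweet) {
        return tweet.indexOf(";");
    }

    public static void main(String[] args) {
        // Made-up tweet in the lab's format.
        String tweet = "#typ fire; #det smoke near road; #loc 12 Main St, Loveland; "
                     + "#lat 40.352; #lng -105.2045;";
        System.out.println("value starts at index " + valueStart(tweet));   // 5
        System.out.println("value ends before index " + valueFinish(tweet)); // 9
    }
}
```

In the sample, the characters at indices 5 through 8 spell "fire", and the semicolon sits at index 9.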

6.      Once we have the correct starting and ending positions for the value, we want to extract the substring at those positions and assign it to the appropriate variable declared in step 3.

▪ The trim() method removes leading and trailing whitespace (if any) from a String. Use the trim() method on each resultant String.

▪ Hint: The trim() method returns a modified String, but does not alter the String object that calls it. Ensure you are (re-)assigning a String to the result of trim().

Example

   String original = "  text here   ";

   String trimmed = original.trim(); // trimmed contains "text here"

   // At this point, original still contains "  text here   "
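Putting substring and trim() together for step 6 might look like the sketch below. The class name ExtractValue, the helper firstValue, and the sample text are hypothetical; the offset of 5 comes from the tag format described in step 5.

```java
class ExtractValue {
    // Returns the value that follows the first hashtag in the text,
    // without the tag, the semicolon, or surrounding whitespace.
    static String firstValue(String tweet) {
        int start = tweet.indexOf("#");    // where the tag begins
        int finish = tweet.indexOf(";");   // where the value ends
        // substring(a, b) includes index a and excludes index b,
        // so the semicolon itself is left out.
        String value = tweet.substring(start + 5, finish);
        return value.trim();               // keep the String that trim() returns
    }

    public static void main(String[] args) {
        // Made-up input with extra spaces around the value.
        String tweet = "#det   smoke near road ; #loc 12 Main St, Loveland;";
        System.out.println("[" + firstValue(tweet) + "]");   // [smoke near road]
    }
}
```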

7.      After extracting the value encoded by a hashtag, we discard that part of the tweet String: we are finished with it and are ready to repeat these steps for the next hashtag. We can use the substring method to extract the part of our tweet variable that begins just after the point where the last value finished. (Hint: we know the value finishes at a semicolon, we have that index stored in our finish variable, and we want to start right after that semicolon. Also, remember that if we pass the substring method only one argument, it begins at that index and continues to the end of the String.)

 

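[Diagram: the part of the tweet up to and including the first semicolon is labeled DISCARD; the rest is labeled SUBSTRING STORED IN TWEET.]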
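A small sketch of the discard step, assuming the same tag format as before. The class and helper names (DiscardProcessed, firstValue, rest) and the sample tweet are invented for this illustration.

```java
class DiscardProcessed {
    // Value after the first hashtag (tag is 5 characters long), trimmed.
    static String firstValue(String tweet) {
        int start = tweet.indexOf("#");
        int finish = tweet.indexOf(";");
        return tweet.substring(start + 5, finish).trim();
    }

    // Everything after the first semicolon; the one-argument form of
    // substring runs from the given index to the end of the String.
    static String rest(String tweet) {
        int finish = tweet.indexOf(";");
        return tweet.substring(finish + 1);
    }

    public static void main(String[] args) {
        String tweet = "#typ fire; #det smoke near road;";   // made-up, shortened tweet

        String type = firstValue(tweet);
        tweet = rest(tweet);                 // tweet is now " #det smoke near road;"

        String detail = firstValue(tweet);
        tweet = rest(tweet);                 // nothing left: tweet is now ""

        System.out.println(type);            // fire
        System.out.println(detail);          // smoke near road
    }
}
```

Because the start index is looked up again with indexOf after each discard, the leading space left in front of the next hashtag does not matter.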

                   

 

 

9.      The type values come from a very small set of values, and we want to make them all caps. To do this, use the toUpperCase method of the String class. 

10.   We also want to ensure that the detail and location values are free of commas (which might pose an obstacle to further processing). Use the replace method of the String class to do this. Replace each comma with a single hyphen (-).
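Steps 9 and 10 can be illustrated with the short sketch below. The class name, helper names, and sample values are made up; toUpperCase and replace are standard String methods, and both return a new String rather than changing the original.

```java
class CleanValues {
    // Returns the type value in all capital letters.
    static String normalizeType(String type) {
        return type.toUpperCase();
    }

    // Returns the text with every comma replaced by a hyphen.
    static String removeCommas(String text) {
        return text.replace(',', '-');
    }

    public static void main(String[] args) {
        String type = "fire";                       // made-up values
        String location = "12 Main St, Loveland";

        type = normalizeType(type);                 // reassign: Strings are immutable
        location = removeCommas(location);

        System.out.println(type);                   // FIRE
        System.out.println(location);               // 12 Main St- Loveland
    }
}
```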

11.   Now use System.out and print or println statements to produce formatted output, as shown in the included examples.  

HINT: Use the escape sequence \t to include tabs as needed. 
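The sketch below shows how the '+' operator and the \t escape sequence can be combined to line up labels and values. The labels, their order, and the number of tabs are assumptions made for illustration; match them to the example output provided with the lab, not to this sketch.

```java
class PrintReport {
    // Builds the report text; labels and tab counts are hypothetical.
    static String formatReport(String type, String detail, String location,
                               String latitude, String longitude) {
        return "Type:\t\t" + type + "\n"
             + "Detail:\t\t" + detail + "\n"
             + "Location:\t" + location + "\n"
             + "Latitude:\t" + latitude + "\n"
             + "Longitude:\t" + longitude;
    }

    public static void main(String[] args) {
        // Made-up values, already cleaned as in steps 9 and 10.
        System.out.println(formatReport("FIRE", "smoke near road",
                "12 Main St- Loveland", "40.352", "-105.2045"));
    }
}
```

Shorter labels get two tabs and longer labels get one so that the values start in the same column on a typical console with 8-character tab stops.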



[1] http://epic.cs.colorado.edu/?page_id=11
