CSYE7245 Assignment 4

The goal of the first part of this assignment is to create APIs that

1. Anonymize the data through:

●      Masking

●      Anonymization

Then, building upon the Cognito-based login and serverless-function infrastructure from Assignment 1, integrate the APIs so that

1.  Only authenticated users can call these APIs

2.  Use Amazon Step Functions and Lambda functions to make it serverless where feasible (This is a design decision; you may host servers and then call those APIs, or call readily available APIs like Amazon Comprehend through Lambda functions). A minimal authorizer sketch follows the references below.

Refer:

1.  Complete and submit the following tutorial:

https://aws.amazon.com/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/


2.  Presidio: https://github.com/microsoft/presidio
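As noted above, one common serverless pattern is an API Gateway route protected by a Cognito user-pool authorizer in front of a Lambda function. Here is a minimal handler sketch (all names are illustrative, not required by the assignment); with the authorizer configured, API Gateway only invokes the function for authenticated callers and forwards the verified token claims in the request context:

```python
import json

def handler(event, context):
    # With a Cognito user-pool authorizer on the API Gateway route, only
    # authenticated requests reach this function; the verified token
    # claims are forwarded in the request context by API Gateway.
    claims = event["requestContext"]["authorizer"]["claims"]

    return {
        "statusCode": 200,
        "body": json.dumps(
            {"message": f"Authenticated as {claims.get('email', 'unknown user')}"}
        ),
    }
```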

 


Implementation:

-------------------

Create three APIs:

API 1: Access

This API should retrieve the EDGAR filings data from the S3 bucket.
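A minimal retrieval sketch using boto3 (the bucket and key names are placeholders, not part of the assignment):

```python
import boto3

s3 = boto3.client("s3")

def get_filing(bucket: str, key: str) -> str:
    # Download one EDGAR filing object from S3 and return its text.
    obj = s3.get_object(Bucket=bucket, Key=key)
    return obj["Body"].read().decode("utf-8")

# Hypothetical usage:
# text = get_filing("my-edgar-filings", "filings/example-10k.txt")
```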

API 2: Named entity recognition

This API should take a link to a file on S3 and:

●      Call Amazon Comprehend, Google, Presidio, or any tool of your choice to find entities. (You can define the list of entities or use the default ones like Name, SSN, Date, etc.)

●      Store the detected entities on S3
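If you choose Amazon Comprehend, the core of this API could look like the sketch below (bucket and key names are illustrative; per-call text size limits apply, so long filings need to be chunked):

```python
import json
import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")

def detect_and_store_entities(bucket: str, key: str, out_key: str) -> list:
    # Read the source document from S3.
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    # Detect PII entities (NAME, SSN, DATE, ...). Comprehend returns each
    # entity with its Type, Score, BeginOffset, and EndOffset.
    resp = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    entities = resp["Entities"]

    # Store the detected entities back on S3 for the masking API to consume.
    s3.put_object(Bucket=bucket, Key=out_key, Body=json.dumps(entities))
    return entities
```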

API 3: Implement masking and anonymization functions.

Note: You have to define the API so as to indicate which entities need to be masked and which need to be anonymized. You also need to take the location of the file/files as input and write the output files back to S3. You can choose a method of your choice!
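One possible shape for these functions, assuming Comprehend-style character offsets from API 2 (the split between masked and anonymized entity types is passed in by the caller; all names are illustrative):

```python
def transform_text(text: str, entities: list, mask_types: set, anon_types: set) -> str:
    # Apply replacements from the end of the string backwards so that
    # earlier offsets remain valid as the text changes length.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end, etype = ent["BeginOffset"], ent["EndOffset"], ent["Type"]
        if etype in mask_types:
            # Masking: hide the value but preserve its length.
            replacement = "*" * (end - start)
        elif etype in anon_types:
            # Anonymization: replace the value with a type placeholder.
            replacement = f"[{etype}]"
        else:
            continue
        text = text[:start] + replacement + text[end:]
    return text

# Hypothetical usage:
# transform_text(text, entities, mask_types={"SSN"}, anon_types={"NAME", "DATE"})
```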

Part 2:

In this part of the assignment, you will build on the pre-processed (anonymized/masked) data and create a sentiment analysis model that takes the location of an anonymized file as input and generates a sentiment for each sentence.

To build this service, you need a sentiment analysis model that has been trained on labeled EDGAR datasets. Note that you need labeled training data, which means someone has to label the statements. We will use the IMDB dataset as a proxy and build a sentiment analyzer that can be tested on the anonymized datasets you prepared in Part 1.

Preparation:

Why TFX?

Read https://blog.tensorflow.org/2020/09/brief-history-of-tensorflow-extended-tfx.html#TFX for the history of TFX and MLOps. Watch this overview:

https://www.youtube.com/watch?v=YeuvR6m6ACQ&list=PLQY2H8rRoyvxR15n04JiW0ezF5HQRs_8F&index=1

Goal: To deploy a sentiment analysis model to create a Model-as-a-service for anonymized data

Step 1: Train TensorFlow models using TensorFlow Extended (TFX)
Replicate the referenced architecture to train the model on the anonymized data using BERT, leveraging TensorFlow Hub, TensorFlow Transform, TensorFlow Data Validation, TensorFlow Text, and TensorFlow Serving.

 

The pipeline takes advantage of the broad TensorFlow Ecosystem, including:

●      Loading the IMDB dataset via TensorFlow Datasets

●      Loading a pre-trained model via tf.hub

●      Manipulating the raw input data with tf.text

●      Building a simple model architecture with Keras

●      Composing the model pipeline with TensorFlow Extended, e.g., TensorFlow Transform and TensorFlow Data Validation, and then consuming the tf.keras model with the latest Trainer component from TFX

Ref:

https://blog.tensorflow.org/2020/03/part-1-fast-scalable-and-accurate-nlp-tensorflow-deploying-bert.html

https://blog.tensorflow.org/2020/06/part-2-fast-scalable-and-accurate-nlp.html

Sample Code:

https://colab.research.google.com/github/tensorflow/workshops/blob/master/blog/TFX_Pipeline_for_Bert_Preprocessing.ipynb#scrollTo=WWni3fVVafDa

Note: Use ALBERT instead of BERT (https://tfhub.dev/google/albert_base/3)

Also note that you will be implementing sentiment analysis, so you will have to change the dataset. Use the IMDB dataset as a proxy. See https://www.tensorflow.org/tutorials/keras/text_classification_with_hub for details.
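For orientation, here is a minimal Keras + TF Hub sketch in the spirit of the linked tutorial; it uses a small sentence-embedding module as a stand-in for the encoder, while the referenced Colab shows the full TFX pipeline with ALBERT preprocessing:

```python
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub

# Load the IMDB reviews dataset, the labeled proxy for EDGAR sentiment.
train_ds, test_ds = tfds.load(
    "imdb_reviews", split=["train", "test"], as_supervised=True
)

# A small TF Hub text-embedding layer stands in for the ALBERT encoder here.
embed = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2",
    input_shape=[], dtype=tf.string, trainable=True,
)

model = tf.keras.Sequential([
    embed,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # single logit: positive vs. negative
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(
    train_ds.shuffle(10_000).batch(512),
    validation_data=test_ds.batch(512),
    epochs=5,
)
```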

Step 2: Serve the model as a REST API
Use the TensorFlow Serving REST API to serve the model (https://www.tensorflow.org/tfx/serving/api_rest). For sample code, see:

https://www.tensorflow.org/tfx/tutorials/serving/rest_simple

OR

https://towardsdatascience.com/serving-image-based-deep-learning-models-with-tensorflow-servings-restful-api-d365c16a7dc4

OR

Use FastAPI to serve the model. For sample code, see:

https://medium.com/python-data/how-to-deploy-tensorflow-2-0-models-as-an-api-service-with-fastapi-docker-128b177e81f3

OR

https://testdriven.io/blog/fastapi-streamlit/
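Whichever serving option you choose, the TensorFlow Serving REST contract is simple. A minimal client sketch (the model name `sentiment` and port 8501 are assumptions for your deployment):

```python
import requests

# Documented TF Serving predict endpoint: /v1/models/<name>:predict
url = "http://localhost:8501/v1/models/sentiment:predict"

resp = requests.post(
    url,
    json={"instances": ["The quarterly results exceeded expectations."]},
)
resp.raise_for_status()
print(resp.json()["predictions"])  # one model output per input instance
```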

Step 3: Dockerize the API service.

For sample code on how to Dockerize API:

See https://www.tensorflow.org/tfx/serving/docker

OR https://medium.com/python-data/how-to-deploy-tensorflow-2-0-models-as-an-api-service-with-fastapi-docker-128b177e81f3

Step 4: Build a Reference App in Streamlit to test the API
Input: Link to an anonymized/de-anonymized file in Amazon S3
Output: Sentiment scores.

Sample Code:

See https://testdriven.io/blog/fastapi-streamlit/ for samples
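A rough sketch of such an app (the S3 URL parsing and the serving endpoint are assumptions; adapt them to your deployment):

```python
import boto3
import requests
import streamlit as st

st.title("Sentiment scores for anonymized EDGAR filings")

s3_url = st.text_input("S3 link to an anonymized file (s3://bucket/key)")

if st.button("Analyze") and s3_url.startswith("s3://"):
    bucket, _, key = s3_url[len("s3://"):].partition("/")
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    text = body.read().decode("utf-8")

    # Naive sentence splitting; send one instance per sentence to the API.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    resp = requests.post(
        "http://localhost:8501/v1/models/sentiment:predict",
        json={"instances": sentences},
    )
    for sentence, score in zip(sentences, resp.json()["predictions"]):
        st.write(score, sentence)
```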

Step 5: Write unit tests & load tests to test your API
●       You will have to show the test cases you have run (using pytest)

●       Load test the API with Locust
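A minimal Locust sketch (the endpoint path and payload are assumptions; your pytest cases can hit the same endpoint with `requests` and assert on the response shape):

```python
# locustfile.py
from locust import HttpUser, task, between

class SentimentApiUser(HttpUser):
    wait_time = between(1, 3)  # seconds a simulated user waits between tasks

    @task
    def predict(self):
        # Each simulated user repeatedly posts one instance to the API.
        self.client.post(
            "/v1/models/sentiment:predict",
            json={"instances": ["The merger closed ahead of schedule."]},
        )
```

Run it with, e.g., `locust -f locustfile.py --host http://localhost:8501`.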
