Amazon (MLS-C01) Exam Questions and Answers, page 12
A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions.
Here is an example from the dataset:
"The quck BROWN FOX jumps over the lazy dog."
Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)
Normalize all words by making the sentence lowercase.
Remove stop words using an English stopword dictionary.
Correct the typography on "quck" to "quick".
One-hot encode all words in the sentence.
Tokenize the sentence into words.
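The three sanitization operations under discussion (lowercase normalization, stopword removal, and tokenization) can be sketched in plain Python; the tiny inline stopword set below is an illustrative stand-in for a full English stopword dictionary:

```python
import re

# Minimal stand-in for a full English stopword dictionary (illustrative only).
STOPWORDS = {"the", "a", "an", "over", "is", "of", "and"}

def sanitize(sentence):
    """Lowercase, tokenize, and remove stopwords in a repeatable way."""
    lowered = sentence.lower()                        # normalize case
    tokens = re.findall(r"[a-z']+", lowered)          # tokenize into words
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords

print(sanitize("The quck BROWN FOX jumps over the lazy dog."))
# ['quck', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

Note that the typo "quck" survives: spelling correction and one-hot encoding are separate decisions, not part of this repeatable sanitization pipeline.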
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each pair of features and finds that their absolute values range from 0.1 to 0.95.
Which model describes the underlying data in this situation?
A naive Bayesian model, since the features are all conditionally independent.
A full Bayesian network, since the features are all conditionally independent.
A naive Bayesian model, since some of the features are statistically dependent.
A full Bayesian network, since some of the features are statistically dependent.
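A quick NumPy check makes the reasoning concrete: a pairwise Pearson coefficient with absolute value near 0.95 signals strong statistical dependence between features, which violates the conditional-independence assumption behind naive Bayes. The synthetic features below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Feature 1 is strongly dependent on feature 0; feature 2 is independent.
features = np.column_stack([
    x,
    0.95 * x + 0.3 * rng.normal(size=500),
    rng.normal(size=500),
])

# Pairwise Pearson correlation matrix; an off-diagonal |r| near 0.95
# indicates statistically dependent features, favoring a full
# Bayesian network over a naive Bayesian model.
corr = np.corrcoef(features, rowvar=False)
print(np.round(np.abs(corr), 2))
```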
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords.
Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?
Use Amazon SageMaker script mode and use train.py unchanged. Point the Amazon SageMaker training invocation to the local path of the data without reformatting the training data.
Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.
Rewrite the train.py script to add a section that converts TFRecords to protobuf and ingests the protobuf data instead of TFRecords.
Prepare the data in the format accepted by Amazon SageMaker. Use AWS Glue or AWS Lambda to reformat and store the data in an Amazon S3 bucket.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like, based on their similarity to other users.
What should the Specialist do to meet this objective?
Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR
Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR
Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.
Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR
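Collaborative filtering predicts a user's preferences from the behavior of similar users. At scale, Spark ML would typically implement this with ALS, but the core user-to-user similarity idea can be sketched with NumPy on a toy rating matrix (all data below is illustrative):

```python
import numpy as np

# Toy user-item rating matrix (rows = users, cols = products); 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Find user 0's most similar user; their ratings can then be used
# to predict user 0's unrated products.
sims = [cosine(R[0], R[u]) for u in range(1, R.shape[0])]
neighbor = 1 + int(np.argmax(sims))
print(neighbor)  # user 1 has the closest rating pattern to user 0
```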
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance.
Which approach allows the Specialist to use all the data to train the model?
Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset
Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:
- Start the workflow as soon as data is uploaded to Amazon S3.
- When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
- Store the results of joining datasets in Amazon S3.
- If one of the jobs fails, send a notification to the Administrator.
Which configuration will meet these requirements?
Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance. Use a lifecycle configuration script to join the datasets and persist the results in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3. Use AWS Glue to join the datasets in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
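The Step Functions portion of such a workflow can be sketched in Amazon States Language. This minimal fragment runs a Glue join job and publishes an SNS notification on failure; the job name, topic ARN, and account ID are placeholders, not values from the question:

```json
{
  "Comment": "Sketch: run a Glue join job, notify the Administrator on failure",
  "StartAt": "JoinDatasets",
  "States": {
    "JoinDatasets": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": { "JobName": "join-uploaded-datasets" },
      "Catch": [
        { "ErrorEquals": ["States.ALL"], "Next": "NotifyAdmin" }
      ],
      "End": true
    },
    "NotifyAdmin": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:etl-failures",
        "Message": "ETL job failed"
      },
      "End": true
    }
  }
}
```

In practice an S3 event would invoke a Lambda function that starts this state machine, which can also wait until all expected datasets have arrived before running the join.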
Data Engineering
Machine Learning Implementation and Operations
A machine learning specialist is developing a proof of concept for government users whose primary concern is security. The specialist is using Amazon SageMaker to train a convolutional neural network (CNN) model for a photo classifier application. The specialist wants to protect the data so that it cannot be accessed and transferred to a remote host by malicious code accidentally installed on the training container.
Which action will provide the MOST secure protection?
Remove Amazon S3 access permissions from the SageMaker execution role.
Encrypt the weights of the CNN model.
Encrypt the training and validation dataset.
Enable network isolation for training jobs.
Model Development
Machine Learning Implementation and Operations
A machine learning specialist is developing a regression model to predict rental rates from rental listings. A variable named Wall_Color represents the most prominent exterior wall color of the property. The following is the sample data, excluding all other variables:
The specialist chose a model that needs numerical input data.
Which feature engineering approaches should the specialist use to allow the regression model to learn from the Wall_Color data? (Choose two.)
Apply integer transformation and set Red = 1, White = 5, and Green = 10.
Add new columns that store one-hot representation of colors.
Replace the color name string by its length.
Create three columns to encode the color in RGB format.
Replace each color name by its training set frequency.
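One of the listed approaches, one-hot encoding, can be shown with pandas; the sample values below are a hypothetical stand-in for the question's missing data table:

```python
import pandas as pd

# Hypothetical sample of the Wall_Color variable.
df = pd.DataFrame({"Wall_Color": ["Red", "White", "Green", "White"]})

# One-hot encoding adds one 0/1 column per color, avoiding the false
# ordinal relationship an integer mapping (Red=1, White=5, Green=10) implies.
encoded = pd.get_dummies(df, columns=["Wall_Color"])
print(encoded.columns.tolist())
# ['Wall_Color_Green', 'Wall_Color_Red', 'Wall_Color_White']
```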
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist is given a structured dataset on the shopping habits of a company's customer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible.
What approach should the Specialist take to accomplish these tasks?
Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot.
Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.
Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a line graph.
Run k-means using the Euclidean distance measure for different values of k and create box plots for each numerical column within each cluster.
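The t-SNE-plus-scatter-plot approach can be sketched with scikit-learn; the synthetic matrix below stands in for the hundreds of numerical customer columns:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for the real data: 60 rows, 50 numerical columns.
X = rng.normal(size=(60, 50))

# t-SNE embeds the high-dimensional rows into 2-D points that can be
# drawn directly as a scatter plot to reveal natural groupings.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # (60, 2)
```

Each embedded row becomes one point in the scatter plot, so clusters appear visually without first choosing a number of groups, which is what makes this faster than iterating over k for k-means.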
Exploratory Data Analysis