Amazon (MLS-C01) Exam Questions And Answers page 14
A Machine Learning Specialist is working with a large cybersecurity company that manages security events in real time for companies around the world. The cybersecurity company wants to design a solution that will allow it to use machine learning to score malicious events as anomalies on the data as it is being ingested. The company also wants to be able to save the results in its data lake for later processing and analysis.
What is the MOST efficient way to accomplish these tasks?
Ingest the data into Apache Spark Streaming using Amazon EMR, and use Spark MLlib with k-means to perform anomaly detection. Then store the results in an Apache Hadoop Distributed File System (HDFS) using Amazon EMR with a replication factor of three as the data lake.
Ingest the data and store it in Amazon S3. Use AWS Batch along with the AWS Deep Learning AMIs to train a k-means model using TensorFlow on the data in Amazon S3.
Ingest the data and store it in Amazon S3. Have an AWS Glue job that is triggered on demand transform the new data. Then use the built-in Random Cut Forest (RCF) model within Amazon SageMaker to detect anomalies in the data.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours.
With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s).
Which visualization will accomplish this?
A histogram showing whether the most important input feature is Gaussian.
A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension.
A scatter plot showing the performance of the objective metric over each training iteration.
A scatter plot showing the correlation between maximum tree depth and the objective metric.
Model Development
Machine Learning Implementation and Operations
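One way to reason about narrowing a hyperparameter range is to check how strongly a single hyperparameter tracks the objective metric and the training time. The sketch below is illustrative only: the trial values, the `tolerance` threshold, and the helper name `pearson` are assumptions and do not come from a real SageMaker tuning job.

```python
from math import sqrt

# Hypothetical tuning results: (max_depth, validation AUC, training seconds).
# These numbers are made up for illustration only.
trials = [
    (2, 0.71, 40),
    (4, 0.78, 75),
    (6, 0.81, 130),
    (8, 0.82, 210),
    (10, 0.82, 320),
    (12, 0.82, 450),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


depths = [t[0] for t in trials]
aucs = [t[1] for t in trials]
seconds = [t[2] for t in trials]

depth_vs_auc = pearson(depths, aucs)
depth_vs_time = pearson(depths, seconds)

# Smallest depth whose AUC is within an assumed tolerance of the best AUC seen.
best_auc = max(aucs)
tolerance = 0.005
smallest_good_depth = min(d for d, a, _ in trials if best_auc - a <= tolerance)

print(f"depth vs AUC:  {depth_vs_auc:.2f}")
print(f"depth vs time: {depth_vs_time:.2f}")
print(f"smallest depth near best AUC: {smallest_good_depth}")
```

With data shaped like this, AUC flattens while training time keeps growing, so capping the upper end of the depth range would cut cost without giving up much of the metric.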
A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only.
How should the Machine Learning Specialist transform the dataset to minimize query runtime?
Convert the records to Apache Parquet format.
Convert the records to JSON format.
Convert the records to GZIP CSV format.
Convert the records to XML format.
Exploratory Data Analysis
Machine Learning Implementation and Operations
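The figures in this question can be worked through as simple arithmetic. The sketch below compares how much data a row-oriented format forces a query to read against a columnar layout that reads only the requested columns. It assumes equally sized columns and ignores compression, so the results are rough estimates rather than measured Athena scan sizes.

```python
# Figures taken from the question.
records = 800_000
record_size_mb = 1.5
total_columns = 200
queried_columns = (5, 10)

# Row-oriented CSV: every query has to read all of this.
total_mb = records * record_size_mb
total_tb = total_mb / 1_000_000  # decimal units


def scanned_gb(n_columns):
    """Estimated GB read when only n_columns are touched.

    Assumption: all columns are the same size and nothing is compressed.
    """
    return total_mb * n_columns / total_columns / 1_000


low_gb, high_gb = (scanned_gb(n) for n in queried_columns)

print(f"full dataset: {total_tb} TB")
print(f"columnar scan for 5 to 10 columns: {low_gb} to {high_gb} GB")
```

Under these assumptions a query touching 5 to 10 of the 200 columns reads roughly 2.5% to 5% of the data.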
A machine learning specialist needs to analyze comments on a news website with users across the globe. The specialist must find the most discussed topics in the comments that are in either English or Spanish.
What steps could be used to accomplish this task? (Choose two.)
Use an Amazon SageMaker BlazingText algorithm to find the topics independently from language. Proceed with the analysis.
Use an Amazon SageMaker seq2seq algorithm to translate from Spanish to English, if necessary. Use a SageMaker Latent Dirichlet Allocation (LDA) algorithm to find the topics.
Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon Comprehend topic modeling to find the topics.
Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon Lex to extract topics from the content.
Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon SageMaker Neural Topic Model (NTM) to find the topics.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet files for exploration and analysis.
Which of the following services would both ingest and store this data in the correct format?
AWS DMS
Amazon Kinesis Data Streams
Amazon Kinesis Data Firehose
Amazon Kinesis Data Analytics
Data Engineering
Exploratory Data Analysis
A Machine Learning Specialist needs to move and transform data in preparation for training. Some of the data needs to be processed in near-real time, and other data can be moved hourly. Existing Amazon EMR MapReduce jobs clean the data and perform feature engineering on it.
Which of the following services can feed data to the MapReduce jobs? (Choose two.)
AWS DMS
Amazon Kinesis
AWS Data Pipeline
Amazon Athena
Amazon ES
Data Engineering
Model Development
A Machine Learning Specialist prepared the following graph displaying the results of k-means for k = [1..10]:
Considering the graph, what is a reasonable selection for the optimal choice of k?
1
4
7
10
Model Development
Machine Learning Implementation and Operations
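Questions of this kind usually rely on the elbow heuristic: pick the k after which adding more clusters stops reducing the within-cluster error by much. The graph itself is not reproduced here, so the sketch below uses made-up values and a hypothetical helper, `elbow_k`, purely to show the mechanics. It does not reconstruct the original figure.

```python
# Hypothetical within-cluster sum of squares for k = 1..10.
# The original graph is not available, so these values are illustrative only.
inertia = [1000, 700, 450, 220, 190, 165, 150, 138, 128, 120]


def elbow_k(values):
    """Return the k (1-based) where the curve bends most sharply.

    Uses the largest second difference: the point where the drop from the
    previous k is much bigger than the drop to the next k. The first and
    last k cannot be chosen because they lack a neighbour on one side.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(values) - 1):
        drop_before = values[i - 1] - values[i]
        drop_after = values[i] - values[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


print(elbow_k(inertia))
```

The second-difference rule is only one of several ways to locate an elbow. On a real curve it is worth confirming the choice by eye.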
A Machine Learning Specialist previously trained a logistic regression model using scikit-learn on a local machine, and the Specialist now wants to deploy it to production for inference only.
What steps should be taken to ensure Amazon SageMaker can host a model that was trained locally?
Build the Docker image with the inference code. Tag the Docker image with the registry hostname and upload it to Amazon ECR.
Serialize the trained model so the format is compressed for deployment. Tag the Docker image with the registry hostname and upload it to Amazon S3.
Serialize the trained model so the format is compressed for deployment. Build the image and upload it to Docker Hub.
Build the Docker image with the inference code. Configure Docker Hub and upload the image to Amazon ECR.
Machine Learning Implementation and Operations
AWS Machine Learning Services
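"Tagging the image with the registry hostname" refers to the Amazon ECR naming scheme, in which the image name is prefixed with the account-specific registry host. The sketch below builds that name and the matching `docker` commands as strings. The account ID, region, repository name, and helper names are placeholders chosen for illustration, and nothing is executed.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an image name that includes the ECR registry hostname."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


def docker_commands(local_image, uri):
    """Return the build, tag, and push commands as plain strings."""
    return [
        f"docker build -t {local_image} .",
        f"docker tag {local_image} {uri}",
        f"docker push {uri}",
    ]


# Placeholder values, not real resources.
uri = ecr_image_uri("123456789012", "us-east-1", "sklearn-inference")
for command in docker_commands("sklearn-inference", uri):
    print(command)
```

The push only succeeds after Docker has authenticated against the registry, and the repository must already exist in ECR.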
A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics, past visits, and locality information. The Specialist must develop a machine learning approach to identify the customer shopping patterns, preferences, and trends to enhance the website for better service and smart recommendations.
Which solution should the Specialist recommend?
Latent Dirichlet Allocation (LDA) for the given collection of discrete data to identify patterns in the customer database.
A neural network with a minimum of three layers and random initial weights to identify patterns in the customer database.
Collaborative filtering based on user interactions and correlations to identify patterns in the customer database.
Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database.
Model Development
Machine Learning Implementation and Operations
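To make the collaborative filtering option concrete, the sketch below scores unseen items for one user by weighting other users' ratings with cosine similarity. It is a minimal user-based variant on invented data. The user names, items, ratings, and the `recommend` helper are all assumptions, and production recommenders typically use matrix factorization or a managed service instead.

```python
from math import sqrt

# Hypothetical interaction data: user -> {item: rating}.
ratings = {
    "ana": {"shoes": 5, "jacket": 4, "hat": 1},
    "ben": {"shoes": 4, "jacket": 5, "scarf": 4},
    "chloe": {"hat": 5, "gloves": 4, "scarf": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, data, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in data.items():
        if other == user:
            continue
        sim = cosine(data[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in data[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


print(recommend("ana", ratings))
```

Here "ana" rates shoes and jackets much as "ben" does, so the item "ben" rated that "ana" has not seen ranks first.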