Amazon (MLS-C01) Exam Questions And Answers page 10
A machine learning (ML) specialist must develop a classification model for a financial services company. A domain expert provides the dataset, which is tabular with 10,000 rows and 1,020 features. During exploratory data analysis, the specialist finds no missing values and a small percentage of duplicate rows. There are 200 feature pairs with correlation scores greater than 0.9. The mean value of each feature is similar to its 50th percentile.
Which feature engineering strategy should the ML specialist use with Amazon SageMaker?
Drop the features with low correlation scores by using a Jupyter notebook.
Apply anomaly detection by using the Random Cut Forest (RCF) algorithm.
Concatenate the features with high correlation scores by using a Jupyter notebook.
Exploratory Data Analysis
Model Development
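The exploratory checks described in the question above (counting feature pairs with correlation greater than 0.9 and comparing each feature's mean with its median) can be sketched with NumPy. The synthetic data and the helper name `count_correlated_pairs` are assumptions for illustration only; they are not part of the exam material.

```python
import numpy as np

def count_correlated_pairs(X, threshold=0.9):
    """Count feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)       # features are columns
    upper = np.triu_indices_from(corr, k=1)   # each pair once, skip the diagonal
    return int(np.sum(np.abs(corr[upper]) > threshold))

# Synthetic stand-in for the real dataset (assumption): 5 independent
# features plus one feature that is a noisy copy of feature 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 5))
noisy_copy = base[:, [0]] + 0.01 * rng.normal(size=(1000, 1))
X = np.hstack([base, noisy_copy])

n_pairs = count_correlated_pairs(X)

# A small gap between mean and median suggests roughly symmetric features.
mean_median_gap = np.abs(X.mean(axis=0) - np.median(X, axis=0))

print("highly correlated pairs:", n_pairs)
print("largest mean/median gap:", mean_median_gap.max())
```

Many highly correlated pairs indicate redundant features, and a mean close to the median indicates little skew; both observations inform which feature engineering strategy is appropriate.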
A machine learning (ML) specialist needs to extract embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive models. The text consists of curated sentences in English. Many sentences use similar words but in different contexts. There are questions and answers among the sentences, and the embedding space must differentiate between them.
Which options can produce the required embedding vectors that capture word context and sequential QA information? (Choose two.)
Amazon SageMaker seq2seq algorithm
Amazon SageMaker BlazingText algorithm in Skip-gram mode
Amazon SageMaker Object2Vec algorithm
Amazon SageMaker BlazingText algorithm in continuous bag-of-words (CBOW) mode
Combination of the Amazon SageMaker BlazingText algorithm in Batch Skip-gram mode with a custom recurrent neural network (RNN)
Model Development
Machine Learning Implementation and Operations
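The Skip-gram and CBOW modes named in the options above differ in how training examples are formed from a window of neighboring words. The following is a simplified sketch of that general word2vec idea, not of BlazingText internals; the function names and the sample sentence are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word predicts every word in its window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """CBOW: the words in the window jointly predict the center word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tuple(tokens[j] for j in range(lo, hi) if j != i)
        pairs.append((context, target))
    return pairs

tokens = "is the bank open".split()
sg = skipgram_pairs(tokens, window=1)
cb = cbow_pairs(tokens, window=1)
print(sg)
print(cb)
```

Both schemes learn one vector per word from local co-occurrence, so on their own they do not encode sentence order or the pairing between a question and its answer.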
A machine learning (ML) specialist wants to create a data preparation job that uses a PySpark script with complex window aggregation operations to create data for training and testing. The ML specialist needs to evaluate the impact of the number of features and the sample count on model performance.
Which approach should the ML specialist use to determine the ideal data transformations for the model?
Add an Amazon SageMaker Debugger hook to the script to capture key metrics. Run the script as an AWS Glue job.
Add an Amazon SageMaker Experiments tracker to the script to capture key metrics. Run the script as an AWS Glue job.
Add an Amazon SageMaker Debugger hook to the script to capture key parameters. Run the script as a SageMaker processing job.
Add an Amazon SageMaker Experiments tracker to the script to capture key parameters. Run the script as a SageMaker processing job.
Model Development
Machine Learning Implementation and Operations
A machine learning (ML) specialist wants to secure calls to the Amazon SageMaker Service API. The specialist has configured Amazon VPC with a VPC interface endpoint for the Amazon SageMaker Service API and is attempting to secure traffic from specific sets of instances and IAM users. The VPC is configured with a single public subnet.
Which combination of steps should the ML specialist take to secure the traffic? (Choose two.)
Add a VPC endpoint policy to allow access to the IAM users.
Modify the users' IAM policy to allow access to Amazon SageMaker Service API calls only.
Modify the security group on the endpoint network interface to restrict access to the instances.
Modify the ACL on the endpoint network interface to restrict access to the instances.
Add a SageMaker Runtime VPC endpoint interface to the VPC.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII).
The dataset:
• Must be accessible from a VPC only.
• Must not traverse the public internet.
How can these requirements be satisfied?
Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.
Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance.
Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance.
Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance.
Model Development
Machine Learning Implementation and Operations
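A bucket policy that limits access to a specific VPC endpoint, as mentioned in the options above, can be sketched as a Python dictionary serialized to JSON. This is a minimal sketch: the bucket name `example-pii-bucket`, the endpoint ID `vpce-1a2b3c4d`, and the helper name are placeholders, and the statement relies on the `aws:SourceVpce` condition key.

```python
import json

def vpce_only_bucket_policy(bucket, vpce_id):
    """Build a policy that denies S3 requests not arriving via the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",    # every object in it
                ],
                # Requests from any other source do not match this endpoint ID.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }

# Placeholder identifiers (assumptions), not values from the question.
policy = vpce_only_bucket_policy("example-pii-bucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```

Because the statement is an explicit deny, it also blocks requests made outside the endpoint by otherwise authorized principals, so a policy like this is usually tested on a non-production bucket first.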
A Machine Learning Specialist built an image classification deep learning model. However, the Specialist ran into an overfitting problem in which the training and testing accuracies were 99% and 75%, respectively.
How should the Specialist address this issue and what is the reason behind it?
The learning rate should be increased because the optimization process was trapped at a local minimum.
The dropout rate at the flatten layer should be increased because the model is not generalized enough.
The dimensionality of the dense layer next to the flatten layer should be increased because the model is not complex enough.
The epoch number should be increased because the optimization process was terminated before it reached the global minimum.
Model Development
Machine Learning Implementation and Operations
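Dropout, which appears in the options above, randomly zeroes activations during training so that the network cannot rely on any single unit. A minimal NumPy sketch of the common "inverted dropout" formulation follows; the function name, array shapes, and rate are assumptions chosen for illustration.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations                    # inference: pass through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation the same as without dropout.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((1000, 100))                      # stand-in for flatten-layer output

out = dropout(a, rate=0.5, rng=rng)
frac_zero = float(np.mean(out == 0.0))
eval_out = dropout(a, rate=0.5, rng=rng, training=False)

print("fraction dropped:", frac_zero)
print("mean activation after dropout:", out.mean())
```

A higher rate drops more units per step, which acts as stronger regularization; a large gap between training and test accuracy, as in the question, is the usual signal for considering it.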
A Machine Learning Specialist deployed a model that provides product recommendations on a company's website. Initially, the model was performing very well and resulted in customers buying more products on average. However, within the past few months, the Specialist has noticed that the effect of product recommendations has diminished and customers are starting to return to their original habits of spending less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago.
Which method should the Specialist try to improve model performance?
The model needs to be completely re-engineered because it is unable to handle product inventory changes.
The model's hyperparameters should be periodically updated to prevent drift.
The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes.
The model should be periodically retrained using the original training data plus new data as product inventory changes.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS.
Which approach should the Specialist use for training a model using that data?
Write a direct connection to the SQL database within the notebook and pull data in.
Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.
Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in.
Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist has created a deep learning neural network model that performs well on the training data but performs poorly on the test data.
Which of the following methods should the Specialist consider using to correct this? (Choose three.)
Decrease regularization.
Increase regularization.
Increase dropout.
Decrease dropout.
Increase feature combinations.
Decrease feature combinations.
Model Development
Machine Learning Implementation and Operations
A Machine Learning Specialist is applying a linear least squares regression model to a dataset with 1,000 records and 50 features. Prior to training, the ML Specialist notices that two features are perfectly linearly dependent.
Why could this be an issue for the linear least squares regression model?
It could cause the backpropagation algorithm to fail during training.
It could create a singular matrix during optimization, which fails to define a unique solution.
It could modify the loss function during optimization, causing it to fail during training.
It could introduce non-linear dependencies within the data, which could invalidate the linear assumptions of the model.
Model Development
Machine Learning Implementation and Operations
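The effect of two perfectly linearly dependent features on least squares can be checked numerically. In this sketch the data is synthetic and the feature count is reduced to 5 for readability (both assumptions); one column is set to exactly twice another, which makes the design matrix rank deficient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X[:, 4] = 2.0 * X[:, 3]            # feature 4 is exactly 2 x feature 3

# X has 5 columns but only 4 independent directions. X^T X has the same
# rank as X, so the 5x5 normal-equations matrix is singular.
rank = int(np.linalg.matrix_rank(X))

# Two different coefficient vectors that give the same predictions:
# moving weight along the direction (0, 0, 0, 2, -1) changes nothing,
# because 2 * x3 - 1 * x4 = 0 for every row.
w_a = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
w_b = w_a + np.array([0.0, 0.0, 0.0, 2.0, -1.0])
same_predictions = bool(np.allclose(X @ w_a, X @ w_b))

print("rank of X:", rank)
print("different weights, same predictions:", same_predictions)
```

Since distinct weight vectors fit the data equally well, the least squares problem has no unique solution unless one of the dependent features is dropped or a regularization term is added.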