Amazon (MLS-C01) Exam Questions And Answers page 4
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.
Based on this information, which model would have the HIGHEST accuracy?
Long short-term memory (LSTM) model with scaled exponential linear unit (SELU)
Logistic regression
Support vector machine (SVM) with non-linear kernel
Single perceptron with tanh activation function
Model Development
Machine Learning Implementation and Operations
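The exam figure is not reproduced here, but the question hinges on whether the two classes are linearly separable. The sketch below uses scikit-learn's `make_circles` as a stand-in for a non-linearly-separable class distribution and compares a linear model with an RBF-kernel SVM (dataset and parameters are illustrative, not from the exam):

```python
# Sketch: an RBF-kernel SVM on data that is not linearly separable.
# make_circles stands in for the exam figure, which is not reproduced here.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

linear = LogisticRegression().fit(X, y)   # linear decision boundary
rbf = SVC(kernel="rbf").fit(X, y)         # non-linear decision boundary

print(f"logistic regression accuracy: {linear.score(X, y):.2f}")
print(f"RBF-kernel SVM accuracy:      {rbf.score(X, y):.2f}")
```

On data like this the linear model is near chance while the kernel SVM separates the classes almost perfectly, which is the intuition behind the SVM-with-non-linear-kernel option.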
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.
Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?
Decision tree
Linear support vector machine (SVM)
Naive Bayesian classifier
Single Perceptron with sigmoidal activation function
Model Development
Machine Learning Implementation and Operations
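Recall with respect to the fraudulent class is the fraction of actual frauds the model catches. A minimal sketch of how that metric is computed with scikit-learn (the labels below are invented for illustration, not taken from the exam figure):

```python
# Sketch: computing recall for the "fraudulent" class.
# Recall = true positives / (true positives + false negatives)
from sklearn.metrics import recall_score

y_true = ["fraud", "fraud", "normal", "fraud", "normal", "normal"]
y_pred = ["fraud", "normal", "normal", "fraud", "fraud", "normal"]

recall = recall_score(y_true, y_pred, pos_label="fraud")
print(f"recall (fraud): {recall:.2f}")  # 2 of 3 actual frauds caught
```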
A company wants to create a data repository in the AWS Cloud for machine learning (ML) projects. The company wants to use AWS to perform complete ML lifecycles and wants to use Amazon S3 for the data storage. All of the company's data currently resides on premises and is 40 TB in size.
The company wants a solution that can transfer and automatically update data between the on-premises object storage and Amazon S3. The solution must support encryption, scheduling, monitoring, and data integrity validation.
Which solution meets these requirements?
Use the S3 sync command to compare the source S3 bucket and the destination S3 bucket. Determine which source files do not exist in the destination S3 bucket and which source files were modified.
Use AWS Transfer for FTPS to transfer the files from the on-premises storage to Amazon S3.
Use AWS DataSync to make an initial copy of the entire dataset. Schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.
Use S3 Batch Operations to pull data periodically from the on-premises storage. Enable S3 Versioning on the S3 bucket to protect against accidental overwrites.
Data Engineering
Machine Learning Implementation and Operations
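The DataSync option maps directly onto task settings for scheduled incremental transfers with integrity verification. A hedged sketch of the request such a task would use (the ARNs and schedule are placeholders; in practice boto3's `datasync` client would consume this via `create_task`):

```python
# Sketch of the DataSync task settings the correct option relies on:
# scheduled incremental transfers with data integrity validation.
task_request = {
    "SourceLocationArn": "arn:aws:datasync:...:location/loc-onprem",   # placeholder
    "DestinationLocationArn": "arn:aws:datasync:...:location/loc-s3",  # placeholder
    "Options": {
        "VerifyMode": "POINT_IN_TIME_CONSISTENT",  # data integrity validation
        "TransferMode": "CHANGED",                 # incremental: copy only changed data
    },
    "Schedule": {"ScheduleExpression": "cron(0 * * * ? *)"},  # e.g. hourly runs
}
print(sorted(task_request["Options"]))
```

DataSync encrypts data in transit and emits CloudWatch metrics, covering the encryption and monitoring requirements as well.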
A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices.
Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model s complexity?
Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.
Model Development
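A minimal sketch of the correlate-against-the-target approach described in the last option (the column names, toy values, and 0.1 threshold are illustrative; assumes pandas is available):

```python
# Sketch: correlate each feature with the target and keep only features
# with meaningful correlation. Data and threshold are illustrative.
import pandas as pd

df = pd.DataFrame({
    "lot_size":   [5000, 7000, 4000, 9000, 6000],
    "bedrooms":   [3, 4, 2, 5, 3],
    "year_built": [1990, 2005, 1975, 2015, 2000],
    "sale_price": [200000, 310000, 150000, 420000, 260000],
})

corr_with_target = df.corr()["sale_price"].drop("sale_price").abs()
threshold = 0.1  # illustrative cutoff
keep = corr_with_target[corr_with_target >= threshold].index.tolist()
print("kept features:", keep)
```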
A company wants to use automatic speech recognition (ASR) to transcribe messages that are less than 60 seconds long from a voicemail-style application. The company requires the correct identification of 200 unique product names, some of which have unique spellings or pronunciations.
The company has 4,000 words of Amazon SageMaker Ground Truth voicemail transcripts it can use to customize the chosen ASR model. The company needs to ensure that everyone can update their customizations multiple times each hour.
Which approach will maximize transcription accuracy during the development phase?
Use a voice-driven Amazon Lex bot to perform the ASR customization. Create custom slots within the bot that specifically identify each of the required product names. Use the Amazon Lex synonym mechanism to provide additional variations of each product name as mis-transcriptions are identified in development.
Use Amazon Transcribe to perform the ASR customization. Analyze the word confidence scores in the transcript, and automatically create or update a custom vocabulary file with any word that has a confidence score below an acceptable threshold value. Use this updated custom vocabulary file in all future transcription tasks.
Create a custom vocabulary file containing each product name with phonetic pronunciations, and use it with Amazon Transcribe to perform the ASR customization. Analyze the transcripts and manually update the custom vocabulary file to include updated or additional entries for those names that are not being correctly identified.
Use the audio transcripts to create a training dataset and build an Amazon Transcribe custom language model. Analyze the transcripts and update the training dataset with a manually corrected version of transcripts where product names are not being transcribed correctly. Create an updated custom language model.
Exploratory Data Analysis
Model Development
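The custom-vocabulary option centers on Transcribe's tab-separated vocabulary table (columns: Phrase, IPA, SoundsLike, DisplayAs). A sketch of building such a file (the product names below are invented examples; in practice the file is uploaded to S3 and referenced by a `create_vocabulary` call on boto3's `transcribe` client):

```python
# Sketch: building a Transcribe custom-vocabulary table. Each row gives a
# phrase plus either an IPA or a SoundsLike hint (not both), and the text
# to display. Product names here are hypothetical.
rows = [
    ("Phrase", "IPA", "SoundsLike", "DisplayAs"),
    ("Zowzer", "", "zow-zer", "Zowzer"),      # hypothetical unique pronunciation
    ("QuikPay", "", "quick-pay", "QuikPay"),  # hypothetical unusual spelling
]
vocabulary = "\n".join("\t".join(cols) for cols in rows)
print(vocabulary)
```

Custom vocabularies can be updated in minutes, which fits the multiple-updates-per-hour requirement better than retraining a custom language model.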
A company will use Amazon SageMaker to train and host a machine learning (ML) model for a marketing campaign. The majority of data is sensitive customer data. The data must be encrypted at rest. The company wants AWS to maintain the root of trust for the master keys and wants encryption key usage to be logged.
Which implementation will meet these requirements?
Use encryption keys that are stored in AWS CloudHSM to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3.
Use SageMaker built-in transient keys to encrypt the ML data volumes. Enable default encryption for new Amazon Elastic Block Store (Amazon EBS) volumes.
Use customer managed keys in AWS Key Management Service (AWS KMS) to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3.
Use AWS Security Token Service (AWS STS) to create temporary tokens to encrypt the ML storage volumes, and to encrypt the model artifacts and data in Amazon S3.
Machine Learning Implementation and Operations
AWS Machine Learning Services
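The customer-managed-KMS-key option corresponds to two fields in a SageMaker training job request. A hedged sketch of that request fragment (the key ARN and bucket are placeholders; boto3's `sagemaker` client would consume this via `create_training_job`, and KMS key usage is logged through CloudTrail):

```python
# Sketch: where a customer managed KMS key plugs into a SageMaker
# training job request. ARN and bucket are placeholders.
kms_key_id = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"  # placeholder

training_request_fragment = {
    "OutputDataConfig": {
        "S3OutputPath": "s3://example-bucket/output/",  # placeholder
        "KmsKeyId": kms_key_id,        # encrypts model artifacts written to S3
    },
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
        "VolumeKmsKeyId": kms_key_id,  # encrypts the ML data volumes at rest
    },
}
print(sorted(training_request_fragment))
```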
A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, that the large number of features slows down the training speed significantly, and that there are some overfitting issues.
The Data Scientist on this project would like to speed up the model training time without losing a lot of information from the original dataset.
Which feature engineering technique should the Data Scientist use to meet the objectives?
Run self-correlation on all features and remove highly correlated features
Normalize all numerical values to be between 0 and 1
Use an autoencoder or principal component analysis (PCA) to replace original features with new features
Cluster raw data using k-means and use sample data from each cluster to build a new dataset
Data Engineering
Model Development
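The PCA option compresses many correlated raw attributes into a few components while retaining most of the information. A minimal sketch on synthetic data (dimensions and noise level are illustrative; assumes scikit-learn and NumPy are available):

```python
# Sketch: PCA reduces 20 highly correlated features, built from 5
# underlying signals, down to 5 components with almost no variance lost.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 5))                 # 5 underlying signals
X = base @ rng.normal(size=(5, 20)) + 0.01 * rng.normal(size=(500, 20))

pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)
print("reduced from", X.shape[1], "to", X_reduced.shape[1], "features")
print(f"variance retained: {pca.explained_variance_ratio_.sum():.3f}")
```

Training on `X_reduced` instead of `X` addresses both the speed and the overfitting concerns without discarding much information.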
A data engineer at a bank is evaluating a new tabular dataset that includes customer data. The data engineer will use the customer data to create a new model to predict customer behavior. After creating a correlation matrix for the variables, the data engineer notices that many of the 100 features are highly correlated with each other.
Which steps should the data engineer take to address this issue? (Choose two.)
Use a linear-based algorithm to train the model.
Apply principal component analysis (PCA).
Remove a portion of highly correlated features from the dataset.
Apply min-max feature scaling to the dataset.
Apply one-hot encoding to category-based variables.
Data Engineering
Exploratory Data Analysis
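A sketch of the remove-highly-correlated-features option: scan the upper triangle of the correlation matrix and drop one feature from each highly correlated pair (column names, data, and the 0.9 threshold are illustrative; assumes pandas and NumPy):

```python
# Sketch: drop one feature from each pair whose absolute correlation
# exceeds a threshold. Data and threshold are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
age = rng.normal(40, 10, 200)
df = pd.DataFrame({
    "age": age,
    "age_months": age * 12 + rng.normal(0, 1, 200),  # nearly duplicates "age"
    "balance": rng.normal(5000, 1500, 200),
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)
print("dropped:", to_drop)
```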
A data engineer is using AWS Glue to create optimized, secure datasets in Amazon S3. The data science team wants the ability to access the ETL scripts directly from Amazon SageMaker notebooks within a VPC. After this setup is complete, the data science team wants the ability to run the AWS Glue job and invoke the SageMaker training job.
Which combination of steps should the data engineer take to meet these requirements? (Choose three.)
Create a SageMaker development endpoint in the data science team's VPC.
Create an AWS Glue development endpoint in the data science team's VPC.
Create SageMaker notebooks by using the AWS Glue development endpoint.
Create SageMaker notebooks by using the SageMaker console.
Attach a decryption policy to the SageMaker notebooks.
Create an IAM policy and an IAM role for the SageMaker notebooks.
Data Engineering
Machine Learning Implementation and Operations