Microsoft (DP-100) Exam Questions and Answers, Page 18
You have recently concluded the construction of a binary classification machine learning model.
You are currently assessing the model. You want to make use of a visualization that allows for precision to be used as the measurement for the assessment.
Which of the following actions should you take?
You should consider using Receiver Operating Characteristic (ROC) curve visualization.
You should consider using Box plot visualization.
You should consider using the Binary classification confusion matrix visualization.
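For context, precision can be read directly off a binary confusion matrix, which is why that visualization supports precision-based assessment. A minimal sketch with invented labels, using scikit-learn:

from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# ravel() flattens the 2x2 matrix into (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)  # precision = TP / (TP + FP)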
Data Preparation and Processing
Modeling
You are preparing to build a deep learning convolutional neural network model for image classification. You create a script to train the model using CUDA devices.
You must submit an experiment that runs this script in the Azure Machine Learning workspace.
The following compute resources are available:
• a Microsoft Surface device with Microsoft Office installed; corporate IT policies prevent the installation of additional software
• a Compute Instance named ds-workstation in the workspace with 2 CPUs and 8 GB of memory
• an Azure Machine Learning compute target named cpu-cluster with eight CPU-based nodes
• an Azure Machine Learning compute target named gpu-cluster with four CPU and GPU-based nodes
You need to specify the compute resources to be used for running the code to submit the experiment, and for running the script in order to minimize model training time.
Which resources should the data scientist use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
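For context, a minimal sketch of how this might look with the v1 Python SDK: the submission code runs on the ds-workstation compute instance, and the training script targets gpu-cluster so the CUDA code runs on GPUs (the script and experiment names are illustrative assumptions):

from azureml.core import Workspace, Experiment, ScriptRunConfig

# Run this submission code on the ds-workstation compute instance
ws = Workspace.from_config()
script_config = ScriptRunConfig(source_directory='.',
                                script='train.py',             # hypothetical training script
                                compute_target='gpu-cluster')  # GPU nodes minimize training time
run = Experiment(workspace=ws, name='cnn-image-classification').submit(script_config)
run.wait_for_completion(show_output=True)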
Data Preparation and Processing
Modeling
How do you run the defined steps in a pipeline using code segments?
Multiple Choice
You use the following code to define the steps for a pipeline:
from azureml.core import Workspace, Experiment, Run
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
ws = Workspace.from_config()
. . .
step1 = PythonScriptStep(name="step1", ...)
step2 = PythonScriptStep(name="step2", ...)
pipeline_steps = [step1, step2]
You need to add code to run the steps.
Which two code segments can you use to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
experiment = Experiment(workspace=ws, name='pipeline-experiment')
run = experiment.submit(config=pipeline_steps)

run = Run(pipeline_steps)

pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
experiment = Experiment(workspace=ws, name='pipeline-experiment')
run = experiment.submit(pipeline)

pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
run = pipeline.submit(experiment_name='pipeline-experiment')
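For reference, a runnable sketch (given the step definitions above) of the two patterns that actually submit a pipeline: an Experiment can submit a Pipeline object, and a Pipeline can submit itself under an experiment name. A bare list of steps is not a valid submit config, and Run objects are not constructed from steps.

pipeline = Pipeline(workspace=ws, steps=pipeline_steps)

# Pattern 1: submit the Pipeline object through an Experiment
experiment = Experiment(workspace=ws, name='pipeline-experiment')
run = experiment.submit(pipeline)

# Pattern 2: submit the Pipeline directly under an experiment name
run = pipeline.submit(experiment_name='pipeline-experiment')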
Data Preparation and Processing
Deployment and Monitoring
You are creating an experiment by using Azure Machine Learning Studio.
You must divide the data into four subsets for evaluation. There is a high degree of missing values in the data. You must prepare the data for analysis.
You need to select appropriate methods for producing the experiment.
Which three modules should you run in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.
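Although this question targets Studio modules (typically Clean Missing Data followed by Partition and Sample), a rough pandas analogue sketches the intent; the file name and cleaning strategy are assumptions:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')                    # hypothetical dataset with many missing values
df = df.dropna()                                # one cleaning strategy; substitution is another
shuffled = df.sample(frac=1, random_state=42)   # shuffle before partitioning
folds = np.array_split(shuffled, 4)             # four subsets for evaluation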
Data Preparation and Processing
Modeling
You are using the Azure Machine Learning Service to automate hyperparameter exploration of your neural network classification model.
You must define the hyperparameter space to automatically tune hyperparameters using random sampling according to the following requirements:
• The learning rate must be selected from a normal distribution with a mean of 10 and a standard deviation of 3.
• Batch size must be selected from the values 16, 32, and 64.
• Keep probability must be selected from a uniform distribution over the range 0.05 to 0.1.
You need to use the param_sampling method of the Python API for the Azure Machine Learning Service.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
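A minimal sketch of the sampling definition with the v1 SDK's hyperdrive package (the parameter names are illustrative):

from azureml.train.hyperdrive import RandomParameterSampling, normal, choice, uniform

param_sampling = RandomParameterSampling({
    'learning_rate': normal(10, 3),          # normal distribution: mean 10, std dev 3
    'batch_size': choice(16, 32, 64),        # discrete set of values
    'keep_probability': uniform(0.05, 0.1)   # uniform over [0.05, 0.1]
})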
Data Preparation and Processing
Modeling
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to use a Python script to run an Azure Machine Learning experiment. The script creates a reference to the experiment run context, loads data from a file, identifies the set of unique values for the label column, and completes the experiment run:
from azureml.core import Run
import pandas as pd
run = Run.get_context()
data = pd.read_csv('data.csv')
label_vals = data['label'].unique()
# Add code to record metrics here
run.complete()
The experiment must record the unique labels in the data as metrics for the run that can be reviewed later.
You must add code to the script to record the unique label values as run metrics at the point indicated by the comment.
Solution: Replace the comment with the following code:
run.upload_file('outputs/labels.csv', './data.csv')
Does the solution meet the goal?
Yes
No
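The proposed call uploads a file artifact to the run; it does not record metrics. A sketch of code that would meet the goal instead, using the Run object's list-logging method:

# Logs each unique label value as a list metric on the run
run.log_list('Label Values', label_vals.tolist())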
Data Preparation and Processing
Deployment and Monitoring
You develop and train a machine learning model to predict fraudulent transactions for a hotel booking website.
Traffic to the site varies considerably. The site experiences heavy traffic on Monday and Friday and much lower traffic on other days. Holidays are also high web traffic days.
You need to deploy the model as an Azure Machine Learning real-time web service endpoint on compute that can dynamically scale up and down to support demand.
Which deployment compute option should you use?
attached Azure Databricks cluster
Azure Container Instance (ACI)
Azure Kubernetes Service (AKS) inference cluster
Azure Machine Learning Compute Instance
attached virtual machine in a different region
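Only AKS supports autoscaling for real-time inference endpoints; ACI and compute instances do not scale dynamically with demand. A sketch of an autoscaling deployment configuration (the replica counts are illustrative):

from azureml.core.webservice import AksWebservice

aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True,
                                                autoscale_min_replicas=1,    # scale down on quiet days
                                                autoscale_max_replicas=10)   # scale up for peak traffic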
Modeling
Deployment and Monitoring
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using Azure Machine Learning to run an experiment that trains a classification model.
You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configure a HyperDriveConfig for the experiment by running the following code:
You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validation data are stored in a variable named y_test, and the predicted probabilities from the model are stored in a variable named y_predicted.
You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the AUC metric.
Solution: Run the following code:
Does the solution meet the goal?
Yes
No
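Since the configuration code is not reproduced here, the general requirement is that the training script log a metric whose name exactly matches the primary_metric_name in the HyperDriveConfig. A sketch, assuming that name is 'AUC':

from sklearn.metrics import roc_auc_score

# Hyperdrive reads the value logged under the configured primary metric name
auc = roc_auc_score(y_test, y_predicted)
run.log('AUC', auc)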
Data Preparation and Processing
Modeling
You use the Azure Machine Learning Python SDK to define a pipeline to train a model.
The data used to train the model is read from a folder in a datastore.
You need to ensure the pipeline runs automatically whenever the data in the folder changes.
What should you do?
Set the regenerate_outputs property of the pipeline to True
Create a ScheduleRecurrence object with a Frequency of auto. Use the object to create a Schedule for the pipeline
Create a PipelineParameter with a default value that references the location where the training data is stored
Create a Schedule for the pipeline. Specify the datastore in the datastore property, and the folder containing the training data in the path_on_datastore property
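A sketch of the data-reactive schedule, assuming the pipeline has already been published and that the datastore and folder names shown are placeholders:

from azureml.pipeline.core import Schedule

schedule = Schedule.create(ws,
                           name='retrain-on-new-data',
                           pipeline_id=published_pipeline.id,    # assumed published pipeline
                           experiment_name='pipeline-training',
                           datastore=datastore,                  # the datastore holding the folder
                           path_on_datastore='training-data/')   # triggers when files here change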
Designing and Implementing Data Science Solutions
Data Preparation and Processing
You run a script as an experiment in Azure Machine Learning.
You have a Run object named run that references the experiment run. You must review the log files that were generated during the experiment run.
You need to download the log files to a local folder for review.
Which two code segments can you run to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
run.get_details()
run.get_file_names()
run.get_metrics()
run.download_files(output_directory='./runfiles')
run.get_all_logs(destination='./runlogs')
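The get_details, get_file_names, and get_metrics methods return information about the run but do not download anything. Both of the following calls copy run output to a local folder:

run.download_files(output_directory='./runfiles')  # downloads all files stored for the run
run.get_all_logs(destination='./runlogs')          # downloads the run's log files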
Data Preparation and Processing
Deployment and Monitoring