DP-100 practice questions
You are creating a new Azure Machine Learning pipeline using the designer. The pipeline must train a model using data in a comma-separated values (CSV) file that is published on a website. You have not created a dataset for this file. You need to ingest the data from the CSV file into the designer pipeline using the minimal administrative effort. Which module should you add to the pipeline in Designer?
A. Convert to CSV
B. Enter Data Manually
C. Import Data
D. Dataset
Question
You use the Azure Machine Learning service to create a tabular dataset named training_data. You plan to use this dataset in a training script. You create a variable that references the dataset using the following code:
training_ds = workspace.datasets.get("training_data")
You define an estimator to run the script. You need to set the correct property of the estimator to ensure that your script can access the training_data dataset. Which property should you set?
A. environment_definition = {"training_data":training_ds}
B. inputs = [training_ds.as_named_input('training_ds')]
C. script_params = {"--training_ds":training_ds}
D. source_directory = training_ds
Question
You are solving a classification task. You must evaluate your model on a limited data sample by using k-fold cross-validation. You start by configuring a k parameter as the number of splits. You need to configure the k parameter for the cross-validation. Which value should you use?
A. k=1
B. k=10
C. k=0.5
D. k=0.9
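As background for the options above: in k-fold cross-validation the data is cut into k folds of near-equal size, and each fold serves once as the held-out test set, so k has to be a whole number of at least 2. The following is a minimal sketch in plain Python; the helper name k_fold_indices and the sample count of 25 are illustrative assumptions, not part of the question.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not isinstance(k, int) or k < 2:
        raise ValueError("k must be an integer of at least 2")
    if k > n_samples:
        raise ValueError("k cannot exceed the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical limited sample of 25 rows split into 10 folds.
folds = list(k_fold_indices(25, 10))
for train, test in folds:
    print(len(train), len(test))
```

A fractional value such as 0.5 is a hold-out proportion rather than a number of splits, which is why the sketch rejects it.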
Question
You are evaluating a completed binary classification machine learning model. You need to use the precision as the evaluation metric. Which visualization should you use?
A. Violin plot
B. Gradient descent
C. Scatter plot
D. Receiver Operating Characteristic (ROC) curve
Question
You are performing feature engineering on a dataset. You must add a feature named CityName and populate the column value with the text London. You need to add the new feature to the dataset. Which Azure Machine Learning Studio module should you use?
A. Edit Metadata
B. Filter Based Feature Selection
C. Execute Python Script
D. Latent Dirichlet Allocation
Question
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a new experiment in Azure Machine Learning Studio. One class has a much smaller number of observations than the other classes in the training set. You need to select an appropriate data sampling strategy to compensate for the class imbalance. Solution: You use the Principal Components Analysis (PCA) sampling mode. Does the solution meet the goal?
A. Yes
B. No
Question
You are building a regression model for estimating the number of calls during an event. You need to determine whether the feature values achieve the conditions to build a Poisson regression model. Which two conditions must the feature set contain? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
A. The label data must be a negative value.
B. The label data must be whole numbers.
C. The label data must be non-discrete.
D. The label data must be a positive value.
E. The label data can be positive or negative.
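As background: the Poisson distribution describes counts of events, so a label suited to Poisson regression holds whole numbers that are never negative. The sketch below checks a label column for that property and evaluates the Poisson probability mass function. The helper names and the sample call counts are illustrative assumptions, not part of the question.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    if not isinstance(k, int) or k < 0:
        raise ValueError("a Poisson count must be a non-negative whole number")
    return lam ** k * math.exp(-lam) / math.factorial(k)


def is_valid_count_label(values):
    """True when every label is a whole number that is not negative."""
    return all(float(v).is_integer() and v >= 0 for v in values)


# Hypothetical number of calls recorded during five events.
calls = [0, 3, 5, 2, 7]
print(is_valid_count_label(calls))
print(poisson_pmf(3, lam=3.4))
```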
Question
You plan to use automated machine learning to train a regression model. You have data that has features which have missing values, and categorical features with few distinct values. You need to configure automated machine learning to automatically impute missing values and encode categorical features as part of the training task. Which parameter and value pair should you use in the AutoMLConfig class?
A. featurization = 'auto'
B. enable_voting_ensemble = True
C. task = 'classification'
D. exclude_nan_labels = True
E. enable_tf = True
Question
You are performing a filter-based feature selection for a dataset to build a multi-class classifier by using Azure Machine Learning Studio. The dataset contains categorical features that are highly correlated to the output label column. You need to select the appropriate feature scoring statistical method to identify the key predictors. Which method should you use?
A. Kendall correlation
B. Spearman correlation
C. Chi-squared
D. Pearson correlation
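As background: Pearson, Spearman and Kendall correlation all work on numeric or ranked values, while the chi-squared statistic compares observed and expected counts in a contingency table and so applies directly to categorical columns. The sketch below computes that statistic by hand. The function name and the toy columns are illustrative assumptions, and this is not the implementation used by the Filter Based Feature Selection module.

```python
from collections import Counter


def chi_squared(feature, label):
    """Chi-squared statistic of the feature-by-label contingency table."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


# Hypothetical three-class label with one informative and one uninformative feature.
label = ["a", "a", "b", "b", "c", "c"]
informative = ["x", "x", "y", "y", "z", "z"]
uninformative = ["x", "y", "x", "y", "x", "y"]

print(chi_squared(informative, label))
print(chi_squared(uninformative, label))
```

A larger statistic indicates a stronger dependence between the feature and the label, which is how a filter-based method can rank candidate predictors.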
Question
You create a multi-class image classification deep learning model that uses the PyTorch deep learning framework. You must configure Azure Machine Learning Hyperdrive to optimize the hyperparameters for the classification model. You need to define a primary metric to determine the hyperparameter values that result in the model with the best accuracy score. Which three actions must you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
A. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to maximize.
B. Add code to the bird_classifier_train.py script to calculate the validation loss of the model and log it as a float value with the key loss.
C. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to minimize.
D. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to accuracy.
E. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to loss.
F. Add code to the bird_classifier_train.py script to calculate the validation accuracy of the model and log it as a float value with the key accuracy.
Question
You are evaluating a completed binary classification machine learning model. You need to use the precision as the evaluation metric. Which visualization should you use?
A. Violin plot
B. Gradient descent
C. Box plot
D. Binary classification confusion matrix
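As background: precision is the share of predicted positives that are actually positive, TP / (TP + FP), and both counts can be read from the cells of a binary confusion matrix. The sketch below tallies the four cells and derives precision from them. The function names and the example labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Fraction of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
print(precision(tp, fp))
```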
Question
You are creating a machine learning model. You need to identify outliers in the data. Which two visualizations can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
A. Venn diagram
B. Box plot
C. ROC curve
D. Random forest diagram
E. Scatter plot
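As background: a box plot conventionally draws its whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles and marks anything outside them as an individual point. The sketch below applies that rule numerically. The function name and the sample values are illustrative assumptions, and plotting libraries may compute quartiles slightly differently.

```python
import statistics


def iqr_outliers(values, whisker=1.5):
    """Return the values that fall outside the box-plot whiskers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one obviously extreme value.
measurements = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(measurements)
print(outliers)
```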
Question
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a new experiment in Azure Machine Learning Studio. One class has a much smaller number of observations than the other classes in the training set. You need to select an appropriate data sampling strategy to compensate for the class imbalance. Solution: You use the Stratified split for the sampling mode. Does the solution meet the goal?
A. Yes
B. No
Question
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a new experiment in Azure Machine Learning Studio. One class has a much smaller number of observations than the other classes in the training set. You need to select an appropriate data sampling strategy to compensate for the class imbalance. Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode. Does the solution meet the goal?
A. Yes
B. No
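As background: SMOTE enlarges the minority class by creating synthetic rows that lie on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea in plain Python. It is not the SMOTE module's implementation, and the function name, parameters and points are illustrative assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k minority points closest to the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class rows with two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sketch(minority, n_new=6)
print(len(minority) + len(new_points))
```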
Question
You are analyzing a dataset containing historical data from a local taxi company. You are developing a regression model. You must predict the fare of a taxi trip. You need to select performance metrics to correctly evaluate the regression model. Which two metrics can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
A. a Root Mean Square Error value that is low
B. an R-Squared value close to 0
C. an F1 score that is low
D. an R-Squared value close to 1
E. an F1 score that is high
F. a Root Mean Square Error value that is high
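As background: Root Mean Square Error (RMSE) measures the typical size of the prediction error in the units of the label, and R-Squared measures the share of the label's variance that the model explains. The sketch below computes both for a close-fitting model and for a baseline that always predicts the mean. The function names and fare values are illustrative assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical taxi fares and two sets of predictions.
fares = [10.0, 15.0, 20.0, 25.0]
close_fit = [11.0, 14.0, 21.0, 24.0]
mean_baseline = [17.5, 17.5, 17.5, 17.5]

print(rmse(fares, close_fit), r_squared(fares, close_fit))
print(rmse(fares, mean_baseline), r_squared(fares, mean_baseline))
```

The close-fitting predictions give a small RMSE and an R-Squared near 1, while the mean baseline gives a larger RMSE and an R-Squared of 0. F1 score does not appear here because it is defined for classification outputs.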
Question