Databricks Certified Data Engineer Associate Exam Practice Questions (P. 2)
Question #6
Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?
- A. SELECT * FROM my_table WHERE age > 25;
- B. UPDATE my_table WHERE age > 25;
- C. DELETE FROM my_table WHERE age > 25; (Most Voted)
- D. UPDATE my_table WHERE age <= 25;
- E. DELETE FROM my_table WHERE age <= 25;
Correct Answer: C

To remove rows where age exceeds 25 from my_table, use the DELETE FROM statement. SELECT only reads data and never modifies the table, and UPDATE changes column values in matching rows rather than removing them. DELETE FROM my_table WHERE age > 25; removes every row that meets the condition and commits the change as a new version of the Delta table.
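For illustration, a minimal sketch in Databricks SQL, assuming a simple Delta table named my_table with an age column (the CREATE TABLE is hypothetical and only there to make the snippet self-contained):

```sql
-- Hypothetical table, included only to make the example self-contained
CREATE TABLE IF NOT EXISTS my_table (name STRING, age INT) USING DELTA;

-- Remove every row whose age is greater than 25; the deletion is
-- committed as a new version of the Delta table
DELETE FROM my_table WHERE age > 25;

-- Confirm that no rows with age > 25 remain
SELECT COUNT(*) AS remaining_over_25 FROM my_table WHERE age > 25;
```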
Question #7
A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.
Which of the following explains why the data files are no longer present?
- A. The VACUUM command was run on the table (Most Voted)
- B. The TIME TRAVEL command was run on the table
- C. The DELETE HISTORY command was run on the table
- D. The OPTIMIZE command was run on the table
- E. The HISTORY command was run on the table
Correct Answer: A

The VACUUM command is why the older version can no longer be restored. VACUUM removes data files that are no longer referenced by the current table version and that are older than the retention threshold (7 days by default). Those files are not needed for current queries, but they are exactly what time travel relies on. If VACUUM was run with a retention period shorter than the desired rollback window, the files backing the 3-day-old version have been deleted, so that version can no longer be restored.
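As a rough sketch of how time travel, RESTORE, and VACUUM interact (reusing the hypothetical my_table from the previous example):

```sql
-- Read or restore the table as it existed 3 days ago (Delta time travel)
SELECT * FROM my_table TIMESTAMP AS OF date_sub(current_date(), 3);
RESTORE TABLE my_table TO TIMESTAMP AS OF date_sub(current_date(), 3);

-- VACUUM deletes data files that the current version no longer references
-- and that are older than the retention threshold (default 7 days / 168 hours).
-- If it was run with a retention shorter than the needed rollback window,
-- the files backing the 3-day-old version are gone and the statements
-- above fail.
VACUUM my_table RETAIN 168 HOURS;
```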
Question #8
Which of the following Git operations must be performed outside of Databricks Repos?
- A. Commit
- B. Pull
- C. Push
- D. Clone
- E. Merge (Most Voted)
Correct Answer: E

The Git operation that must be performed outside of Databricks Repos is merge. Databricks Repos handles cloning a remote repository (this is how a repo is added to the workspace), as well as committing, pushing, and pulling, directly from the workspace. Merging branches, however, is not part of the Repos workflow: it has to be done with the Git provider (for example, through a pull request) or in a local Git client, after which the merged result can be pulled back into Databricks Repos.
Question #9
Which of the following data lakehouse features results in improved data quality over a traditional data lake?
- A. A data lakehouse provides storage solutions for structured and unstructured data.
- B. A data lakehouse supports ACID-compliant transactions. (Most Voted)
- C. A data lakehouse allows the use of SQL queries to examine data.
- D. A data lakehouse stores data in open formats.
- E. A data lakehouse enables machine learning and artificial intelligence workloads.
Correct Answer: B

Support for ACID-compliant transactions is the feature that improves data quality over a traditional data lake. ACID transactions guarantee that every data operation is atomic, consistent, isolated, and durable, which sharply reduces the chance of partial writes, inconsistencies, and corruption. Even under concurrent reads and writes, the integrity and reliability of the data are preserved, a capability traditional data lakes generally lack, and that consistency is what translates directly into higher data quality in a lakehouse.
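To make this concrete, here is a sketch of an operation that benefits from those ACID guarantees on a Delta table (the table names customers and daily_updates are hypothetical):

```sql
-- Atomic upsert: all matched updates and unmatched inserts are committed
-- together in a single transaction, or not at all, so concurrent readers
-- never observe a half-applied batch
MERGE INTO customers AS target
USING daily_updates AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```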
Question #10
A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.
Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?
- A. Databricks Repos automatically saves development progress
- B. Databricks Repos supports the use of multiple branches (Most Voted)
- C. Databricks Repos allows users to revert to previous versions of a notebook
- D. Databricks Repos provides the ability to comment on specific changes
- E. Databricks Repos is wholly housed within the Databricks Lakehouse Platform
Correct Answer: B

The advantage of using Databricks Repos over Databricks Notebooks versioning stems primarily from its support for multiple branches. This capability is essential for parallel development and collaboration, enabling teams to work on different features or bug fixes simultaneously without interfering with the main codebase. This branching feature, integral to robust version control systems like Git, offers a structured and collaborative environment that simplifies merging changes and managing diverse development activities. Overall, this makes Databricks Repos a more flexible and teamwork-friendly option than the built-in versioning in Databricks Notebooks.