Back Matter – The Modern Data Warehouse in Azure: Building with Speed and Agility on Microsoft’s Cloud Platform

Index
A
Analytical objects creation
calculated column
hierarchy creation
KPI button creation
measurement
perspectives
roles (RBAC)
Auditing process
control database
copy activities
data volumes
operational process
processing times
requirements
storing high watermarks
Azure Analysis Services (AAS)
analytical objects
See Analytical objects creation
calculating engine types
deployment
implementation
PaaS benefit
processing model
See Processing model
project link
extensions menu
firewall rules
required details
search options
steps
tables list
workspace database
security model
semantic abstraction
standard tier
VertiPaq engine
Azure Data Factory (ADF)
alert messages
action group creation
email
HTTP request option
JSON schema
logic app creation
mailing service
parameter value
web activities
alert rule creation
configuration pane
metrics activities
new rule creation
notification
data integration
Azure Data Lake Gen 1 (ADL Gen1)
Azure Data Lake Gen 2 (ADL Gen2)
data lake technologies
directory resource
key principles
manage access dialog
B
Batch ingestion tools
Azure Synapse Analytics
CETAS statement
data warehousing project
ETL solution
investigate issues
risks/opportunities
tools
troubleshooting
Blob storage/Azure storage
C
Cleaning directory
Azure Data Factory
database
data lake
data storage
Clean layer
Column mapping pattern
See Dynamic column mapping
Compute Data Warehouse Units (cDWUs)
Cosmos DB architecture
layers
horizontal partitioning
consistency options
data explorer
dataset creation
linked service
preceding diagram
resource partitions
request units (RUs)
semi-structured JSON format
SQL warehouse data
NoSQL (Not only SQL)
Create External Table As Select (CETAS) statement
D
Databricks job cluster
Data contracts
definition
design/integration considerations
entity diagram
integration
scripting code generation
SQL table implementation
Data factory
dataset
debugging activities
integration runtime
invoke monitor script
linked services
managed service identity (MSI)
monitoring portal
parameters-driven
See Parameters-driven pipelines
pattern
See Pattern processing
pipelines/activities
security
self-hosted integration runtime
solution structure
SSIS integration runtime
templates
triggers
V2 resource
Data integration projects
Data Lake
attributes
benefits
definition
functional perspective
enterprise implementation
modern enterprise
planning structure
polyglot architectures
research/experimentation capabilities
technologies
WAREHOUSE directories
Dynamic Management Views (DMVs)
Data movement process
auditing process
incorporating resilience
logging
monitoring method
Decoupled processing
cleaning process
data warehouse scenario
layers (loading data)
optional/mandatory files
simplistic resolution process
warehouse table
Data streaming
See Stream ingestion
Data warehouse
cloud revolution
database backups/lakes
key tools
modern data warehouse
multi-region support
naming convention
on-premises tool
resource group/tagging
security standpoint
terms/definition
Deployment options (SQL database)
elastic pools
features
managed instance
SQL DB/synapse analytics
V-Core tiers
Designing data contracts
consistency
generation process
modification
storing
validation
Dictionary encoding
Dynamic column mapping
E, F, G
Error handling
Event ingestion
Azure Synapse Analytics
decoupled
See Decoupled processing
event-based ingestion
event processing
listening data
risks/opportunities
single file batches
SQL database
Extract, transform and load (ETL/ELT) patterns
ADF V2
anti-window
ingestion mode
mapping data flow process
solution structure (ADF)
window
H
Hadoop distributed file system (HDFS)
HASH distribution
HDInsight cluster
Hyperscale databases
accelerated disaster recovery
application intent parameter
architecture
cheap storage/flexible resources
features
I, J, K
Ingestion modes
approach
architecture
batch
See Batch ingestion tools
data streaming
event ingestion
lambda architecture approach
layers
Integration (data contract)
code generation
Azure SQL database
key parameters
ObtainEntityMetadata stored procedure
PowerShell script
process of
SQL database
templates
entity metadata
fetching metadata
harmonizing schema evolution
JSON source code
orchestration metadata
requirements
utilizing orchestration metadata
Integration engine
activities
bucketed up
configuration properties
external compute
looping/conditional logic
output constraints
web activities
ADF
data factory
See Data factory
Integration runtime (IR)
Internal activities
Invocation methods
Iteration/conditional activities
Iterative parent-child pattern
L
Lambda architecture approach
blending streams/batches
cohesive/contextualized view
definition
serving layer
Linear pattern
Linked service connection
access policies
author/monitor button
connection
data lake storage Gen2 option
key vault secret
resource
security
UI/points
Linked services
Logging process
aggregating data
alerting metadata
definition
events
JSON data storage
approaches
parent-child processes
pipelines
platform track
processing hierarchy
structures
table code
table recreation
tabular data
extended capabilities
requirements
storage
M, N
Machine learning resource
Managed service identity (MSI)
Mapping data flows
advantages
categories
data types
ETL steps
inputs/outputs
manipulation
mapping tab
pipeline
projection tab
row modification
schema modification
sink source
source options tab
transformation step
trim function
Massively parallel processing (MPP)
MERGE statement
Metadata
See Integration (data contract)
Monitoring method
O
Online analytical processing (OLAP) systems
Online transactional processing (OLTP) systems
P, Q
Parallel execution
Parameters-driven pipelines
configuration
control database
definition
invocation approach
lookup activities
mapping data flows
steps
stored procedure
Parent-child pattern
Parquet/Optimized Row Columnar (ORC)
Pattern processing
boxed activities
column mapping
definition
iterative parent-child pattern
linear pattern
parent-child pattern
partitioning option
Pipelines
configuration
copy data activity
debugging activities
ellipsis menu
input parameters
mapping data flows
monitoring portal
sink dataset
source dataset properties
PolyBase technology
components
credential creation
CTAS syntax
external data source
external table
file format
value/percentage
Polyglot architectures
characteristics
data cleaning/preparation
lake processing
SQL preference
Synapse Analytics/Azure Data Lake Gen 2
Power BI (Microsoft Power BI)
data visualization
key components
reports
columns, measures/hierarchies
connect menu
data warehouse
navigation panel
output window
relationships
server information
splash screen
tables details
service process
working process
PowerShell scripts
Processing model
authorization process
data factory pipeline
options
process request
service principal creation
SPN details
web activities
R
Raw directory
data lake implementation
data storage
file formats
key benefit
partitioning
sink dataset directory
Raw layer
Recovery point objective (RPO)
Recovery time objective (RTO)
Recurse data lake structures
Replicated distribution
Resilience
alert data factory rules
See Azure Data Factory (ADF)
data factory
defensive checks
troubleshooting (metadata)
Resource management
classes
data factory pipeline
dynamic classes
pause/resume warehouse
service objective
static classes
ROUND ROBIN distribution
Run length encoding (RLE)
S
Scripting language
code generation
approach
data contracts
elements
ForEach loops
output folder
PowerShell code
tables/procs details
invoke/monitor
PowerShell
recurse structures
Security configuration (Data Lake)
ADL Gen2
default permissions
key information
parent folders
permission setup
Self-Hosted Integration Runtime (SHIR)
Semantic layer
Source controlled option (data factory)
SQL storage engine
database (SQL DB)
adaptive join
adaptive query processing
artificial intelligence
automatic tuning
batch mode memory grant feedback
benefits
cloud-based OLTP engine
concurrency
deployment options
hyperscale
interleaved execution
trickle-fed data warehouses
four Vs (volume, variety, value/velocity)
synapse analytics
See Synapse analytics
SSIS Integration Runtime (IR)
Static resource class
Stored procedure
Store linked services
Stream ingestion
benefit of
event-based/batch-based processing
implementation
analytics jobs
Azure Event Hubs
blob storage
SQL database
risks/opportunities
Symmetric multi-processing (SMP)
Synapse analytics
batch ingestion
CTAS pattern
DDL statement
external table
file structure
PolyBase engine
warehouse fact table
distributions
columns
compute nodes
HASH distribution
MPP vs. SMP
REPLICATED distribution
right column
ROUND ROBIN approach
SMP single storage point
storage nodes
event ingestion
PolyBase
resources
See Resource management
SQL database
workload management/importance
T, U
Transformed directory
data storage
ELT approach
ingestion architecture
key points
warehouse
Triggers
V
Value encoding
V-Core tiers
VertiPaq engine
W, X, Y, Z
Windows Azure Storage Blob (WASB)