Ignite Your Insights


Full Course Description
Multiple speakers present a business case for applying and utilizing AI / ML (artificial intelligence and machine learning) in the Data Vault landscape. These speakers then share the how-to techniques to dive deep in the methodology, and implementation or application of these concepts. From the business use-cases and reasons for applying the Data Vault, all the way down to hints, suggestions, and processes for applying machine learning and AI to Data Vault constructs.
Heli Helskyaho kicks things off with a keynote:
Machine learning is here and it will stay. Everybody should know what it is about and where it could be used. So what is machine learning and where can it be used? What is unsupervised learning, supervised learning, or reinforcement learning? What is deep learning? In this presentation we will talk about clustering, regression, classification, and so much more. We will talk about measuring, improving the model, feature selection, feature transformation, principal component analysis, hyper-parameter tuning and much more.
Richard Strange brings value to AI/ML in Research:
Can Data Vault help with data-driven academic research? Academic research is following the trends of the private sector, with the dual problems of growing volume and complexity of data. As reproducibility of data-driven research comes under greater scrutiny, the need to share data sets is more important than ever, but how do we bridge the gap between current practice and the necessary capabilities in research?
Bruce McCartney (Authorized Instructor) discusses the Business Vault:
The Business Vault is increasingly a critical component of the Data Vault 2.0 architecture. The purpose of this presentation is to discuss some of the Artificial Intelligence advances in automating ‘soft’ business rules. The audience will learn various possible implementation patterns for automating business rules using Artificial Intelligence. This includes an overview of rules engines, machine learning, deep learning and causal inference taxonomies and techniques to augment the raw data into actionable insights. An additional introduction to some modelling terminology and techniques will give the audience a high-level understanding of the use of data science techniques to automate business rule development, as well as some alternatives for auditing and debugging AI pipelines.
Heli Helskyaho returns to present:
Successful machine learning needs a team of people with different skills, and you are only one person with the skillset you have. How do I get started with machine learning? Do I need to go back to school to learn mathematics and statistics again? Do I need to learn all the machine learning processes, algorithms, hyperparameters and whatever is related to machine learning? Panic!
Matt Florian presents:
An alternative approach was then adopted to use a metadata-driven development tool to achieve the desired velocity. This required retooling Python developers into being data modelers and IDE developers. Once the team was able to move past the learning curve, the velocity increased, the quality of data increased, and the value to the business was realized. The outcome was a data platform consisting of well-thought-out data and technical architecture to meet the needs of data and knowledge workers.
Heli Helskyaho presents again:
The biggest problem with machine learning is data. There is not enough data, the data is not correct, etc. Data Vault is an excellent source for machine learning since it has good quality data that has been modeled, understood, and checked with the Data Vault methodology processes. How do you do machine learning in a Data Vault database? What tools are available, and how would it be best to get started?
Hannu Jarvi presents AI/ML Pipelines:
Data Vault is the first Data Warehousing paradigm that seeks to preserve all information, thus it is also the first paradigm that provides a solid foundation for ML/AI. The Machine Learning pipeline itself creates a lot of new information. In addition to the end result – for example, an evaluation of an X-ray – it provides a lot of intermediate information: alternative explanations with varying probabilities, different results with different parameter values, etc. – information that may eventually prove valuable when analyzed against actual outcomes.
Learning Maximized!
Achieve expertise quickly
High quality content, self-paced video for your maximized learning.
Extensive training with focused topics leading to your success.
Join other students currently engaged in your learning journey.
Course Lessons
Speaker: Heli Helskyaho
Machine learning is here and it will stay. Everybody should know what it is about and where it could be used. So what is machine learning and where can it be used?
What is unsupervised learning, supervised learning, or reinforcement learning? What is deep learning? In this presentation we will talk about clustering, regression, classification, and so much more. We will talk about measuring, improving the model, feature selection, feature transformation, principal component analysis, hyper-parameter tuning and much more.
At the end of the session we discuss what else there is to learn and how to get started with machine learning.
In this presentation you will learn what machine learning is all about and hopefully get so excited about the whole thing that you want to learn more! After this presentation it will be easier to start learning. This presentation is an improved version of a presentation that was awarded best presentation in the Emerging Technologies track at the KScope19 conference.
Please note: Heli is a Certified Data Vault 2.0 Practitioner, and has been working with Data Vault for over 5 years. This presentation is NOT to be missed! She will tie machine learning to Data Vault across both of her presentations.
Speaker: Richard Strange
Can Data Vault help with data-driven academic research? Academic research is following the trends of the private sector, with the dual problems of growing volume and complexity of data. As reproducibility of data-driven research comes under greater scrutiny, the need to share data sets is more important than ever, but how do we bridge the gap between current practice and the necessary capabilities in research?
Topics Covered:
- Current state and challenges of data-driven research
- Case studies
- Model approaches for applying Data Vault methods
Speaker: Bruce McCartney
The Business Vault is increasingly a critical component of the Data Vault 2.0 architecture. The purpose of this presentation is to discuss some of the Artificial Intelligence advances in automating 'soft' business rules.
The audience will learn various possible implementation patterns for automating business rules using Artificial Intelligence. This includes an overview of rules engines, machine learning, deep learning and causal inference taxonomies and techniques to augment the raw data into actionable insights. An additional introduction to some modelling terminology and techniques will give the audience a high-level understanding of the use of data science techniques to automate business rule development, as well as some alternatives for auditing and debugging AI pipelines.
The presentation will outline various implementation patterns for using Artificial Intelligence in your information pipelines, including exposure to software tools for development and management of these new paradigms. It closes with a practical discussion of AI for augmenting data catalogs (identifying business keys) and of how AI is used to forecast weather.
Topics Covered:
- Business Vault Overview
- Rules Engine Approach to Business Rules
- Machine and deep learning Overview
- Causal Inference
Speaker: Matt Florian
Title: Automated Machine Learning and Data Vault
With Python becoming ubiquitous in data engineering, there is an increasing trend to return to hand-coding the building of the data vault. This session will present a tale of one project that did both Python data engineering and metadata code engine development. It will highlight the strengths and weaknesses of each approach using real-world examples of each. We will review the strengths of each in data sourcing, modeling, profiling, and consumption. It will conclude with a breakdown of which approach won the day for the project and the benefits achieved.
In the initial iteration of the project, a target logical data model was created to integrate two ERP systems into a single data platform. The goal was to enable historical data analytics to support a rolling deployment of SAP across the enterprise. The decision was made at the outset to use Python with AWS Glue to consume data from an S3 raw lake and land it in a Snowflake database. With a team of data engineers, the initial data was landed easily. However, the velocity of the team was limited and the scope of what could be accomplished in the target timeline was continually scaled back. This meant the team could not release the data value at the speed needed by the business.
An alternative approach was then adopted to use a metadata-driven development tool to achieve the desired velocity. This required retooling Python developers into being data modelers and IDE developers. Once the team was able to move past the learning curve, the velocity increased, the quality of data increased, and the value to the business was realized. The outcome was a data platform consisting of well-thought-out data and technical architecture to meet the needs of data and knowledge workers.
Major Topics Covered
• Data Vault 2.0 on Snowflake
• Python in AWS Glue
• WhereScape 3D & RED
• Metadata Code Generator
Agenda
• Establish Client Scope for ERP Data Integration
• Hand Coded Development Architecture
• Project Achievements and Outcomes
• Python Strengths and Weaknesses Retrospective
• Metadata Code Generator Architecture
• Project Achievements and Outcomes
• Code Generator Strengths and Weaknesses Retrospective
• Approach Comparison and Learnings
Speaker: Heli Helskyaho
Successful machine learning needs a team of people with different skills, and you are only one person with the skillset you have. How do I get started with machine learning? Do I need to go back to school to learn mathematics and statistics again? Do I need to learn all the machine learning processes, algorithms, hyperparameters and whatever is related to machine learning?
Panic!
But luckily there is no need to panic: AutoML is for you. Automated machine learning (AutoML) is a shortcut to machine learning. It automates many of the steps in the machine learning process and lets you concentrate on your expertise: data and Data Vault. In this presentation we talk about AutoML and see some demos of automated machine learning on Data Vault data.
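As a rough illustration of the kind of step AutoML automates, the sketch below tries a few candidate models and keeps the one with the lowest error on held-out data. It is a toy under stated assumptions, not the tooling shown in the session: the function names (`fit_mean`, `fit_line`, `auto_select`) and the sample numbers are invented for this example, and real AutoML products also automate feature preparation and hyperparameter search.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the average of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Second candidate: simple least-squares line through the training data."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def auto_select(candidates, train, valid):
    """Fit every candidate on train, score on valid, return the best name and all scores."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Invented sample data, for illustration only.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])

best, scores = auto_select({"mean": fit_mean, "line": fit_line}, train, valid)
print(best, scores)
```

The point of the sketch is the loop in `auto_select`: the person supplying the data does not pick the algorithm by hand, only the data and the measure of success.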
Major Topics Covered
- AutoML
- Machine Learning
Agenda
- What is AutoML
- Why and when AutoML
- How to use AutoML
Speaker: Heli Helskyaho
The biggest problem with machine learning is data. There is not enough data, the data is not correct, etc. Data Vault is an excellent source for machine learning since it has good quality data that has been modeled, understood, and checked with the Data Vault methodology processes.
How do you do machine learning in a Data Vault database? What tools are available, and how would it be best to get started?
In this presentation we will talk about these questions and see some examples of machine learning in real life.
Topics Covered:
- Why Data Vault is a great source for Machine Learning
- How to use your Data Vault data for Machine Learning
Speakers: Heli Helskyaho and Mattias Helskyaho
You are a business expert and a real guru with data, but why does machine learning seem so difficult?
Because successful machine learning needs a team of people with different skills—and you're only one person.
How do you get started with machine learning?
Do you need to go back to school to relearn mathematics and statistics?
Do you need to study all the processes, algorithms, hyperparameters, and whatever is related to machine learning?
Luckily there is no cause for panic. In this presentation we discuss two easy ways of starting with machine learning: AutoML and AI services.
Presentation Topics: AutoML, AI Services, Machine Learning, Analytics
Speaker: Hannu Jarvi
Machine Learning / Artificial Intelligence is largely about recognizing patterns in data.
You have probably seen crime stories on TV news in which the face of the accused perpetrator or the victim has been blurred or pixelated.
The resolution has been lowered to hide the patterns that make that person recognizable. Something very similar happens when the granularity of data changes in traditional Data Warehousing. The data is “pixelated”. When Machine Learning can no longer recognize patterns in the resulting “pixelated” data, that information is lost forever.
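The "pixelation" effect can be sketched with a few lines of Python. The example below is only an illustration with invented records and hypothetical function names (`monthly_totals`, `largest_share`): two customers look identical once their transactions are summed to a monthly total, although a feature computed at the original grain separates them clearly.

```python
from collections import defaultdict

# Invented transaction-grain records: (customer, day_of_month, amount).
transactions = [
    ("A", 1, 100.0), ("A", 8, 100.0), ("A", 15, 100.0), ("A", 22, 100.0),
    ("B", 28, 5.0), ("B", 29, 5.0), ("B", 30, 390.0),
]

def monthly_totals(rows):
    """Coarsen the grain: keep only one summed amount per customer."""
    totals = defaultdict(float)
    for customer, _day, amount in rows:
        totals[customer] += amount
    return dict(totals)

def largest_share(rows):
    """Fine-grain feature: share of spend that sits in the single largest transaction."""
    amounts = defaultdict(list)
    for customer, _day, amount in rows:
        amounts[customer].append(amount)
    return {c: max(a) / sum(a) for c, a in amounts.items()}

totals = monthly_totals(transactions)
shares = largest_share(transactions)

print(totals)  # both customers have the same monthly total
print(shares)  # the original grain still tells them apart
```

Once only `totals` is stored, `shares` can no longer be computed, which is the sense in which the pattern is lost.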
Data Vault is the first Data Warehousing paradigm that seeks to preserve all information, thus it is also the first paradigm that provides a solid foundation for ML/AI.
The Machine Learning pipeline itself creates a lot of new information. In addition to the end result – for example, an evaluation of an X-ray – it provides a lot of intermediate information: alternative explanations with varying probabilities, different results with different parameter values, etc. – information that may eventually prove valuable when analyzed against actual outcomes.
The volume and complexity of this kind of information may grow exponentially. A Data Lake, the data management workhorse of the ML/AI community, cannot deal with this complexity. The information is “Lost in the Exhaust” of the ML pipeline.
This presentation explores ideas for capturing ALL value created in ML/AI pipelines. The ideas are based on challenges faced by some leading ML scientists in Healthcare and Pharma, and present the opportunities provided by Data Vault 2.0 for solving these challenges.
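To make the idea of keeping the pipeline's "exhaust" concrete, here is a minimal sketch, not the approach presented in the session. It assumes a hypothetical `PredictionRecord` structure, an MD5 hash of the business key as the record key (a common Data Vault 2.0 convention, used here as an assumption), and invented probabilities. Every candidate explanation from every run is appended as its own record instead of storing only the winning label.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class PredictionRecord:
    """One candidate explanation from one pipeline run (hypothetical structure)."""
    image_key: str      # hash of the business key
    run_params: str     # parameter values used for this run
    label: str          # candidate explanation
    probability: float  # probability the model assigned to that candidate

def hash_key(business_key: str) -> str:
    """Hash a normalized business key; MD5 is an assumption for this sketch."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

def capture_run(business_key, params, probabilities):
    """Keep every candidate label of a run, not only the top-scoring one."""
    param_text = ";".join(f"{k}={v}" for k, v in sorted(params.items()))
    return [
        PredictionRecord(hash_key(business_key), param_text, label, p)
        for label, p in sorted(probabilities.items())
    ]

# Insert-only store; the probabilities below are invented for illustration.
store = []
store += capture_run("xray-0001", {"threshold": 0.5},
                     {"normal": 0.2, "pneumonia": 0.7, "other": 0.1})
store += capture_run("xray-0001", {"threshold": 0.7},
                     {"normal": 0.35, "pneumonia": 0.55, "other": 0.1})

for record in store:
    print(record)
```

Because nothing is overwritten, the alternative explanations and parameter settings remain available for comparison once the actual outcome is known.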
Topics Covered:
- Why AI / ML needs pristine data in its original grain
- Why DV is an ideal foundation for AI / ML
- Why Dimensional Models don't work for AI / ML
Why Choose Membership
Access a wealth of collective knowledge
Foster cross-industry perspectives
Adapt your strategies to evolving industry dynamics
Seek guidance and validation for your ideas
Experience the power of collaborative problem-solving as you engage with fellow professionals, guided by seasoned experts in the field. Join me and thousands of others around the world to enrich your experience.
Dan Linstedt (Data Vault Inventor)

Get Your Membership
Elevate your potential, expand your horizons, and become a driving force in the ever-evolving landscape of data management with our premium Professional Membership.
