Course Agenda
Multiple Speakers: Heli Helskyaho, Matt Florian, Bruce McCartney, Hannu Jarvi, Richard Strange
Each speaker presents a wealth of detail on AI (artificial intelligence) and ML (machine learning). This power-packed micro course contains over 6 hours of content.
Machine learning is a type of artificial intelligence that enables computers to learn from data without being explicitly programmed. It involves the use of algorithms that analyze data, identify patterns and relationships, and make predictions or decisions based on that analysis. We hope that you find the course instructive.
There are different types of machine learning, including unsupervised learning, supervised learning, and reinforcement learning. Unsupervised learning involves identifying patterns in data without the use of labels or predefined categories. Supervised learning involves using labeled data to train a model to make predictions on new, unlabeled data. Reinforcement learning involves training a model through a system of rewards and punishments, with the goal of maximizing the rewards.
Deep learning is a type of machine learning that involves the use of artificial neural networks. These networks are designed to mimic the structure and function of the human brain, allowing them to learn and make predictions from complex data sets.
In the presentation, topics like clustering, regression, and classification will be discussed. Clustering involves grouping similar data points together, while regression involves predicting a numerical value based on a set of input variables. Classification involves assigning data points to pre-defined categories.
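To make the difference between regression and classification concrete, here is a minimal Python sketch using only the standard library. It is not taken from the course material: the function names (fit_line, nearest_centroid) and the tiny data sets are made up for illustration, and real projects would normally use a machine learning library instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: ordinary least squares for a single input variable.

    Returns (slope, intercept) of the best-fitting straight line.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def nearest_centroid(train, point):
    """Classification: assign `point` to the label whose mean value is closest.

    `train` is a list of (value, label) pairs with pre-defined labels.
    """
    by_label = {}
    for value, label in train:
        by_label.setdefault(label, []).append(value)
    centroids = {label: mean(values) for label, values in by_label.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - point))


# Regression predicts a number. The made-up points lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept

# Classification predicts a category from labeled examples (hypothetical data).
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
category = nearest_centroid(labeled, 2.0)

print(predicted, category)
```

The regression call returns a numerical value for a new input, while the classification call returns one of the labels that already existed in the training data.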
Other topics like measuring and improving the model, feature selection, feature transformation, principal component analysis, and hyper-parameter tuning will also be covered. These techniques are used to optimize and improve the accuracy of machine learning models.
Regarding data-driven academic research, Data Vault can be useful in providing a framework for storing and managing large volumes of data. This can help ensure the reproducibility of research and facilitate the sharing of data sets among researchers. However, the implementation of such a framework would require a significant investment of time and resources. Additionally, there may be ethical and privacy considerations that need to be taken into account when sharing data.
Multiple speakers present a business case for applying AI / ML (artificial intelligence and machine learning) in the Data Vault landscape. These speakers then share how-to techniques that dive deep into the methodology and the implementation or application of these concepts. The content ranges from the business use cases and reasons for applying the Data Vault all the way down to hints, suggestions, and processes for applying machine learning and AI to Data Vault constructs.
Heli Helskyaho kicks things off with a keynote:
Machine learning is here and it will stay. Everybody should know what it is about and where it could be used. So what is machine learning and where can it be used?
What is unsupervised learning, supervised learning, or reinforcement learning? What is deep learning? In this presentation we will talk about clustering, regression, classification, and so much more. We will also talk about measuring and improving the model, feature selection, feature transformation, principal component analysis, and hyper-parameter tuning.
Richard Strange brings value to AI/ML in Research:
Can Data Vault help with data-driven academic research? Academic research is following the trends of the private sector, with the dual problems of growing volume and complexity of data. As reproducibility of data-driven research comes under greater scrutiny, the need to share data sets is more important than ever, but how do we bridge the gap between current practice and the necessary capabilities in research?
Bruce McCartney (Authorized Instructor) discusses the Business Vault:
The Business Vault is increasingly a critical component of the Data Vault 2.0 architecture. The purpose of this presentation is to discuss some of the Artificial Intelligence advances in automating ‘soft’ business rules.
The audience will learn various possible implementation patterns for automating business rules using Artificial Intelligence. This includes an overview of rules engines, machine learning, deep learning, and causal inference taxonomies and techniques to augment the raw data into actionable insights. An additional introduction to some modelling terminology and techniques will give the audience a high-level understanding of the use of data science techniques to automate business rule development, as well as some alternatives for auditing and debugging AI pipelines.
Heli Helskyaho returns to present:
A successful machine learning project needs a team of people with different skills, and you are only one person with the skillset you have. How do you get started with machine learning? Do you need to go back to school to learn mathematics and statistics again? Do you need to learn all the machine learning processes, algorithms, hyperparameters, and whatever else is related to machine learning?
Panic!
Matt Florian presents:
An alternative approach was then adopted to use a metadata-driven development tool to achieve the desired velocity. This required retooling Python developers into being data modelers and IDE developers. Once the team was able to move past the learning curve, the velocity increased, the quality of data increased, and the value to the business was realized. The outcome was a data platform consisting of well-thought-out data and technical architecture to meet the needs of data and knowledge workers.
Heli Helskyaho presents again:
The biggest problem with machine learning is data. There is not enough data, the data is not correct, and so on. Data Vault More is an excellent source for machine learning because it holds good-quality data that has been modeled, understood, and checked with the Data Vault methodology processes.
How do you do machine learning in a Data Vault database? What tools are available, and what is the best way to get started?
Hannu Jarvi presents AI/ML Pipelines:
Data Vault More is the first Data Warehousing paradigm that seeks to preserve all information; thus it is also the first paradigm that provides a solid foundation for ML/AI.
The Machine Learning pipeline itself creates a lot of new information. In addition to the end result (for example, an evaluation of an X-ray), it provides a lot of intermediate information: alternative explanations with varying probabilities, different results with different parameter values, and so on. This information may eventually prove valuable when analyzed against actual outcomes.
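One way to picture this intermediate information is a record that keeps every candidate explanation with its probability, together with the parameter values that produced it, rather than only the winning label. The Python sketch below is an illustration under stated assumptions, not the speaker's method: PredictionRecord, score_case, the "temperature" parameter, the case identifier, and the raw scores are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PredictionRecord:
    """One pipeline run for one case, with nothing thrown away."""
    case_id: str
    parameters: dict     # parameter values used for this run
    probabilities: dict  # every candidate label -> probability

    @property
    def top_label(self):
        # The "end result" is only the most probable of the alternatives.
        return max(self.probabilities, key=self.probabilities.get)


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = {label: s / temperature for label, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scaled.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def score_case(case_id, raw_scores, temperature):
    """Hypothetical scoring step that records alternatives and parameters."""
    return PredictionRecord(
        case_id=case_id,
        parameters={"temperature": temperature},
        probabilities=softmax(raw_scores, temperature),
    )


# Made-up raw scores for a single X-ray, scored with two parameter values.
raw = {"normal": 2.0, "fracture": 1.0, "unclear": 0.1}
records = [score_case("xray-001", raw, t) for t in (1.0, 4.0)]

for record in records:
    print(record.case_id, record.parameters, record.top_label,
          round(record.probabilities[record.top_label], 3))
```

Both runs are kept side by side, so the alternatives and parameter settings remain available for later comparison against the actual outcome.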