Differentiate between AI, ML, and DL; discuss the evolution from analytics to AI. What is Big Data and what is its relevance to AI? Discuss why AI gets so much attention, and compare ML with DL.
1.What is the relationship between Machine Learning (ML) and Artificial Intelligence (AI)?
All AI is performed using ML algorithms only
ML is a subcategory of AI
No connection
2.What size of data do you need for AI?
Big data
Image data
Proportional to algorithm and problem complexity
3.Why is Deep Learning important?
It solves some problems, mainly sensory ones, at a quality that exceeds human ability
All problems are suitable for and best solved by DL
It uses massive amounts of data and is highly interpretable
How ML algorithms work | Assessment
1.All ML algorithms work in the following order:
Training -> inference -> data collection
Data collection-> inference -> training
Data collection -> training -> inference
2.What is the goal of the training phase of an ML algorithm?
To create a robust and general enough model of the problem it solves so it will be good with unseen data
To memorize all training data examples so it will be good with data it has already seen
To select which algorithm will use the most advanced HW and algorithms combination
3.If our algorithm doesn’t perform well in the inference step, what should we do?
Wait until it gets better autonomously
Go back to improve the training phase and if that doesn’t work go back to improve the data we use
This means we will need to collect big data since otherwise nothing will work
4.Why is quality data collection important for ML?
It's overrated. Algorithms will overcome any data issue
With ML algorithms: It’s “garbage in garbage out”. The algorithm is limited by the quality of data we feed it.
When we perform the inference step we teach the algorithm to be good with unseen data
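The data collection -> training -> inference flow covered in this assessment can be sketched as follows. This is a minimal illustration, not a prescribed method: the toy process y = 2x + 1, the function names, and the 150/50 split are all assumptions made for the example. The point it shows is that the model is judged on data it did not see during training.

```python
import random


def collect_data(n, seed=0):
    """Step 1, data collection: simulate noisy samples of y = 2x + 1 (assumed toy process)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in xs]


def train(samples):
    """Step 2, training: fit a slope and intercept by ordinary least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def infer(model, x):
    """Step 3, inference: apply the trained model to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_abs_error(model, samples):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(infer(model, x) - y) for x, y in samples) / len(samples)


data = collect_data(200)
train_set, unseen_set = data[:150], data[150:]  # hold out data the model never trains on
model = train(train_set)
unseen_error = mean_abs_error(model, unseen_set)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, unseen error={unseen_error:.2f}")
```

If the error on the unseen set is too high, the usual next step is to revisit training first and the data second, rather than to wait for the model to improve on its own.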
Different types of AI tasks with examples | Assessment
1.Classification algorithms:
Are a type of supervised learning algorithm designed to identify the category to which each data instance belongs.
Require no labels and are used for online learning
The only type of ML we use at the company
2.Reinforcement Learning algorithms:
Reinforce human conceptions about the data labels
Mimic how humans learn new tasks through smart trial and error
Can be used for supervised and unsupervised regression
3.Regression is mainly used for:
To identify anomalies in images from the fab
To process text in the company's customer tickets
Predicting the value of a continuous variable based on its input variables
4.Unsupervised learning:
Is often required since we don't have labeled data
Can be done by junior employees since they don’t need supervision
Is not good for solving vision problems since it uses only complex algorithms
Mapping between business challenges and AI challenges_Assessment
1.Which of the following AI tasks was NOT mentioned in the video?
Classification
Self-supervised learning
Regression
Clustering
2.Why do we need classification algorithms?
To handle each or some of the populations differently and improve business outcome
To predict a future value and prepare for it (resources, mitigations)
For cases where we don’t want to handle each data entity separately and would rather handle data entities in groups
3.Which of the following is NOT a potential problem with anomaly detection algorithms?
It might be hard to fully label all historical anomalies for training the algorithm
The algorithm might not be sensitive enough, thus missing anomalous elements
There might be too many categories (labels) for the algorithm to choose from
The algorithm might be too sensitive in identifying things as anomalous when they are not
4.Which of the following is NOT a prerequisite for applying an optimization algorithm?
Clear problem formulation: objective function and constraints
Historical data or ability to test alternatives
Enough compute, or knowledge of how to use a heuristic approach
No labels are available (and we cannot create ones)
5.Which is true about applying AI to real business problems:
In many cases, there will be more than one approach, or we will apply a combination of algorithms to solve the problem
There is always just one correct method of AI that can be applied to solve the problem
You need to be an AI expert to be able to understand which potential methods can be applied for a problem
In most cases, you will need to apply advanced AI methods to solve the problem
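The sensitivity trade-off raised in the anomaly detection question above (a detector that misses anomalies versus one that flags normal items) can be illustrated with a simple z-score rule. This is only a sketch: the z-score detector, the sample readings, and the three thresholds are assumptions chosen for the example, not a method named in the course material.

```python
import statistics


def flag_anomalies(values, threshold):
    """Return indices whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / spread > threshold]


# Hypothetical sensor readings: mostly near 10, with two unusual values (15.0 and 4.0).
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 15.0, 10.1, 9.9, 4.0]

not_sensitive_enough = flag_anomalies(readings, threshold=3.0)  # misses both unusual values
balanced = flag_anomalies(readings, threshold=2.0)              # flags only the two unusual values
too_sensitive = flag_anomalies(readings, threshold=0.1)         # also flags ordinary readings

print(not_sensitive_enough, balanced, too_sensitive)
```

Lowering the threshold catches more true anomalies but also raises more false alarms, so the right setting depends on what a miss and a false alarm each cost the business.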
Intro to data terminology Assessment
1.How can one measure their data value?
Based on how many kilobytes of data we can predict its value
One method would be to measure it based on how much business impact it brings you
We can use information theory to subjectively let the data expert decide
2.Which of the following is NOT a data type mentioned in the video?
Cellular data
Audio data
Tabular data
Image data
3.Can the same dataset be used for both supervised and unsupervised learning?
No. Either you have labels, or you don’t
No. The supervised learning process will not converge if the data was already used for an unsupervised problem first.
Yes. It depends on the problem we are trying to solve. For different problems the same data field can be used as an attribute and in others as the output (goal).
4.What is a categorical data type?
It’s a discrete numerical value of an attribute
Sometimes called “nominal” – indicates that the data has a finite number of options
It is data collected every fixed time interval
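The idea in question 3 above, that one dataset can serve both supervised and unsupervised problems, can be sketched as follows. The dataset and its field names (temperature, pressure, yield_pct) are hypothetical and exist only for illustration.

```python
# Hypothetical tabular dataset; the field names are illustrative assumptions.
rows = [
    {"temperature": 20.5, "pressure": 1.01, "yield_pct": 91.0},
    {"temperature": 22.0, "pressure": 0.98, "yield_pct": 88.5},
    {"temperature": 19.0, "pressure": 1.03, "yield_pct": 93.2},
]


def as_supervised(rows, target):
    """Treat one field as the output (goal) and the remaining fields as attributes."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


def as_unsupervised(rows):
    """Treat every field as an attribute; there is no output column."""
    return [dict(row) for row in rows]


# Problem A: predict yield from the other fields.
features_a, labels_a = as_supervised(rows, target="yield_pct")
# Problem B: predict temperature, so yield_pct now acts as an attribute.
features_b, labels_b = as_supervised(rows, target="temperature")
# Problem C: group similar rows, using all fields and no labels.
points_c = as_unsupervised(rows)
```

The same field (here yield_pct) is the goal in problem A and an ordinary attribute in problems B and C, which is why the answer depends on the problem being solved rather than on the data alone.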
Properties of “good” data for AI Assessment
1.Why should your data be both available AND accessible?
It is not enough to know that the data is collected (i.e., available); we must also have continued access to it during the POC and once the solution is productized
It shouldn’t. It is enough that the data is available somewhere, even if we don’t really have access to it.
Accessible data means that anyone can analyze it using any tool and available means that the data is not that big.
2.What is your responsibility considering data privacy and security?
It is not my responsibility. Ethical AI and infosec will find me and tell me what to do
When working on an AI idea, I’m responsible for complying with the company's information security and responsible AI guidelines, which can be found on those programs' websites.
If my data is secure, then by design, it is also ethical so I can worry only about that.
3.Which of the below will NOT be decided based on understanding the rate at which my business process changes in a way that will impact the AI algorithm’s quality:
Length of data history we can use
Can we solve the problem at all
How often we will probably need to retrain the algorithm
How often we will need to ingest new data
4.Which of the following statements is true with respect to data quantity?
The more, the better – you should always collect any data piece you can and use any historical data you ever collected.
ML algorithms always need more data than DL algorithms.
It highly depends on the problem and algorithm’s complexity as well as the performance needed
Overcoming main data issues Assessment
1.Which of the following is NOT a data issue discussed in the video?
Having irrelevant data, redundant or too much of it
Lack of enough labeled or quality labeled data
White data – having a dataset everyone can access which will impact your competitive advantage
Erroneous or noisy dataset
2.For which of the following data issues is feature selection NOT one of the solution strategies?
Irrelevant/ redundant/ too much data
Sparse data/ curse of dimensionality
Dark data
3.What can we do when we have limited data (amount or information)?
Perform outlier detection
Collect more data, perform feature extraction, consider online or active learning algorithms, or walk away
Sample data to make it more balanced
4.Which of the following is INCORRECT with regards to dimensionality reduction?
It is always advised since it can help overcome any data issue
It can help overcome the “curse of dimensionality”
It is a powerful technique that should be used carefully so as not to distort the data
5.Which of the following is NOT a potential cause for having imbalanced/ biased data, as mentioned in the video?
Underlying distribution of elements in the real world is imbalanced
Lack of machine labeled data causes bias
Unintentional or inherent bias in the data collection
Anomalous behavior makes some of the instances rare
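Sampling data to make it more balanced, one of the options mentioned in this assessment, can be sketched with random oversampling of the smaller classes. This is one possible approach among several, and the function name, the `label_of` parameter, and the example labels are assumptions made for illustration.

```python
import random
from collections import Counter


def oversample_minority(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw with replacement to fill the gap; draws nothing for the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical imbalanced dataset: 8 "normal" rows and 2 "defect" rows.
rows = [("normal", i) for i in range(8)] + [("defect", i) for i in range(2)]
balanced_rows = oversample_minority(rows, label_of=lambda row: row[0])
print(Counter(label for label, _ in balanced_rows))
```

Oversampling only repeats information that is already present, so it does not remove bias introduced during data collection, and it should be applied to the training set only so that evaluation still reflects the real distribution.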
How to formulate an AI problem Assessment
1.Why is proper problem formulation important?
It can uncover unreasonable expectations
It provides a focus & prevents costly misunderstandings
It might identify additional risks
All the above
2.Which of the following is NOT included in the “General Scheme of a well-defined problem for AI” presented in the video?
Cost of AI
Goals for the specific use case
Constraints
Tolerance
3.Which additional things are mentioned as ones you should consider during problem formulation?
How much compute is required
The translation of our ROI/ feasibility analysis to the formulated problem
How to productize the AI capability
How to gain stakeholders’ support
The different roles in an AI team | Assessment
1.What are the primary responsibilities for a Subject Matter Expert within an AI team?
Work on the algorithm and make it production worthy
Business understanding, problem formulation and make sure the AI solution created has a high likelihood to succeed in the business ecosystem
Create a data pipeline that will manage the data effectively
2.What are the primary responsibilities for an AI PM within an AI team?
Coordinate the work of the AI team and the business, define the problem to be solved, and a product that utilizes the most relevant AI technology while maximizing the business impact
Create a state-of-the-art algorithm that can be published in a conference
Create a cost-effective AI platform, while including MLOps best practices
3.What are the primary responsibilities for an “Active Sponsor” in support of the AI team?
Accountable for creating the integration code with the existing systems
Work in an agile methodology for efficient AI exploration
Remove any roadblock that might hinder the progress and allocate resources as needed
4.What are the primary responsibilities for a Data Scientist within an AI team?
Create production worthy code of an end-to-end AI capability
Create the best possible AI algorithm to solve the defined problem
Change management at the customer side to ensure AI is accepted with fewer objections
How to work on an AI POC | Assessment
1.Which of the following is NOT part of the prerequisites for starting the exploration phases?
Good enough data set
Preliminary solution approaches
Enough ROI and feasibility
An MLOps platform
2.Which of the below best describes a “successful AI POC” according to the video:
Solves the problem
Provides a quick answer to “can we solve this problem with AI to our satisfaction?”
One that has an agile style “MVP” with accuracy >90%
One that includes the blueprint of an MLOps platform
3.Which of the following is NOT a good reason to fail fast with your AI idea?
You don’t have (or don’t have a clear line of sight to) the required data
Your problem seems unsolvable/ too complex for your needs and its impact
After applying judgment and consulting with experts, the solution will be complex but the ROI is high enough
Not enough support from key stakeholders
4.Which of the following is true about CRISP-DM:
One of the most well-proven methodologies for going about an AI exploration, published to standardize data mining processes across industries.
It has 6 fully linear and consecutive steps – each step is visited once to maximize results.
It cannot be used if you aim to apply DL algorithms since it was created before they became popular.
Deployment is often the easiest step of the process
5.For which cases do we suggest going for mocking the full flow first?
When we require deep-Turk methods, developed at Cornell.
Suited for complex projects with many risks or unknown factors. Or for cases where the human-algorithm interaction must be tested well to create the right capability.
When CRISP-DM fails, and we want to rescue the project.
When the team doesn’t have enough business acumen.
How to productize AI | Assessment
1.Which of the following is NOT true about setting expectations regarding AI productization:
We need to do it only once well before we start the actual work
It is important since without it, the AI team or other stakeholders might make misinformed decisions
Scope, skills, required resources, and duration need to be discussed
According to the video, AI productization often takes as long, if not longer than previous steps
2.An emphasis that before you productize an AI system you must test it in a controlled environment that is sterile like a lab.
An acronym of all the data sources planning and acquisitions activities to do towards productization.
A responsible AI practice for making sure your data is unbiased before you use it.
The name of all the different skills you need to have in an AI project team for productization.
3.What is NOT true about Proof Of Value (POV):
It proves that there is enough value from our solution approach.
It should have an end-to-end simulation with a focus on risky implementation aspects.
Its goal is to create a high value, sustainable and robust solution.
You should discontinue the project if, during the POV, you realize that the solution approach does not yield sufficient value.
The role of ML Engineer and MLOps | Assessment
1.What is the main barrier in AI towards delivering business value?
Lack of funding.
Difficulty deploying into business processes/applications.
Proper planning/unreasonable expectations.
Lack of AI education.
2.What is the primary role of machine learning engineers?
Develop and optimize code.
Research and integrate state of the art open-source tools.
Build the machine learning workflows and infrastructure to maintain and productize AI models.
Define timelines, expectations, and deliverables for AI projects.
3.Which of the following are crucial for producing business value at scale with AI?
Tracking and sustaining models in production
ML model deployment
Integration into systems and business processes.
All of the above