AI & ML & Data analysis Questions
by 吳俊逸 2023-08-25 23:39:27
Basic AI Terminology | Assessment

Differentiate between AI, ML, and DL; discuss the evolution from analytics to AI. What is Big Data, and how is it relevant to AI? Discuss why AI gets so much attention, and compare ML with DL.


What is the relationship between Machine Learning (ML) and Artificial Intelligence (AI)?

All AI is performed using ML algorithms only
ML is a subcategory of AI
No connection

What size of data do you need for AI?

Big data
Image data
Proportional to algorithm and problem complexity

Why is Deep Learning important?

It solves some problems, mainly sensory ones, at a quality that outperforms human ability
All problems are suitable for and best solved by DL
It uses massive amounts of data and is highly interpretable

How ML algorithms work | Assessment


All ML algorithms work in the following order:

Training -> inference -> data collection
Data collection -> inference -> training
Data collection -> training -> inference
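
As a minimal sketch of the three phases named in the options above (data collection, training, inference), the toy example below uses a nearest-centroid classifier. The data and the function names `collect_data`, `train`, and `infer` are hypothetical and exist only to make the phases concrete.

```python
def collect_data():
    # Phase 1, data collection: hard-coded (feature, label) pairs stand in
    # for data gathered from a real process.
    return [(1.0, "low"), (1.5, "low"), (2.0, "low"),
            (8.0, "high"), (9.0, "high"), (9.5, "high")]


def train(samples):
    # Phase 2, training: learn one centroid (mean feature value) per label.
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def infer(model, x):
    # Phase 3, inference: assign an unseen value to the nearest centroid.
    return min(model, key=lambda label: abs(model[label] - x))


model = train(collect_data())
prediction = infer(model, 7.0)  # 7.0 was never seen during training
print(prediction)
```

Here the model is built once from the collected data and then applied to a value it has not seen before, which is the situation the training phase is meant to prepare for.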

What is the goal of the training phase of an ML algorithm?

To create a robust and general enough model of the problem it solves so it will be good with unseen data
To memorize all training data examples so it will be good with data it has already seen
To select which algorithm will use the most advanced HW and algorithms combination

If our algorithm doesn’t perform well in the inference step, what should we do?

Wait until it gets better autonomously
Go back to improve the training phase and, if that doesn’t work, go back to improve the data we use
This means we will need to collect big data since otherwise nothing will work

Why is quality data collection important for ML?

It’s overrated. Algorithms will overcome any data issue
With ML algorithms, it’s “garbage in, garbage out”: the algorithm is limited by the quality of the data we feed it.
When we perform the inference step we teach the algorithm to be good with unseen data

Different types of AI tasks with examples | Assessment


Classification algorithms:

Are a type of supervised learning algorithm designed to identify the category to which each data instance belongs.
Require no labels and are used for online learning
The only type of ML we use at the company

Reinforcement Learning algorithms:

Reinforce human conceptions about the data labels
Mimic how humans learn new tasks through smart trial and error
Can be used for supervised and unsupervised regression

Regression is mainly used for:

Identifying anomalies in images from the fab
Processing text in the company's customer tickets
Predicting the value of a continuous variable based on its input variables
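
To make "predicting the value of a continuous variable based on its input variables" concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The data points are made up for illustration (they follow y = 2x + 1), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    # Ordinary least squares for one input variable:
    # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data only: inputs and a continuous target.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for an unseen input
print(slope, intercept, predicted)
```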

Unsupervised learning:

Is often required since we don't have labeled data
Can be done by junior employees since they don’t need supervision
Is not good for solving vision problems since it uses only complex algorithms

Mapping between business challenges and AI challenges | Assessment


Which of the following AI tasks was NOT mentioned in the video?

Self-supervised learning

Why do we need classification algorithms?

To handle each or some of the populations differently and improve business outcome
To predict a future value and prepare for it (resources, mitigations)
For cases where we don’t want to handle each data entity separately and would rather handle data entities in groups

Which of the following is NOT a potential problem with anomaly detection algorithms?

It might be hard to fully label all historical anomalies for training the algorithm
The algorithm might not be sensitive enough, thus missing anomalous elements
There might be too many categories (labels) for the algorithm to choose from
The algorithm might be too sensitive in identifying things as anomalous when they are not
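
The sensitivity trade-off described in the options above can be sketched with a simple z-score detector. This is only one possible approach, chosen for brevity; the readings, the thresholds, and the `find_anomalies` helper are assumptions made for illustration.

```python
import statistics


def find_anomalies(values, threshold):
    # Flag values whose distance from the mean, measured in standard
    # deviations (z-score), exceeds the threshold.
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Illustrative sensor readings with one obvious outlier (15.0).
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 15.0, 10.3]

balanced = find_anomalies(readings, threshold=2.0)       # catches the outlier
insensitive = find_anomalies(readings, threshold=3.0)    # misses it
oversensitive = find_anomalies(readings, threshold=0.5)  # also flags a normal reading
print(balanced, insensitive, oversensitive)
```

A higher threshold makes the detector less sensitive and a lower one makes it more sensitive, so the threshold usually has to be tuned against the cost of missed anomalies versus false alarms.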

Which of the following is NOT a prerequisite for applying an optimization algorithm?

Clear problem formulation: objective function and constraints
Historical data or ability to test alternatives
Enough compute, or knowledge of how to use a heuristic approach
No labels are available (and we cannot create any)
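
As a rough sketch of what a "clear problem formulation" with an objective function and constraints can look like, the example below brute-forces a tiny production-planning problem. The objective, the constraints, and all names are hypothetical; exhaustively testing alternatives like this is only feasible when the search space is small, which is where compute or a heuristic comes in.

```python
from itertools import product


def objective(x, y):
    # Hypothetical objective: profit from x units of product A and y of product B.
    return 3 * x + 2 * y


def is_feasible(x, y):
    # Hypothetical constraints: total capacity of 4 units, at most 3 units of A.
    return x + y <= 4 and x <= 3


# Test every alternative on a small integer grid and keep the best feasible one.
candidates = [(x, y) for x, y in product(range(5), repeat=2) if is_feasible(x, y)]
best = max(candidates, key=lambda xy: objective(*xy))
best_value = objective(*best)
print(best, best_value)
```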

Which of the following is true about applying AI to real business problems?

In many cases, there will be more than one approach, or we will apply a combination of algorithms to solve the problem
There is always just one correct method of AI that can be applied to solve the problem
You need to be an AI expert to be able to understand which potential methods can be applied for a problem
In most cases, you will need to apply advanced AI methods to solve the problem

Intro to data terminology Assessment


How can one measure the value of their data?

Based on how many kilobytes of data we can predict its value 
One method would be to measure it based on how much business impact it brings you 
We can use information theory to subjectively let the data expert decide 

Which of the following is NOT a data type mentioned in the video? 

Cellular data 
Audio data 
Tabular data 
Image data 

Can the same dataset be used for both supervised and unsupervised learning? 

No. Either you have labels, or you don’t 
No. The supervised learning process will not converge if the data was already used for an unsupervised problem first. 
Yes. It depends on the problem we are trying to solve: the same data field can be used as an attribute in one problem and as the output (goal) in another.

What is a categorical data type? 

It’s a discrete numerical value of an attribute 
Sometimes called “nominal” – indicates that the data has a finite number of options 
It is data collected every fixed time interval 
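
To illustrate a nominal attribute with a finite number of options, the sketch below one-hot encodes a small categorical column so that an algorithm expecting numbers can consume it. The `colors` values and the `one_hot` helper are made up for this example; one-hot encoding is one common choice, not the only one.

```python
def one_hot(values):
    # The finite set of options observed in the data, in a fixed order.
    categories = sorted(set(values))
    # One 0/1 indicator per category for every value.
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Illustrative nominal data: the options have no inherent order or magnitude.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)
print(categories)
print(encoded)
```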

Properties of “good” data for AI Assessment


Why should your data be both available AND accessible? 

It is not enough to know the data is collected, i.e., available; we also need continued access to it during the POC and once productized
It shouldn’t. It is enough that the data is available somewhere, even if we don’t really have access to it.
Accessible data means that anyone can analyze it using any tool, and available means that the data is not that big.

What is your responsibility regarding data privacy and security?

It is not my responsibility. Ethical AI and infosec will find me and tell me what to do
When working on an AI idea, I’m responsible for complying with the company's information security and responsible AI guidelines, which can be found on those programs' websites.
If my data is secure, then by design, it is also ethical so I can worry only about that.

Which of the following will NOT be decided based on the rate at which my business process changes in ways that affect the AI algorithm’s quality?

Length of data history we can use
Can we solve the problem at all
How often we will probably need to retrain the algorithm
How often we will need to ingest new data

Which of the following statements is true with respect to data quantity?

The more, the better – you should always collect any data piece you can and use any historical data you ever collected.
ML algorithms always need more data than DL algorithms.
It depends heavily on the complexity of the problem and the algorithm, as well as on the performance needed

Overcoming main data issues Assessment


Which of the following is NOT a data issue discussed in the video?

Having irrelevant or redundant data, or too much of it
Not enough labeled data, or not enough high-quality labeled data
White data – having a dataset everyone can access which will impact your competitive advantage
Erroneous or noisy dataset

For which of the following data issues is feature selection NOT one of the solution strategies?

Irrelevant/ redundant/ too much data
Sparse data/ curse of dimensionality
Dark data

What can we do when we have limited data (amount or information)?

Perform outlier detection
Collect more data, do a feature extraction, consider online or active learning algorithms, or walk away
Sample data to make it more balanced

Which of the following is INCORRECT with regards to dimensionality reduction?

It is always advised since it can help overcome any data issue
It can help overcome the “curse of dimensionality”
Its techniques are powerful and should be used carefully so as not to distort the data

Which of the following is NOT a potential cause for having imbalanced/ biased data, as mentioned in the video?

Underlying distribution of elements in the real world is imbalanced
Lack of machine labeled data causes bias
Unintentional or inherent bias in the data collection
Anomalous behavior makes some of the instances rare

How to formulate an AI problem Assessment


Why is proper problem formulation important?  

It can uncover unreasonable expectations 
It provides a focus & prevents costly misunderstandings 
It might identify additional risks 
All of the above

Which of the following is NOT included in the “General Scheme of a well-defined problem for AI” presented in the video? 

Cost of AI 
Goals for the specific use case 

Which additional things are mentioned as worth considering during problem formulation?

How much compute is required 
The translation of our ROI/ feasibility analysis to the formulated problem 
How to productize the AI capability 
How to gain stakeholders’ support

The different roles in an AI team | Assessment


What are the primary responsibilities for a Subject Matter Expert within an AI team?

Work on the algorithm and make it production worthy
Business understanding, problem formulation, and making sure the AI solution has a high likelihood of succeeding in the business ecosystem
Create a data pipeline that will manage the data effectively

What are the primary responsibilities for an AI PM within an AI team?

Coordinate the work of the AI team and the business, define the problem to be solved, and define a product that uses the most relevant AI technology while maximizing business impact
Create a state-of-the-art algorithm that can be published in a conference
Create a cost-effective AI platform that includes MLOps best practices

What are the primary responsibilities for an “Active Sponsor” in support of the AI team?

Accountable for creating the integration code with the existing systems
Work in an agile methodology for efficient AI exploration
Remove any roadblock that might hinder the progress and allocate resources as needed


What are the primary responsibilities for a Data Scientist within an AI team?

Create production worthy code of an end-to-end AI capability
Create the best possible AI algorithm to solve the defined problem
Change management at the customer side to ensure AI is accepted with fewer objections

How to work on an AI POC | Assessment


Which of the following is NOT part of the prerequisites for starting the exploration phases?

Good enough data set
Preliminary solution approaches
Enough ROI and feasibility
An MLOps platform

Which of the following best describes a “successful AI POC” according to the video?

Solves the problem
Provides a quick answer to “can we solve this problem with AI to our satisfaction?”
One that has an agile-style “MVP” with accuracy >90%
One that includes the blueprint of an MLOps platform

Which of the following is NOT a good reason to fail fast with your AI idea?

You don’t have (or don’t have a clear line of sight to having) the required data
Your problem seems unsolvable or too complex relative to your needs and its impact
After applying judgment and consulting with experts, the solution will be complex but the ROI is high enough
Not enough support from key stakeholders

Which of the following is true about CRISP-DM:

One of the most well-proven methodologies for going about an AI exploration, published to standardize data mining processes across industries.
It has 6 fully linear and consecutive steps – each step is visited once to maximize results.
It cannot be used if you aim to apply DL algorithms since it was created before they became popular.
Deployment is often the easiest step of the process

In which cases do we suggest mocking the full flow first?

When we require deep-Turk methods, developed at Cornell.
For complex projects with many risks or unknown factors, or for cases where the human-algorithm interaction must be tested well to create the right capability.
When CRISP-DM fails, and we want to rescue the project.
When the team doesn’t have enough business acumen.

How to productize AI | Assessment


Which of the following is NOT true about setting expectations regarding AI productization?

We need to do it only once, well before we start the actual work
It is important since without it, the AI team or other stakeholders might make misinformed decisions
Scope, skills, required resources, and duration need to be discussed
According to the video, AI productization often takes as long as, if not longer than, the previous steps

What is “LAB RAT”?

An emphasis that before you productize an AI system you must test it in a controlled environment that is sterile like a lab.
An acronym of all the data sources planning and acquisitions activities to do towards productization.
A responsible AI practice for making sure your data is unbiased before you use it.
The name of all the different skills you need to have in an AI project team for productization.

What is NOT true about Proof of Value (POV)?

It proves that there is enough value from our solution approach.
It should have an end-to-end simulation with a focus on risky implementation aspects.
Its goal is to create a high value, sustainable and robust solution.
You should discontinue the project if, during the POV, you realize that the solution approach does not yield sufficient value.

The role of ML Engineer and MLOps | Assessment


What is the main barrier in AI towards delivering business value? 

Lack of funding. 
Difficulty deploying into business processes/applications.
Lack of proper planning/unreasonable expectations.
Lack of AI education. 

What is the primary role of machine learning engineers? 

Develop and optimize code. 
Research and integrate state of the art open-source tools. 
Build the machine learning workflows and infrastructure to maintain and productize AI models. 
Define timelines, expectations, and deliverables for AI projects. 

Which of the following are crucial for producing business value at scale with AI? 

Tracking and sustaining models in production 
ML model deployment 
Integration into systems and business processes. 
All of the above