AI & ML & Data analysis Questions - 吳俊逸's digital portfolio

AI & ML & Data analysis Questions

by 吳俊逸

2023-08-25 23:39:27

Basic AI Terminology | Assessment

Differentiate between AI, ML, and DL; discuss the evolution from analytics to AI; explain what Big Data is and its relevance to AI; discuss why AI gets so much attention; and compare ML with DL.

What is the relationship between Machine Learning (ML) and Artificial Intelligence (AI)?

All AI is performed using ML algorithms only

ML is a subcategory of AI

No connection

How much data do you need for AI?

Big data

Image data

Proportional to algorithm and problem complexity

Why is Deep Learning important?

It solves some problems, mainly sensory ones, with a quality that outperforms human ability

All problems are suitable for and best solved by DL

It uses massive amounts of data and is highly interpretable

How ML algorithms work | Assessment

All ML algorithms work in the following order:

Training -> inference -> data collection

Data collection -> inference -> training

Data collection -> training -> inference
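
The order data collection, then training, then inference can be sketched as three tiny functions. This is a minimal illustration, not the course's method: the sensor readings, the "ok"/"faulty" labels, and the midpoint-of-class-means rule are all hypothetical choices made for the example.

```python
# Hypothetical toy problem: decide whether a sensor reading is "ok" or "faulty".

def collect_data():
    """Step 1, data collection: return labeled (reading, label) examples."""
    return [(20.0, "ok"), (22.5, "ok"), (24.0, "ok"),
            (31.0, "faulty"), (33.5, "faulty"), (35.0, "faulty")]


def train(examples):
    """Step 2, training: learn a threshold halfway between the two class means."""
    ok = [x for x, label in examples if label == "ok"]
    faulty = [x for x, label in examples if label == "faulty"]
    return (sum(ok) / len(ok) + sum(faulty) / len(faulty)) / 2


def infer(threshold, reading):
    """Step 3, inference: apply the trained model to a new reading."""
    return "faulty" if reading > threshold else "ok"


threshold = train(collect_data())
print(round(threshold, 2), infer(threshold, 29.0), infer(threshold, 25.0))  # 27.67 faulty ok
```

Inference cannot happen before training, and training cannot happen before there is data to learn from.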

What is the goal of the training phase of an ML algorithm?

To create a robust and general enough model of the problem it solves so it will be good with unseen data

To memorize all training data examples so it will be good with data it has already seen

To select the algorithm that uses the most advanced combination of hardware and algorithms
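
To see why memorizing the training data is not the goal, the sketch below compares a lookup table of the training examples with a least-squares line, scoring both on inputs neither model saw. The data points (roughly y = 2x + 1 with noise) and the mean-absolute-error metric are assumptions for illustration only.

```python
# Hypothetical data, roughly y = 2x + 1 with a little noise.
train_set = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]
test_set = [(6, 13.0), (7, 15.1)]  # inputs the models never saw during training

# Model 1 memorizes: a lookup table of the training examples.
lookup = dict(train_set)


def memorizer(x):
    return lookup.get(x, 0.0)  # has no sensible answer for unseen inputs


# Model 2 generalizes: a least-squares line y = a*x + b fitted to the training data.
n = len(train_set)
mean_x = sum(x for x, _ in train_set) / n
mean_y = sum(y for _, y in train_set) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in train_set)
     / sum((x - mean_x) ** 2 for x, _ in train_set))
b = mean_y - a * mean_x


def line(x):
    return a * x + b


def mean_abs_error(model, data):
    return sum(abs(model(x) - y) for x, y in data) / len(data)


for name, model in (("memorizer", memorizer), ("line", line)):
    print(name, mean_abs_error(model, train_set), mean_abs_error(model, test_set))
```

The memorizer is perfect on the training set and useless on new inputs, while the line makes small errors everywhere but keeps them small on unseen data.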

If our algorithm doesn’t perform well in the inference step, what should we do?

Wait until it gets better autonomously

Go back and improve the training phase; if that doesn’t work, go back and improve the data we use

This means we will need to collect big data since otherwise nothing will work

Why is quality data collection important for ML?

It’s overrated. Algorithms will overcome any data issue

With ML algorithms, it’s “garbage in, garbage out”: the algorithm is limited by the quality of the data we feed it.

When we perform the inference step we teach the algorithm to be good with unseen data

Different types of AI tasks with examples | Assessment

Classification algorithms:

Are a type of supervised learning algorithm designed to identify the category to which each data instance belongs.

Require no labels and are used for online learning

The only type of ML we use at the company
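
A classifier learns from labeled examples and then assigns each new instance to a category. The sketch below uses a nearest-centroid classifier, one of the simplest supervised methods, chosen here for brevity; the two features and the "defect"/"normal" labels are hypothetical.

```python
# Hypothetical labeled training data: (feature vector, category).
training = [
    ((1.0, 1.2), "defect"), ((0.8, 1.0), "defect"),
    ((4.0, 4.2), "normal"), ((4.3, 3.9), "normal"),
]


def fit_centroids(examples):
    """Training: average the feature vectors of each category."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Inference: pick the category whose centroid is closest (squared Euclidean distance)."""
    def dist2(center):
        return sum((f - c) ** 2 for f, c in zip(features, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


centroids = fit_centroids(training)
print(classify(centroids, (1.1, 0.9)), classify(centroids, (3.8, 4.1)))  # defect normal
```

The labels are what make this supervised: without them there would be no centroids per category to compare against.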

Reinforcement Learning algorithms:

Reinforce human conceptions about the data labels

Mimic how humans learn new tasks through smart trial and error

Can be used for supervised and unsupervised regression
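
Reinforcement learning's trial and error can be illustrated with an epsilon-greedy agent on a multi-armed bandit, one of the simplest RL settings. This is a sketch under assumptions: the three success probabilities, epsilon = 0.1, and the number of steps are hypothetical, and real RL problems usually also involve states and delayed rewards.

```python
import random


def epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from the rewards received.

    true_means are hypothetical success probabilities; the agent never sees them.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(true_means))  # explore: try something at random
        else:
            action = max(range(len(true_means)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(best, counts)
```

Exploration keeps the agent trying alternatives; exploitation lets it repeat what has worked so far.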

Regression is mainly used for:

Identifying anomalies in images from the fab

Processing text in the company’s customer tickets

Predicting the value of a continuous variable based on its input variables
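
Regression predicts a continuous value from input variables. A minimal sketch, assuming a single input and a linear relationship, fits y ≈ w·x + b by batch gradient descent on the mean squared error; the data (generated from y = 2x + 1), the learning rate, and the epoch count are hypothetical, and libraries typically use closed-form or more robust solvers instead.

```python
# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_linear_regression(xs, ys, lr=0.01, epochs=5000):
    """Fit y ≈ w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))   # learned slope and intercept
print(round(w * 6.0 + b, 2))      # predicted value for the unseen input x = 6
```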

Unsupervised learning:

Is often required since we don't have labeled data

Can be done by junior employees since they don’t need supervision

Is not good for solving vision problems since it uses only complex algorithms
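
Clustering is a typical unsupervised task: the algorithm groups data without any labels. Below is a minimal 1-D k-means sketch (Lloyd's algorithm), assuming hypothetical monthly-spend values and k = 2; production code would usually rely on a library implementation with smarter initialization.

```python
def kmeans_1d(values, k, iterations=20):
    """Group unlabeled numbers into k clusters (assumes k >= 2)."""
    ordered = sorted(values)
    # Start with k centers spread evenly across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


spend = [10, 12, 11, 95, 100, 98]  # hypothetical customer spend, no labels
centers, labels = kmeans_1d(spend, k=2)
print(centers, labels)
```

No one told the algorithm which customers are "low" or "high" spenders; the groups emerge from the data alone.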

Mapping between business challenges and AI challenges | Assessment

Which of the following AI tasks was NOT mentioned in the video?

Classification

Self-supervised learning

Regression

Clustering

Why do we need classification algorithms?

To handle each or some of the populations differently and improve business outcomes

To predict a future value and prepare for it (resources, mitigations)

For cases where we don’t want to handle each data entity separately and would rather handle data entities in groups

Which of the following is NOT a potential problem with anomaly detection algorithms?

It might be hard to fully label all historical anomalies for training the algorithm

The algorithm might not be sensitive enough, thus missing anomalous elements

There might be too many categories (labels) for the algorithm to choose from

The algorithm might be too sensitive in identifying things as anomalous when they are not
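
The sensitivity trade-off in these options can be made concrete with a simple z-score detector, a swapped-in technique that is not necessarily the one used in the video. The readings are hypothetical, with one injected anomaly at index 6.

```python
import statistics

# Hypothetical sensor readings; 58.0 is an injected anomaly.
readings = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 58.0, 50.1, 49.7, 50.4]


def flag_anomalies(values, threshold):
    """Return indices whose z-score (distance from the mean in standard deviations) exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


print(flag_anomalies(readings, 2.0))  # catches the anomaly
print(flag_anomalies(readings, 3.0))  # not sensitive enough: misses it
print(flag_anomalies(readings, 0.4))  # too sensitive: also flags normal readings
```

The anomaly itself inflates the standard deviation, which is one reason a strict threshold of 3 misses it here.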

Which of the following is NOT a prerequisite for applying an optimization algorithm?

Clear problem formulation: objective function and constraints

Historical data or ability to test alternatives

Enough compute, or knowledge of how to use a heuristic approach

No labels are available (and we cannot create ones)
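
An optimization problem needs a clear objective function and constraints, but no labels. The toy production-planning sketch below states both explicitly and solves them by exhaustive search; all profits, capacities, and resource usages are hypothetical, and realistic problems would need a solver or a heuristic instead of brute force.

```python
from itertools import product

# Hypothetical production-planning problem (all numbers are illustrative).
PROFIT = {"A": 30, "B": 50}         # objective coefficients per unit
MACHINE_HOURS = {"A": 2, "B": 4}    # constraint 1: at most 40 machine hours
MATERIAL = {"A": 3, "B": 2}         # constraint 2: at most 36 units of material


def feasible(a, b):
    """Check both constraints for a plan of a units of A and b units of B."""
    return (MACHINE_HOURS["A"] * a + MACHINE_HOURS["B"] * b <= 40
            and MATERIAL["A"] * a + MATERIAL["B"] * b <= 36)


def objective(a, b):
    """Total profit of the plan (the quantity we maximize)."""
    return PROFIT["A"] * a + PROFIT["B"] * b


# Exhaustive search over all plans; fine for a toy problem, infeasible at scale.
best = max(
    ((a, b) for a, b in product(range(21), range(11)) if feasible(a, b)),
    key=lambda plan: objective(*plan),
)
print(best, objective(*best))
```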

Which of the following is true about applying AI to real business problems?

In many cases, there will be more than one approach, or we will apply a combination of algorithms to solve the problem

There is always just one correct method of AI that can be applied to solve the problem

You need to be an AI expert to be able to understand which potential methods can be applied for a problem

In most cases, you will need to apply advanced AI methods to solve the problem

Intro to data terminology | Assessment

How can one measure the value of their data?

Based on how many kilobytes of data we have, we can predict its value

One method would be to measure it based on how much business impact it brings you

We can use information theory to subjectively let the data expert decide

Which of the following is NOT a data type mentioned in the video?

Cellular data

Audio data

Tabular data

Image data

Can the same dataset be used for both supervised and unsupervised learning?

No. Either you have labels, or you don’t

No. The supervised learning process will not converge if the data was already used for an unsupervised problem first.

Yes. It depends on the problem we are trying to solve. In some problems a data field can be used as an attribute (input), and in others as the output (goal).
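
The same table can serve several problems depending on which field is treated as the goal. A minimal sketch with hypothetical process records and a hypothetical helper `split_features_target`:

```python
# Hypothetical tabular records from one dataset.
records = [
    {"temperature": 70, "pressure": 1.2, "yield": 0.91},
    {"temperature": 75, "pressure": 1.4, "yield": 0.88},
    {"temperature": 80, "pressure": 1.1, "yield": 0.95},
]


def split_features_target(rows, target):
    """Use `target` as the output (goal) and every other field as an input attribute."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


# Supervised problem 1: "yield" is the label.
X1, y1 = split_features_target(records, "yield")
# Supervised problem 2: "pressure" is the label, and "yield" becomes an input attribute.
X2, y2 = split_features_target(records, "pressure")
# Unsupervised use: no target at all, e.g. clustering the full records.
X3 = [dict(row) for row in records]
```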

What is a categorical data type?

It’s a discrete numerical value of an attribute

Sometimes called “nominal” – indicates that the data has a finite number of options

It is data collected every fixed time interval
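
Because a nominal attribute has a finite set of options with no natural order, a common way to feed it to numeric algorithms is one-hot encoding (a standard technique, not necessarily the one in the video). The tool-type values below are hypothetical.

```python
def one_hot(values):
    """Encode a nominal (categorical) column as 0/1 indicator columns."""
    categories = sorted(set(values))  # the finite set of possible options
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


categories, encoded = one_hot(["etch", "litho", "etch", "deposition"])
print(categories)  # ['deposition', 'etch', 'litho']
print(encoded)
```

Encoding the options as 0, 1, 2 instead would suggest an order and distance between them that does not exist.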

Properties of “good” data for AI | Assessment

Why should your data be both available AND accessible?

It is not enough to know that the data is collected (i.e., available); we also need continued access to it during the POC and once the solution is productized

It shouldn’t; it is enough that the data is available somewhere, even if we don’t really have access to it.

Accessible data means that anyone can analyze it using any tool, and available means that the data is not that big.

What is your responsibility considering data privacy and security?

It is not my responsibility. Ethical AI and infosec will find me and tell me what to do

When working on an AI idea, I’m responsible for complying with the company’s information security and responsible AI guidelines, as found on these programs’ websites.

If my data is secure, then by design it is also ethical, so I only need to worry about security.

Which of the following will NOT be decided based on understanding how quickly my business process changes in ways that affect the AI algorithm’s quality?

Length of data history we can use

Can we solve the problem at all

How often we will probably need to retrain the algorithm

How often we will need to ingest new data

Which of the following statements is true with respect to data quantity?

The more, the better – you should always collect any data piece you can and use any historical data you ever collected.

ML algorithms always need more data than DL algorithms.

It depends heavily on the complexity of the problem and the algorithm, as well as on the required performance

Overcoming main data issues | Assessment

Which of the following is NOT a data issue discussed in the video?

Having irrelevant, redundant, or too much data

Lack of enough labeled or quality labeled data

White data – having a dataset everyone can access which will impact your competitive advantage

Erroneous or noisy dataset

For which of the following data issues is feature selection NOT one of the solution strategies?

Irrelevant/ redundant/ too much data

Sparse data/ curse of dimensionality

Dark data
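
Feature selection keeps the informative attributes and drops the rest. A minimal sketch, assuming hypothetical column names and only two simple rules (drop constant features and exact duplicates); real pipelines often also use correlation thresholds or model-based importance scores.

```python
import statistics


def select_features(columns, min_variance=1e-9):
    """Keep features that vary and are not exact copies of an already-kept feature.

    `columns` maps a feature name to its list of values.
    """
    kept = {}
    for name, values in columns.items():
        if statistics.pvariance(values) <= min_variance:
            continue  # constant feature: carries no information
        if any(values == other for other in kept.values()):
            continue  # redundant feature: duplicates one we already kept
        kept[name] = values
    return list(kept)


columns = {
    "temp_c": [20.0, 22.0, 25.0, 21.0],
    "temp_copy": [20.0, 22.0, 25.0, 21.0],
    "line_id": [3.0, 3.0, 3.0, 3.0],
    "pressure": [1.1, 1.3, 1.2, 1.0],
}
print(select_features(columns))  # ['temp_c', 'pressure']
```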

What can we do when we have limited data (amount or information)?

Perform outlier detection

Collect more data, do a feature extraction, consider online or active learning algorithms, or walk away

Sample data to make it more balanced

Which of the following is INCORRECT with regards to dimensionality reduction?

It is always advised since it can help overcome any data issue

It can help overcome the “curse of dimensionality”

They are powerful techniques which should be used carefully so as not to distort the data
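
The "curse of dimensionality" can be observed directly: as the number of dimensions grows, the nearest and farthest points from a query end up almost equally far away, so distance-based methods lose contrast. A sketch, assuming points drawn uniformly from a unit hypercube with a fixed random seed:

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the farthest to the nearest distance from a random query point.

    Points are sampled uniformly from the unit hypercube (an illustrative assumption).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return max(dists) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```

A ratio close to 1 means "nearest" is barely nearer than "farthest", which is one reason reducing dimensions can help.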

Which of the following is NOT a potential cause for having imbalanced/ biased data, as mentioned in the video?

Underlying distribution of elements in the real world is imbalanced

Lack of machine labeled data causes bias

Unintentional or inherent bias in the data collection

Anomalous behavior makes some instances rare
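
One way to "sample data to make it more balanced" is random oversampling of the minority class. A minimal sketch with hypothetical wafer records; note that duplicating rows rebalances the classes but does not remove any bias in how the data was collected, and it can encourage overfitting to the duplicated examples.

```python
import random


def random_oversample(rows, label_of, seed=0):
    """Duplicate random minority-class rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical imbalanced data: 8 "normal" wafers and 2 "defect" wafers.
data = [("w%d" % i, "normal") for i in range(8)] + [("d1", "defect"), ("d2", "defect")]
balanced = random_oversample(data, label_of=lambda row: row[1])

counts = {}
for _, label in balanced:
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {'normal': 8, 'defect': 8}
```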

How to formulate an AI problem | Assessment

Why is proper problem formulation important?

It can uncover unreasonable expectations

It provides a focus & prevents costly misunderstandings

It might identify additional risks

All the above

Which of the following is NOT included in the “General Scheme of a well-defined problem for AI” presented in the video?

Cost of AI

Goals for the specific use case

Constraints

Tolerance

Which additional aspects does the video say you should consider during problem formulation?

How much compute is required

The translation of our ROI/ feasibility analysis to the formulated problem

How to productize the AI capability

How to gain stakeholders’ support

The different roles in an AI team | Assessment

What are the primary responsibilities for a Subject Matter Expert within an AI team?

Work on the algorithm and make it production worthy

Business understanding, problem formulation, and making sure the AI solution has a high likelihood of succeeding in the business ecosystem

Create a data pipeline that will manage the data effectively

What are the primary responsibilities for an AI PM within an AI team?

Coordinate the work of the AI team and the business, define the problem to be solved, and define a product that uses the most relevant AI technology while maximizing business impact

Create a state-of-the-art algorithm that can be published in a conference

Create a cost-effective AI platform that follows MLOps best practices

What are the primary responsibilities for an “Active Sponsor” in support of the AI team?

Accountable for creating the integration code with the existing systems

Work in an agile methodology for efficient AI exploration

Remove any roadblock that might hinder the progress and allocate resources as needed

What are the primary responsibilities for a Data Scientist within an AI team?

Create production worthy code of an end-to-end AI capability

Create the best possible AI algorithm to solve the defined problem

Change management at the customer side to ensure AI is accepted with fewer objections

How to work on an AI POC | Assessment

Which of the following is NOT a prerequisite for starting the exploration phase?

Good enough data set

Preliminary solution approaches

Enough ROI and feasibility

An MLOps platform

Which of the following best describes a “successful AI POC”, according to the video?

Solves the problem

Provides a quick answer to “can we solve this problem with AI to our satisfaction?”

One that has an agile-style “MVP” with accuracy above 90%

One that includes the blueprint of an MLOps platform

Which of the following is NOT a good reason to fail fast with your AI idea?

You don’t have the required data (or a clear line of sight to obtaining it)

Your problem seems unsolvable/ too complex for your needs and its impact

After applying judgment and consulting with experts, the solution will be complex, but the ROI is high enough

Not enough support from key stakeholders

Which of the following is true about CRISP-DM?

One of the most well-proven methodologies for going about an AI exploration, published to standardize data mining processes across industries.

It has 6 fully linear and consecutive steps – each step is visited once to maximize results.

It cannot be used if you aim to apply DL algorithms since it was created before they became popular.

Deployment is often the easiest step of the process

In which cases do we suggest mocking the full flow first?

When we require deep-Turk methods, developed at Cornell.

For complex projects with many risks or unknown factors, or for cases where the human-algorithm interaction must be tested thoroughly to create the right capability.

When CRISP-DM fails, and we want to rescue the project.

When the team doesn’t have enough business acumen.

How to productize AI | Assessment

Which of the following is NOT true about setting expectations for AI productization?

We need to do it only once, well before we start the actual work

It is important since without it, the AI team or other stakeholders might make misinformed decisions

Scope, skills, required resources, and duration need to be discussed

According to the video, AI productization often takes as long, if not longer than previous steps

What is “LAB RAT”?

An emphasis that before you productize an AI system you must test it in a controlled environment that is sterile like a lab.

An acronym of all the data sources planning and acquisitions activities to do towards productization.

A responsible AI practice for making sure your data is unbiased before you use it.

The name of all the different skills you need to have in an AI project team for productization.

Which of the following is NOT true about Proof of Value (POV)?

It proves that there is enough value from our solution approach.

It should have an end-to-end simulation with a focus on risky implementation aspects.

Its goal is to create a high value, sustainable and robust solution.

You should discontinue the project if, during the POV, you realize that the solution approach does not yield sufficient value.

The role of the ML Engineer and MLOps | Assessment

What is the main barrier in AI towards delivering business value?

Lack of funding.

Difficulty deploying into business processes/applications.

Proper planning/unreasonable expectations.

Lack of AI education.

What is the primary role of machine learning engineers?

Develop and optimize code.

Research and integrate state of the art open-source tools.

Build the machine learning workflows and infrastructure to maintain and productize AI models.

Define timelines, expectations, and deliverables for AI projects.

Which of the following are crucial for producing business value at scale with AI?

Tracking and sustaining models in production

ML model deployment

Integration into systems and business processes.

All of the above
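
Tracking models in production often includes checking whether live input data still resembles the training data. Below is a sketch of one commonly used drift metric, the population stability index (PSI); the bin edges and samples are hypothetical, and the often-quoted rule of thumb that a PSI above roughly 0.25 signals a significant shift is a convention rather than a hard limit.

```python
import math


def population_stability_index(expected, actual, bins):
    """PSI between a reference sample (e.g. training data) and a live sample.

    `bins` is a sorted list of bin edges; values outside them fall into the end bins.
    """
    def shares(values):
        counts = [0] * (len(bins) - 1)
        for v in values:
            i = 0
            while i < len(counts) - 1 and v >= bins[i + 1]:
                i += 1
            counts[i] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


bins = [0, 2.5, 5.5, 9]                   # hypothetical bin edges
training = [1, 2, 3, 4, 5, 6, 7, 8]       # feature values seen during training
live_shifted = [6, 7, 8, 9, 7, 8, 6, 9]   # live values that drifted upward
print(population_stability_index(training, training, bins))      # no drift
print(population_stability_index(training, live_shifted, bins))  # large drift
```

A rising PSI is a signal to investigate and possibly retrain, not an automatic verdict that the model is wrong.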