Principles of Data Science
by 吳俊逸 2020-04-28 22:23:46

Q1: When we extract data and find that some of it contains errors, how should we handle those records? Do we count them as null samples, treat them as outliers, or just discard them completely?

Ans: Perform exploratory data analysis (EDA) first. Start by finding out what percentage of the data is incorrect. If that percentage is small and the affected records have little impact, the easiest approach is to discard them.
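A minimal sketch of that check in plain Python. The validity rule (no missing fields, body temperature between 30 and 45 °C) and the 5% cut-off are hypothetical choices for illustration; in practice both come out of the EDA itself.

```python
def is_valid(record):
    """Hypothetical validity rule: no missing field and a plausible body temperature."""
    if any(value is None for value in record.values()):
        return False
    return 30.0 <= record["temperature"] <= 45.0


def invalid_fraction(records):
    """Share of records that fail the validity rule."""
    bad = sum(1 for r in records if not is_valid(r))
    return bad / len(records)


def drop_if_small(records, max_fraction=0.05):
    """Discard invalid records only when they are a small share of the data.

    When the share is larger, return the records unchanged so they can be
    examined further instead of silently thrown away.
    """
    if invalid_fraction(records) <= max_fraction:
        return [r for r in records if is_valid(r)]
    return records


# Hypothetical data: 19 clean records plus one with a missing age.
records = [{"age": 30 + i, "temperature": 36.5} for i in range(19)]
records.append({"age": None, "temperature": 36.6})
cleaned = drop_if_small(records)  # 1 of 20 invalid (5%), so it is dropped
```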


Q2: Is it true that all data ends up as rows and columns after processing?

Ans: It's easier for humans to comprehend 2D structures, so matrices are often used. However, rows and columns may not be the best representation for unstructured data such as pictures, or for time-series data.
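As a rough illustration in NumPy (the sizes below are arbitrary placeholders), tabular data fits a 2D array naturally, whereas pictures and batches of time series keep extra axes that a plain row/column layout would have to flatten away:

```python
import numpy as np

# Structured (tabular) data: 4 samples x 3 features fits a 2D matrix.
table = np.zeros((4, 3))

# A 32x32 RGB picture keeps height, width and colour channel as separate axes.
image = np.zeros((32, 32, 3))

# A batch of time series: 5 sequences x 24 time steps x 2 sensor readings.
sequences = np.zeros((5, 24, 2))

# Forcing the picture into a single row flattens it to 3072 columns and
# loses the information about which pixels are neighbours.
image_as_row = image.reshape(1, -1)
```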


Q3: Could you give an example of decision tree pruning and its application? Thank you.

Ans: After a tree has been built, it may be overfitted. When a leaf contains only one or a few data points, the tree has memorized the training data rather than learned a general rule. There is no point in keeping such a leaf, since it may produce inaccurate results on new data. For example, if the tree has learned the rule "above 40°C indicates fever", a further leaf that splits at 39.001°C based on only a few data points to decide fever or no fever is not meaningful and should be pruned.
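A minimal sketch of the idea in plain Python. The tree, the class counts and the `min_samples=5` rule are hypothetical: this is a simple minimum-leaf-size post-pruning rule chosen for illustration, not the only method (scikit-learn, for instance, offers cost-complexity pruning through the `ccp_alpha` parameter of `DecisionTreeClassifier`). Each leaf stores how many training points of each class reached it; a split whose children are both leaves is collapsed when either child holds too few points.

```python
from collections import Counter


def leaf(**counts):
    """A leaf stores how many training points of each class reached it."""
    return {"counts": Counter(counts)}


def is_leaf(node):
    return "counts" in node


def predict(node, temperature):
    """Follow the splits (left branch: value <= threshold) down to a leaf."""
    while not is_leaf(node):
        node = node["left"] if temperature <= node["threshold"] else node["right"]
    return node["counts"].most_common(1)[0][0]


def prune(node, min_samples=5):
    """Bottom-up pruning: collapse a split whose two children are leaves
    when either child holds fewer than min_samples training points."""
    if is_leaf(node):
        return node
    left = prune(node["left"], min_samples)
    right = prune(node["right"], min_samples)
    if is_leaf(left) and is_leaf(right):
        smallest = min(sum(left["counts"].values()), sum(right["counts"].values()))
        if smallest < min_samples:
            # Merge the class counts; the majority class becomes the prediction.
            return {"counts": left["counts"] + right["counts"]}
    return {"threshold": node["threshold"], "left": left, "right": right}


# Hypothetical overfitted tree: the 39.001 °C split rests on only 2 points.
fever_tree = {
    "threshold": 40.0,
    "left": {
        "threshold": 39.001,
        "left": leaf(normal=60, fever=2),
        "right": leaf(fever=2),
    },
    "right": leaf(fever=30),
}
pruned_tree = prune(fever_tree, min_samples=5)
```

After pruning, a reading of 39.5 °C falls back to the majority of its merged leaf ("normal"), while the well-supported 40 °C split is kept.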


Q4: How is the SVM determined?

Ans: SVM is an effective algorithm for classification tasks. Nonetheless, the choice of algorithm should be made case by case.


Q5: Can you show one real-world use case of unsupervised learning, and which data to input? Or does the input come from data used for supervised learning? Thanks.

Ans: Grouping together people with similar traits that are associated with a high risk of suicide.
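Such grouping is typically done with a clustering algorithm. Below is a minimal k-means sketch in NumPy; k-means is a swapped-in, commonly used choice, since the answer does not name an algorithm. The two trait scores per person are made-up numbers, and no labels are given to the algorithm. Centroids are initialised farthest-first, a deterministic simplification of k-means++, so the result does not depend on a random seed.

```python
import numpy as np


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    return np.array(centroids)


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100):
    centroids = farthest_first_init(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Move each centroid to the mean of its points (keep it if the group is empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


# Hypothetical, unlabeled data: two made-up trait scores per person.
people = np.array([
    [7.5, 2.0], [8.0, 1.0], [7.0, 3.0],
    [4.0, 8.0], [3.5, 9.0], [4.5, 7.5],
])
labels, centroids = kmeans(people, k=2)
```

The algorithm only returns group numbers; deciding what a group means (for example, whether it corresponds to high risk) still requires domain experts.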


Q6: Could you explain what a variable is, please?

Ans: A variable is a symbolic name associated with a value, and that value may change. In data modeling, variables refer to features, e.g. age, temperature, or price.
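A tiny Python illustration of both senses of the word; the sample values are made up.

```python
# A variable: a name bound to a value that may change.
temperature = 36.8
temperature = 38.5  # same name, new value

# In data modeling, the variables are the features describing each sample.
# Hypothetical sample using the three example features.
sample = {"age": 42, "temperature": temperature, "price": 199.0}
feature_names = list(sample)  # ['age', 'temperature', 'price']
```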


Q7: What does it mean to have labeled data in unsupervised learning?

Ans: Labeled data belongs to supervised learning; unsupervised learning works with data that has no labels.


Q8: Does SVM only classify data into 2 types?

Ans: SVM was originally designed for binary classification. Multi-class classification can be constructed by combining several binary classifiers.
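A minimal sketch of the one-vs-rest strategy in NumPy: one binary linear SVM per class ("this class" vs "everything else"), trained here by plain subgradient descent on the hinge loss, and the class with the highest score wins. The solver, hyperparameters and three-class toy data are assumptions for illustration. One-vs-one (one binary SVM per pair of classes) is the other common scheme; scikit-learn's `SVC`, for example, uses it internally.

```python
import numpy as np


def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear soft-margin SVM for labels y in {+1, -1}, trained by full-batch
    subgradient descent on  lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, labels):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {c: train_binary_svm(X, np.where(labels == c, 1.0, -1.0))
            for c in np.unique(labels)}


def predict_one_vs_rest(models, X):
    """Pick the class whose binary SVM gives the largest score w.x + b."""
    classes = list(models)
    scores = np.column_stack([X @ models[c][0] + models[c][1] for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]


# Hypothetical 2D toy data: three well-separated classes.
X = np.array([
    [3.0, 0.0], [3.3, 0.3], [3.3, -0.3],       # class "a"
    [-1.5, 2.6], [-1.8, 2.9], [-1.2, 2.9],     # class "b"
    [-1.5, -2.6], [-1.8, -2.9], [-1.2, -2.9],  # class "c"
])
labels = np.array(["a"] * 3 + ["b"] * 3 + ["c"] * 3)
models = train_one_vs_rest(X, labels)
```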


Q9: How does supervised classification differ from unsupervised clustering?

Ans: Classification and clustering are two different tasks. The former assigns data to predefined categories, while the latter groups similar data together without predefined categories.
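To make the difference concrete, here is a minimal nearest-centroid classifier in NumPy (a swapped-in, deliberately simple classification method; the fruit measurements are made up). It can only be built because every training point carries a category name, and its predictions are those category names. A clustering algorithm would receive only the measurements and return group numbers whose meaning someone still has to interpret.

```python
import numpy as np


def fit_nearest_centroid(X, y):
    """Classification needs labels: compute one centroid per known category."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(model, X):
    """Assign each point the category of its closest centroid."""
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Hypothetical labeled data: (weight in g, diameter in cm) per fruit.
X_train = np.array([[150.0, 7.0], [170.0, 7.5], [8.0, 2.0], [6.0, 1.8]])
y_train = np.array(["apple", "apple", "cherry", "cherry"])
model = fit_nearest_centroid(X_train, y_train)
predictions = predict_nearest_centroid(model, np.array([[160.0, 7.2], [7.0, 1.9]]))
```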


Q10: Can you explain a little more about what exactly SVM is? You mentioned the red line a lot. Is it a method in which we set rules to define a boundary for classification?

Ans: Yes. SVM finds a boundary (the red line) that separates the two categories, choosing the one with the largest margin between them; once that boundary is found, the classification task is done.
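For reference, in the standard textbook (hard-margin, linear) formulation the boundary is the line w·x + b = 0, chosen so that the gap between the two categories is as wide as possible. The notation (weight vector w, bias b, labels y_i of +1 or -1) is the usual one, not taken from the lecture:

```latex
% Decision boundary (the separating line in 2D)
\mathbf{w}^{\top}\mathbf{x} + b = 0

% Maximum margin: the gap between the two classes is 2 / \lVert\mathbf{w}\rVert
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n

% A new point is classified by the side of the boundary it falls on
\hat{y} = \operatorname{sign}(\mathbf{w}^{\top}\mathbf{x} + b)
```

When the categories overlap, the soft-margin version adds slack terms so that a few points may violate the constraint.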


Q11: What is the difference in function between "Input Gate" and "RNN"?

Ans: RNN is a deep learning model for time-series (sequential) data. The input gate is a component of the LSTM, a type of RNN, and controls how much of the new input is written into the cell's memory.
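A minimal NumPy sketch of one LSTM time step, using the standard gate equations, to show where the input gate sits: it is a sigmoid with values in (0, 1) that scales how much of the new candidate value enters the cell state. The parameter layout and the names `init_params` and `lstm_step` are hypothetical; deep-learning libraries implement the same equations with their own weight layouts.

```python
import numpy as np

GATES = ("forget", "input", "candidate", "output")


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical layout: per gate, W of shape (hidden, hidden + input) and a bias."""
    rng = np.random.default_rng(seed)
    return {g: (rng.normal(0.0, 0.1, (hidden_size, hidden_size + input_size)),
                np.zeros(hidden_size)) for g in GATES}


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state, cell state and input gate."""
    z = np.concatenate([h_prev, x])

    def gate(name, activation):
        W, b = params[name]
        return activation(W @ z + b)

    f = gate("forget", sigmoid)      # how much of the old cell state to keep
    i = gate("input", sigmoid)       # input gate: how much new information to write
    g = gate("candidate", np.tanh)   # candidate values computed from x and h_prev
    o = gate("output", sigmoid)      # how much of the cell state to expose
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c, i
```

Setting the input gate's bias very negative (so i ≈ 0) and the forget gate's bias very positive (so f ≈ 1) makes the cell ignore the current input and keep its previous state.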


Q12: In a deep learning neural network, the teacher said that the 1st hidden layer may learn color, while the 2nd hidden layer may learn or decide other parameters. How can we control that?

Ans: Deep learning is a black-box approach, so it is not possible to directly control what each layer learns. Explainable AI (XAI) is an active research area that aims to understand the role of each layer, but that research is still ongoing.


Q13: Taking your award-winning illegal-ads recognition AI as an example, can you share with us how you justified the computing resources needed, such as CPU cores, memory size, or even GPU cores?

Ans: It was a 5-layer LSTM, so the computing resources required were minimal. A notebook computer was sufficient to complete the training.
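The answer does not give the network's sizes, so here is a back-of-the-envelope sketch with hypothetical ones (100 input features per time step, 128 hidden units, 5 stacked layers). Each LSTM layer has four gates, each with a weight matrix over [previous hidden state, input] plus a bias, giving 4 * (h * (d + h) + h) parameters for input size d and hidden size h:

```python
def lstm_layer_params(input_size, hidden_size):
    # 4 gates, each with a (hidden x (input + hidden)) weight matrix and a bias vector
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def stacked_lstm_params(input_size, hidden_size, num_layers):
    # The first layer reads the raw input; deeper layers read the previous hidden state.
    total = lstm_layer_params(input_size, hidden_size)
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    return total


n_params = stacked_lstm_params(input_size=100, hidden_size=128, num_layers=5)  # → 643584
weight_bytes = n_params * 4  # float32 weights → 2574336 bytes (about 2.6 MB)
```

Gradients, optimizer state and the activations kept for backpropagation through time add to this, but they scale from the same small base. Libraries may count slightly differently; PyTorch's `nn.LSTM`, for example, keeps two bias vectors per layer, adding 4 * h parameters per layer.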


Q14: Can you give examples of situations where data are unlabeled?

Ans: A million pictures without labels can only be grouped by similarity; a million pictures labeled with categories can be used to train a classification model.
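For the unlabeled case, a minimal NumPy sketch of grouping by similarity. It assumes each picture has already been turned into a feature vector (for example by a pretrained network); the vectors and the 0.9 cosine-similarity threshold below are hypothetical. The greedy "leader" grouping used here is one simple choice; k-means is another. The output is only group numbers: without labels, nothing says what each group depicts.

```python
import numpy as np


def group_by_similarity(vectors, threshold=0.9):
    """Greedy grouping: each vector joins the first group whose representative
    (its first member) has cosine similarity >= threshold with it; otherwise
    it starts a new group. No labels are used."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    representatives = []  # index of the first member of each group
    group_ids = []        # group number assigned to each vector
    for idx, v in enumerate(unit):
        for gid, rep in enumerate(representatives):
            if unit[rep] @ v >= threshold:
                group_ids.append(gid)
                break
        else:  # no existing group was similar enough
            representatives.append(idx)
            group_ids.append(len(representatives) - 1)
    return group_ids


# Hypothetical feature vectors for five unlabeled pictures.
pictures = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.05, 0.95, 0.0],
    [0.0, 0.0, 1.0],
])
groups = group_by_similarity(pictures)
```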