Machine Learning and the First Phase

Why is understanding the problem important?

by Erico Couto Jr.

I have seen many diagrams of the ML workflow that give no indication of the problem to be solved. How can we solve something without knowing what it is? A workflow needs to be complete and designed with all its phases, starting with the problem. This article therefore warns of the need to treat the problem as an important phase, because without it there will be no solution, or any solution will do.

Well, to have a project with a clear objective, we need to understand the problem and ask some questions. I like to use 5W2H in a very practical and objective way, without complex spreadsheets or endless meetings. We should use the 5W as a brainstorm to get a general understanding of the problem and leave the 2H to the end. This technique works very well, even for companies that need agile processes. If this initial step is neglected, the “W” questions will come up anyway during the ETL and modeling process. But this is a tip for those who already manage short-term projects.

But going back to the subject of workflows, I notice that most workflow diagrams do not include a problem-understanding phase. As mentioned earlier, this phase cannot be overlooked, as it is mandatory in any project.

Let’s test the 5W first!

First: What is the problem?

Second: Why does it happen?

Third: When does it happen?

Fourth: Where does it happen?

Fifth: Who causes it?

If you answered these five questions, you have certainly come to know the problem, and you can continue with the workflow process and move on to the 2H (How does this happen? And how much does this problem cost?).

These last two questions must be analyzed separately. You will probably discover how the problem occurs only when analyzing the data; this will not happen before the ETL and data analysis phases. After analyzing the data and its relationships, you can gain some insight. Up to this point, you have only done data analysis work. That is good and important, but ML is much more than data analysis.
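For readers who prefer something more concrete than a checklist on paper, the 5W2H can be kept as a small record in code. The sketch below is only one possible way to do it and is not part of the 5W2H technique itself: the class name `ProblemStatement`, its field names, and the helper methods are all hypothetical, chosen for illustration. It assumes the five "W" answers are filled in up front and that the two "H" answers stay empty until the ETL and data analysis phases.

```python
from dataclasses import dataclass
from typing import List, Optional

# Names of the five "W" fields, in the order the questions are asked.
FIVE_W = ("what", "why", "when", "where", "who")


@dataclass
class ProblemStatement:
    """Hypothetical record of a 5W2H problem description."""

    # The 5W: answered during the initial brainstorm.
    what: str = ""
    why: str = ""
    when: str = ""
    where: str = ""
    who: str = ""
    # The 2H: assumed to stay empty until ETL and data analysis are done.
    how: Optional[str] = None
    how_much: Optional[str] = None

    def missing_w(self) -> List[str]:
        """Return the names of the W questions that are still unanswered."""
        return [name for name in FIVE_W if not getattr(self, name).strip()]

    def ready_for_workflow(self) -> bool:
        """True once all five W questions have a non-empty answer."""
        return not self.missing_w()


if __name__ == "__main__":
    # Placeholder answers only; a real project would use its own.
    problem = ProblemStatement(
        what="answer to What",
        why="answer to Why",
        when="answer to When",
        where="answer to Where",
    )
    print(problem.missing_w())           # the W questions still open
    print(problem.ready_for_workflow())  # False until every W is answered
```

The two "H" fields are optional on purpose, so the record can be created at the start of the project and completed later, once the data has been analyzed.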

For the following phases, we can follow the publication by Ayush Pant entitled Work Flow of a Machine Learning Project. He discusses the workflow of a machine learning project and gives us a basic idea of how the problem should be tackled.

This diagram is a good example of the process flow and was presented by Antônio Figuereido in his publication Artificial Intelligence & Machine Learning: A Primer.


Architect and Engineer, specializing in Data Analysis, Appraisal, Auditing, Pathologies, and Data Science