Is it all about data science in the future?
Given that self-learning algorithms are getting better and better at processing data, the question arises whether data scientists will still be needed in the future. Can't data sets be prepared for machine learning models automatically? This concern is not limited to outsiders who would like to use a machine learning solution and wonder whether they could simply feed their data directly into existing algorithms. Data scientists themselves, and experts in related disciplines such as data analytics, are also discussing what automated machine learning (AutoML) means for their future. Where are the limits and possibilities of automation?
Data scientists themselves like to use AutoML, at least in proof-of-concept (PoC) phases, but that does not mean algorithms threaten their jobs. The profession will most likely change only a little in the years to come. To get a clearer picture of the extent to which data science is already threatened with disruption by AutoML algorithms, it is worth looking at the individual tasks a data scientist typically goes through in a project.
Understand users and respond to them
The first step in any data science project is to speak with the prospective users and understand what the problem is. This applies even to ML applications that can be deployed without adaptation. Until the data scientist has understood what the problem is, or which process is to be optimized, no sensible solution can be offered. It is just as important to later convey the resulting solutions and findings to the users in an understandable way. To do this, questions like the following need to be answered:
Was the PoC successful?
What changes are recommended to make the forecast more accurate?
Where are the biggest bottlenecks in the processes?
It is the task of the data scientist to analyze and understand the affected business processes, including any implications (e.g. effects on other departments). This cognitive task is impossible to automate in the foreseeable future.
Before the exciting data science tasks can really start, the data must first be brought into a usable state. This, too, means entering into a dialogue with the user or customer. It is important to agree on a form of data access, connect to the different source systems, link the data and, above all, filter it thoroughly. Some of these steps can be automated: in particular, loading data from many different sources has become significantly less complicated in recent years. Manual effort remains necessary, however, because human understanding is required to know which data is stored where and what it means. The same applies to linking the data.
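The automatable part of this step, loading and joining data, can be sketched with pandas. The tables, column names, and join key below are purely illustrative assumptions; the point is that executing the join is mechanical, while knowing which key is correct and what a missing match means is not.

```python
import pandas as pd

# Hypothetical example: two source systems holding order and payment data.
# All column names and values are made up for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["A", "B", "A"],
})
payments = pd.DataFrame({
    "order_id": [1, 2],
    "amount": [100.0, 250.0],
})

# Joining is easy to automate; knowing that order_id is the right key,
# and that a missing payment means "unpaid", requires human understanding.
linked = orders.merge(payments, on="order_id", how="left")
unpaid = linked[linked["amount"].isna()]
```

A left join deliberately keeps orders without a matching payment, so gaps in the source systems surface as missing values instead of silently disappearing.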
The filtering of the data, the so-called plausibility check, on the other hand, is impossible to automate. For the success of a project, it is fundamentally important to check whether the data meets the expected specifications. Data scientists know from experience: it never does. Sensors do not always work reliably, clock-in times almost always have quality problems, real orders are mixed up with subcontracted orders, and end customers are labeled "second reminder" even though they have never received an invoice or even used a free service.
Most of these data errors cannot be detected automatically because an algorithm lacks the context for this assessment. Everyone understands at first glance that a customer cannot be sent a reminder if he never had to pay in the first place. Someone would first have to teach this rule to an algorithm.
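Once a data scientist has formulated such a rule, encoding it is trivial. A minimal sketch of the reminder example, with entirely made-up field names and data:

```python
import pandas as pd

# Hypothetical plausibility check: a customer flagged "second reminder"
# must have received at least one invoice. Names and values are assumptions.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "dunning_level": ["none", "second reminder", "second reminder"],
    "invoice_count": [2, 0, 5],
})

# Executing the rule is mechanical; realizing that the rule should exist
# requires understanding the billing process.
implausible = customers[
    (customers["dunning_level"] == "second reminder")
    & (customers["invoice_count"] == 0)
]
```

The hard part is not the filter expression but the domain knowledge it encodes, which is exactly what AutoML lacks.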
So-called feature engineering is about processing the raw data in such a way that the ML algorithm can make the most of it. The algorithm should be able to extract, as easily as possible, all the information hidden in the data set. Suppose someone wants to predict how successful a movie will be, and the names or IDs of the individual actors in each film are known. The ML algorithm can do little with these IDs; at most, it would memorize the IDs of a few top performers whose participation would make every film a success. A data scientist, however, can significantly enrich the information about the actors through feature engineering.
What gender and age are the actors? How successful were the last few films they starred in, both monetarily and critically? These and many other factors enable the algorithm to understand whether it is dealing with a real blockbuster, an art-house project, or a completely different genre.
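The actor example above can be sketched as a simple enrichment join. The data and column names are invented for illustration; the idea is to replace an opaque ID with features the model can generalize from.

```python
import pandas as pd

# Hypothetical data: films referencing actors only by ID,
# plus a hand-curated actor table built through feature engineering.
films = pd.DataFrame({
    "film_id": [10, 11],
    "actor_id": [1, 2],
})
actors = pd.DataFrame({
    "actor_id": [1, 2],
    "age": [55, 31],
    "avg_revenue_last_films": [320.0, 45.0],  # in millions, illustrative
})

# Replace the raw ID with informative features: instead of memorizing
# actor 1, the model can learn from age and past box-office success.
features = films.merge(actors, on="actor_id").drop(columns=["actor_id"])
```

Dropping the ID afterwards is deliberate: the model should learn from the enriched attributes, not from memorizing individual actors.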
Some simpler feature engineering tasks are already pretty well automated (one-hot encoding, imputation, etc.). However, these are not the steps that significantly improve model quality. It is much more important to understand the processes behind the data and to incorporate this knowledge into feature engineering. This data enrichment, in combination with data consolidation, is what data scientists spend around 80 percent of their time on, and what enables them to generate the greatest added value. Understanding the user's processes and the quality of the data, and making this knowledge usable algorithmically, can only be automated to a very small extent.
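For contrast, the "simple" steps that AutoML tools already handle well can be sketched in a few lines of pandas. The toy data is an assumption for illustration:

```python
import pandas as pd

# Toy data with a categorical column and a missing numeric value.
df = pd.DataFrame({
    "genre": ["drama", "comedy", "drama"],
    "budget": [10.0, None, 30.0],  # in millions, illustrative
})

# One-hot encoding and mean imputation are purely mechanical,
# which is why AutoML tools automate them reliably.
encoded = pd.get_dummies(df["genre"], prefix="genre")
imputed = df["budget"].fillna(df["budget"].mean())
```

Precisely because these transformations need no domain knowledge, automating them frees up little of the 80 percent mentioned above.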