An actionable and structured framework for data science projects

More often than not, big data and data science projects that may have taken months to develop rarely make it to production and are shelved during the experimentation phase. According to VentureBeat and Gartner, 87% of data science projects and 85% of big data projects never make it to production. Even more worrying, according to a recent Gartner report, only between 15% and 20% of data science projects are completed. Of the projects that are completed, CEOs say that only about 8% generate value.

There can be many reasons why a data science project does not reach completion. Some of the most common problems include:

How does a structured framework help in building successful data science projects?

80–85% of projects fail before completion, and there is a further drop-off when an implemented project does not deliver value.

A well-defined framework that every data science team follows across its projects improves efficiency, brings greater visibility, and ensures that every project has gone through certain stages before it is brought into production. For the data science teams in our organization, we came up with a framework with the following objectives in mind:

With these objectives in mind, we needed to develop a standardized framework that all data science projects can use. It should consist of multiple stages, and each stage should have a set of questions that the people working on that stage keep in mind and fill in once the team is done with that stage. The questions are not meant to be prescriptive (prescriptive steps cannot be generalized across projects); instead, the goal is to record the key steps and decisions taken, the reasoning behind them, and the key results obtained in each stage.
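To make the idea of a per-stage questionnaire more concrete, here is a minimal Python sketch of how one stage's record could be modeled. It is not the Flock and JIRA setup our teams actually use; the `StageRecord` class, its methods, and the example questions and answer are hypothetical, chosen only to show a record that captures decisions, reasoning, and results rather than prescribing steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StageRecord:
    """Hypothetical record of one stage's questionnaire.

    The questions are prompts, not a prescriptive checklist: the answers
    capture key decisions, the reasoning behind them, and key results.
    """
    stage: str
    questions: list[str]
    answers: dict[str, str] = field(default_factory=dict)

    def answer(self, question: str, text: str) -> None:
        # Only accept answers to questions that belong to this stage.
        if question not in self.questions:
            raise KeyError(f"{question!r} is not a question for stage {self.stage!r}")
        self.answers[question] = text.strip()

    def missing(self) -> list[str]:
        # Questions that still have no non-empty answer, in their original order.
        return [q for q in self.questions if not self.answers.get(q)]

    def is_complete(self) -> bool:
        # The stage can be closed only when every question has been answered.
        return not self.missing()


# Example usage with made-up questions for the model building stage.
record = StageRecord(
    stage="Model building",
    questions=[
        "Which evaluation metric was chosen, and why?",
        "Which models and hyperparameters were tried?",
        "What were the key results?",
    ],
)
record.answer("Which evaluation metric was chosen, and why?", "Metric agreed with stakeholders")
print(record.is_complete())  # False: two questions are still open
print(record.missing())
```

Keeping the answers keyed by question makes it easy to list what is still open before a stage is closed, which is the point where the team would key in the questionnaire.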

This excellent flowchart, shared in the Microsoft build documentation, illustrates the data science lifecycle.

In our organization, we created a framework on Flock and JIRA that covers all the stages a data science project has to go through. Each stage has a questionnaire that the people responsible for that stage fill in once its development is complete. We broadly divided the lifecycle of a data science project into six phases:

1. Business and requirement understanding: This phase falls under the discovery stage, which is extremely crucial. More often than not, this stage is overlooked or too little time is spent on it. Without a clear understanding of the requirements, the team does not know the purpose of the project, and therefore does not know how to use the results to add business value. This greatly increases the chance that the project is shelved after the experimentation phase. Hence we have proposed four sub-stages under this stage:

2. Data and strategy: This phase falls under the implementation stage and includes sub-stages such as data strategy, exploratory data analysis, feature transformation, and data preparation. In this phase, we gain a deep and extensive understanding of our data and make it ready for the model building phase.

3. Model building: This is the core of any data science project, where we tune and train the model to improve a chosen evaluation metric.

4. Testing: In the testing phase, we test the tuned and trained model on a completely new, unseen dataset. This gives us an idea of how the model will perform on real-world data. The testing phase can be divided into two stages: model testing and historical data testing.

5. Scoping and developing an activation and validation strategy: In this stage, we scope out an activation and validation strategy. We answer some of the most important questions, such as how we will use the model results, which should also have been discussed in the business understanding stage. We also scope how we will measure business value against the agreed success metrics, the buffer period after predictions before validation, how we will automate the training and testing pipeline, and so on.

6. Project review: This is the end-of-project questionnaire, with questions such as which challenges remain unaddressed or need further exploration, what the next steps are, and whether the project has received sign-off from all its stakeholders. This step is very important because it not only shows to what extent the project was successful but also helps us plan our next steps.
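The gating idea behind these six phases, where a project moves on only after the current phase's questionnaire is filled in, can be sketched as a small tracker. This is a hypothetical illustration, not our actual Flock and JIRA configuration: `ProjectTracker`, `submit`, `advance`, and `all_phases_documented` are invented names, and the check only covers whether answers exist, not whether stakeholders have signed off.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six phases of the lifecycle, in order.
PHASES = [
    "Business and requirement understanding",
    "Data and strategy",
    "Model building",
    "Testing",
    "Activation and validation strategy",
    "Project review",
]


@dataclass
class ProjectTracker:
    """Hypothetical tracker: a project may only leave a phase once that
    phase's questionnaire has been submitted."""
    name: str
    current: int = 0
    # Maps phase name -> submitted questionnaire answers.
    questionnaires: dict[str, dict[str, str]] = field(default_factory=dict)

    @property
    def phase(self) -> str:
        return PHASES[self.current]

    def submit(self, answers: dict[str, str]) -> None:
        # Record the questionnaire for the current phase; reject empty answers.
        if not answers or not all(v.strip() for v in answers.values()):
            raise ValueError(f"questionnaire for {self.phase!r} has empty answers")
        self.questionnaires[self.phase] = answers

    def advance(self) -> str:
        # Gate: refuse to move on until the current phase is documented.
        if self.phase not in self.questionnaires:
            raise RuntimeError(f"complete the {self.phase!r} questionnaire first")
        if self.current == len(PHASES) - 1:
            raise RuntimeError("project review is the final phase")
        self.current += 1
        return self.phase

    def all_phases_documented(self) -> bool:
        # True once every phase, including the project review, has a questionnaire.
        return all(p in self.questionnaires for p in PHASES)


# Example usage with a hypothetical project and a made-up question.
tracker = ProjectTracker("example-project")
tracker.submit({"What business decision will the results support?": "Campaign targeting"})
print(tracker.advance())  # Data and strategy
```

Keeping the phase order in a single list turns the gate into a simple index check; in a tool like JIRA, the same rule would more likely be expressed as a condition on a workflow transition.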

In summary, a structured framework used across all data science teams should increase the success rate of data science projects, improve visibility across teams (since we know the details of what each team is working on), and make troubleshooting much easier. It also gives each project consolidated, structured documentation and a record of its progress.
