Onboarding a data scientist

Hiring a new data scientist into your team can be a very exciting time. The right candidate can provide new insight to your organisation, automate time-consuming tasks, and help to make decision making more data-driven. As your new team member’s start date approaches, you might start to think about how best to onboard them. In other words, what can you do, as a manager, to help them get up to speed and start adding value to your team?

Challenges

Integrating a new data scientist into your organisation may not be straightforward for several reasons:

  • You may not understand enough of what they do to know what they need.
  • The role itself is often more open and flexible than other roles.
  • The data scientist’s background can range from engineering and mathematics to computer science, and their prior experience can vary just as widely.
  • Their day-to-day work may differ depending on a number of factors, such as whether the company has more or less structured processes, whether it is a consultancy or a product company, and whether the pace of work is fast or slow.

In this article, we provide a range of suggestions for designing an onboarding process that considers the work environment and the data scientist’s background. There are obvious things one should do, such as introducing them to the immediate and wider team and setting up any system access they require. These are steps that you are likely to take for any new starter, and we will not cover them in this article. We focus on onboarding actions that would help a data scientist specifically.

Have them design their own training

Chances are that you have hired someone you think is smart, has programming skills and prior experience in solving problems using data. But they may not be familiar with the specific problems in your industry, or maybe they haven’t been using the specific modelling technique that is commonly used in your industry, or perhaps they have previously coded in a different programming language. These are not showstoppers to them performing well in the role, and can easily be addressed with a good onboarding/training process.

By nature of the role, many data scientists either have a research background or are experienced with some form of research. What this means is that they should be able to identify the gaps in their knowledge, and effectively look for ways to learn the things they don’t know. Have them take charge of their training. This will cater for their individual background and prior experience. For example, someone with a lot of programming experience might want to spend less time learning new packages and software, and focus more on learning new mathematical concepts. Likewise, someone with a strong statistics or mathematics background might want to spend more time on programming material. Furthermore, they may already have a preference for their approach to learning new skills – some people learn best by doing, some people prefer reading conceptual material, and others benefit more from watching video courses.

Learning is most effective when sufficiently spaced out. If it is feasible as part of the onboarding process, suggest that your data scientist spend some time every day on a training resource of their choice, for example books, research articles, video lectures, industry workshops or industry documentation.

Assign them a small first project

Since a data scientist’s job will involve a fair amount of programming, a good onboarding activity is to give them a small, easy programming task. Consider whether or not to choose a task that has time constraints associated with it. There are advantages and disadvantages to both, and the choice will depend on the company’s situation. If the work environment is fast-paced, then giving them a task that fits into the team’s day-to-day work will be immediately useful. The time constraints will mimic the real work they are expected to perform, and get them up to speed on doing this work. If their work is not as urgent, then you might prefer to give them sufficient time to learn not just the specific task, but also any peripheral knowledge. This will allow them flexibility in their learning, to focus on best practices instead of just rushing to ‘get the job done’.

Examples of small projects are:

  • Perform an analysis to obtain insight on a section of company data
  • Build a simple dashboard using data from the company’s database
  • Write a short piece of code that fits into software that your company owns, for example adding a new feature, modifying a small section of the code to make it more efficient, or reworking it for a different purpose
  • Follow documentation to execute a piece of software your company owns

Integrate your data scientist into the business

Introducing your data scientist to key subject matter experts across the business is essential – these are the people they may go back to again and again for the domain information their analysis depends on. You can do this through formal or informal channels. Formal channels include stakeholder meetings and any discussions of core business strategy, the day-to-day running of the business, the factors that affect profit and loss, and the types of decisions involved. This will give them context for their work and show how it fits in with the company’s overall strategy. Informal discussions are sometimes the most efficient form of knowledge transfer. You could organise a chat over lunch with the relevant stakeholders to facilitate this.

While understanding how the business operates is helpful to the new data scientist, be mindful that they need to spend their time on other areas as well, and try not to overwhelm them with too much business information at once.

Communicate, communicate, communicate

At the start, it is important to communicate the expectations of the role, the type of problems you want them to solve, and the available resources in the company. This will help them to determine how best to get up to speed, to set learning objectives for themselves, and gather the resources to work towards your goals. If you assign them a task and there are deadlines to meet, make sure this is communicated clearly too. On the other hand, if you would like them to be free to spend their initial weeks on general upskilling, ensure they know this too. Make sure you are both aligned on a project plan to avoid rework down the track.

It’s entirely possible that the role will evolve, or you might change your mind on what you want them to work on. That’s okay too, as long as you keep them in the loop, and include them in these discussions. Data science is an interdisciplinary field and your data scientist should be adaptable.

Set up an environment for data science

Do you have the right environment set up for your data scientist? It is important to discuss from the very beginning what kind of tools and software they will need, and what resources you currently have. This will help them to figure out what’s achievable and what’s not. Whether or not the resources you currently have are sufficient depends on the end goals you have in mind for the data science project.

One-off analyses and proof-of-concept models will most likely not require any complex setup. However, imagine, for example, that the end goal is to build a predictive model that automatically updates itself, then have it integrated into the business and made available to key persons on a dashboard. In this scenario, you may want to consider the technology you require in order to achieve this, and whether you might like to purchase cloud computing services or dashboard software. Also, if there is going to be more than one person working on the same set of code, then it is typically necessary to have version control software. If your company doesn’t already own a database, you may want to consider developing one alongside the data science project, especially if you envision having to make use of much more complex data in the future.

Start these discussions early, and plan as much as you can on choosing these initial systems, as it will be much harder to switch once you have set things up a certain way. Your vision and constraints will help your data scientist plan their workflow accordingly.

Conclusion

To summarise, this article outlines some approaches on how to design an onboarding process for a data scientist. Always communicate with your data scientist, as they may have their own thoughts on technical training, any resources they require from you, and how best to work towards a data science goal. In turn, as a manager with a lot of experience in your industry, you can help to provide context and domain information to your data scientist, and connect them to key stakeholders in your company. Providing your data scientist with the right environment and resources will ensure they are set up for success.

What is the Difference Between a Data Scientist and a Data Engineer?

The world of data can often feel a bit like a black box to outsiders or newcomers. A big focus of ours at Data Mettle is making the use of data in your organisation accessible to as many people as possible. One of the first obstacles you might run across is basic terminology, roles and expertise.

One of the more common sources of confusion is the difference between the titles and roles of data scientists and data engineers. While they sit in the same bucket, these are very different roles, and knowing the difference is critical when you begin to make your first hires for a data team, or when you kick off a data project.

The most basic way to address the difference in these two roles is this: data engineers ensure the data is readily available for the data scientists to use to create answers to organisational questions. 

Data Engineers

The critical role of the data engineer is to ensure that data is readily available for the data scientists and other analysts in the organisation. They create systems and databases to collect raw data from multiple sources and ensure it is usable.

Using marketing as an example, an organisation might be collecting data about customers in a CRM, from the analytics software, customer surveys, and several other sources. A data engineer might marry this data from across these various products into a single source for the data science team to make use of.
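To make this more concrete, here is a minimal Python sketch of combining several sources into one record per customer. The sample rows, the field names, the `customer_id` key and the `combine_sources` helper are all assumptions made up for illustration; a real pipeline would read from the actual CRM, analytics and survey systems and would need to deal with mismatched or missing identifiers.

```python
from collections import defaultdict

# Hypothetical sample rows; field names and values are invented for illustration.
crm_rows = [
    {"customer_id": 1, "name": "Ada", "plan": "pro"},
    {"customer_id": 2, "name": "Ben", "plan": "free"},
]
analytics_rows = [
    {"customer_id": 1, "visits": 14},
    {"customer_id": 2, "visits": 3},
    {"customer_id": 3, "visits": 7},
]
survey_rows = [
    {"customer_id": 2, "satisfaction": 4},
]


def combine_sources(**sources):
    """Merge rows from several named sources into one record per customer_id."""
    combined = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            customer_id = row["customer_id"]
            record = combined[customer_id]
            record["customer_id"] = customer_id
            for field, value in row.items():
                if field != "customer_id":
                    # Prefix each field with its source so the origin stays visible.
                    record[f"{source_name}_{field}"] = value
    # Return the records in customer_id order.
    return [combined[key] for key in sorted(combined)]


single_source = combine_sources(
    crm=crm_rows, analytics=analytics_rows, survey=survey_rows
)
for record in single_source:
    print(record)
```

Customers that appear in only some sources (customer 3 here has analytics data but no CRM entry) simply end up with fewer fields, which is one of the gaps a data engineer would typically flag or fill before handing the data over.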

Data Scientists

Data scientists, on the other hand, are the ones who analyse the data cleaned and made available to them by the data engineering team, and use it to answer organisational questions.

Using our previous marketing example, a data science team might take the data from the CRM, analytics tools, and surveys and begin to predict what characteristics make someone more likely to convert into a customer or increase lifetime value.
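As a rough illustration of a first step in that direction, the Python sketch below compares conversion rates across the values of a single characteristic. The records, the field names (`channel`, `answered_survey`, `converted`) and the `conversion_rate_by` helper are hypothetical. This is a descriptive summary rather than a predictive model; in practice a data science team would likely go on to fit a statistical or machine learning model on far more data.

```python
from collections import Counter

# Hypothetical combined customer records; fields and values are invented.
customers = [
    {"channel": "email", "answered_survey": True, "converted": True},
    {"channel": "email", "answered_survey": False, "converted": True},
    {"channel": "email", "answered_survey": False, "converted": False},
    {"channel": "social", "answered_survey": True, "converted": True},
    {"channel": "social", "answered_survey": False, "converted": False},
    {"channel": "social", "answered_survey": False, "converted": False},
]


def conversion_rate_by(records, characteristic):
    """Return {value: share of records with that value that converted}."""
    totals = Counter()
    conversions = Counter()
    for record in records:
        value = record[characteristic]
        totals[value] += 1
        if record["converted"]:
            conversions[value] += 1
    return {value: conversions[value] / totals[value] for value in totals}


rates_by_channel = conversion_rate_by(customers, "channel")
rates_by_survey = conversion_rate_by(customers, "answered_survey")
print(rates_by_channel)
print(rates_by_survey)
```

With only six invented records the numbers mean nothing in themselves; the point is the shape of the question, namely which characteristics go together with a higher share of conversions.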

Skills Required for Each

Data engineers generally need experience building applications. They’ll likely have substantial experience in SQL and databases and might use programming languages such as Python, Ruby, C# and others. Generally, their background is software engineering. They may have some knowledge on the statistical side of things, but this is a ‘nice to have’ for this role.

Data scientists will likely have experience with databases and programming languages such as Python; the real differentiation will be their proficiency in statistics, maths, machine learning, deep learning and artificial intelligence. They will also need to stay on the cutting edge of research in these fields.

A good data scientist will also have a deep understanding of the business and organisational problems to be solved by their work. This understanding enables them to translate what the data is telling them into a product, tool or model to solve a business need and create an innovative solution.

What we do at Data Mettle

Our focus here at Data Mettle is on data science, reflected in our world-class team with backgrounds in artificial intelligence, machine learning and advanced mathematics. We usually embark on projects that have a strong emphasis on data science skills.

However, we of course understand that most organisations, particularly SMEs or startups, will require expertise in both. As such, we almost always end up doing a bit of both: helping organisations get their data in order so we can build them cutting-edge solutions and business-critical tools.

How I Became a Data Scientist: With Jeremy

There are lots of people out there wondering how to transition from whatever field they’re in now into the exciting world of Data Science, so I thought that I’d throw my hat into the ring and describe how I went about becoming a data scientist. Check out this video to see more about Jeremy’s life as a Data Scientist.

Jeremy Mitchell Data Scientist

My Life as a Space Physicist

I started my career as a space physicist. My research was all about trying to figure out how astrophysical shock waves work. Basically, shock waves happen when you’ve got objects traveling faster than the speed of sound (or of some other wave). Think of the sonic boom in front of a jet, or the bow wake in front of a boat. The shock wave’s job is to slow the fluid down. On Earth, that’s easy: there are millions of collisions among the atoms and molecules in the air/water/whatever that can slow the fluid down. In space, there are (almost) no collisions, so where do the shock waves come from?

I don’t want to go into answering that too much here. Instead, I’ll talk a little bit about how I studied it. There’s a big shock wave in the solar wind between the Earth and the Sun (because the solar wind is moving so fast, and the Earth is blocking its way). I used a bunch of different spacecraft, each of which crossed over this shock wave from time to time. This meant that sometimes I could see what was happening on different parts of the shock wave at the same time, and see if there were any large scale effects.

The relevant part here is that I needed to get large data sets from the spacecraft, prepare the data, compare the different data sets, and then use them to build physical models. That’s pretty similar to what I do now! What mattered most was all the work needed to carefully collect, understand, and calibrate the data. Once I’d done that, I could use the data to build physical and mathematical models. The final step was validating those models, which often meant the process started again!

Becoming a Data Scientist

How did this help me become a data scientist? Easy. The process is almost exactly the same: I would gather, clean and understand the data, use the data to build models, and then validate the models. Of course, I was now building statistical or machine learning models for marketing or operations optimisation for a large supermarket, but the process was remarkably similar. And just as much fun!

So what skills helped me make the switch? I’d say these (in no particular order):

  1. Lots of programming experience.
  2. Mathematical and statistical modeling knowledge.
  3. Knowing how to handle large amounts of data.

Of course, there are lots of things that are very different too, so I’d add a fourth point:

  4. Being open to learning the ropes in a very new environment.

Although this can be a challenge, personally I found it one of the best parts of becoming a data scientist, and it was nice to learn that there are so many interesting problems out there to get our teeth stuck into!
