Be Grateful

Kinetic Data Science
  • Home
  • ePortfolio
  • Computer Vision
  • Quantum
    • Quantum Software
    • Quantum Hardware
  • Data Visualization
    • US Population Report
    • Cancer Data
    • US Tornado Data
  • BI Projects
    • Employee Attrition
  • Public Datasets
  • Data Science
    • Business Intelligence
    • Data Engineering
    • Data Warehousing
    • Data Visualization
    • Machine Learning and AI
  • Demo
  • Bacquets

Data Science Techniques for Exploring Data

Employee Attrition

Employee Data Set

Employee churn rate: The term attrition refers to a gradual but deliberate reduction in staff.

A local hospital has thousands of employees spread across the state. The hospital believes in hiring the best talent available and retaining them for as long as possible, and it spends a large amount of resources on retaining existing employees through various initiatives. The Head of People Operations wants to bring down the cost of retaining employees and proposes limiting incentives to only those employees who are at risk of attrition. As a recently hired Data Scientist in the People Operations Department, you have been asked to identify patterns in the characteristics of employees who leave the organization, and to use this information to predict whether an employee is at risk of attrition. Employees identified as at risk will then be targeted with incentives.


Objective:

* To identify the different factors that drive attrition

* To build a model that predicts whether an employee will attrite

### Dataset

The data contains demographic details, work-related metrics and an attrition flag.


* **EmployeeNumber** - Employee Identifier

* **Attrition** - Did the employee attrite?

* **Age** - Age of the employee

* **BusinessTravel** - Travel commitments for the job

* **DailyRate** - Data description not available**

* **Department** - Employee Department

* **DistanceFromHome** - Distance from work to home (in km)

* **Education** - 1-Below College, 2-College, 3-Bachelor, 4-Master, 5-Doctor

* **EducationField** - Field of Education

* **EnvironmentSatisfaction** - 1-Low, 2-Medium, 3-High, 4-Very High

* **Gender** - Employee's gender

* **HourlyRate** - Data description not available**

* **JobInvolvement** - 1-Low, 2-Medium, 3-High, 4-Very High

* **JobLevel** - Level of job (1 to 5)

* **JobRole** - Job Roles

* **JobSatisfaction** - 1-Low, 2-Medium, 3-High, 4-Very High

* **MaritalStatus** - Marital Status

* **MonthlyIncome** - Monthly Salary

* **MonthlyRate** - Data description not available**

* **NumCompaniesWorked** - Number of companies worked at

* **Over18** - Over 18 years of age?

* **OverTime** - Overtime?

* **PercentSalaryHike** - The percentage increase in salary last year

* **PerformanceRating** - 1-Low, 2-Good, 3-Excellent, 4-Outstanding

* **RelationshipSatisfaction** - 1-Low, 2-Medium, 3-High, 4-Very High

* **StandardHours** - Standard Hours

* **StockOptionLevel** - Stock Option Level

* **TotalWorkingYears** - Total years worked

* **TrainingTimesLastYear** - Number of trainings attended last year

* **WorkLifeBalance** - 1-Low, 2-Good, 3-Excellent, 4-Outstanding

* **YearsAtCompany** - Years at Company

* **YearsInCurrentRole** - Years in the current role

* **YearsSinceLastPromotion** - Years since the last promotion

* **YearsWithCurrManager** - Years with the current manager


** In the real world, you will often not find definitions for some of your variables. Figuring out what they might mean is part of the analysis.
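Because the ordinal columns above are stored as integer codes, it can help to keep the code tables in one place and decode them only when labeling plots, while modeling on the integers so their order is preserved. The sketch below copies the scales listed in the data dictionary; the names `CODE_TABLES` and `decode` are hypothetical and not taken from the notebook.

```python
from typing import Dict

# Hypothetical helper, not from the notebook.
# Code tables copied from the data dictionary on this page.
LEVEL_SCALE = {1: "Low", 2: "Medium", 3: "High", 4: "Very High"}
RATING_SCALE = {1: "Low", 2: "Good", 3: "Excellent", 4: "Outstanding"}

CODE_TABLES: Dict[str, Dict[int, str]] = {
    "Education": {1: "Below College", 2: "College", 3: "Bachelor", 4: "Master", 5: "Doctor"},
    "EnvironmentSatisfaction": LEVEL_SCALE,
    "JobInvolvement": LEVEL_SCALE,
    "JobSatisfaction": LEVEL_SCALE,
    "RelationshipSatisfaction": LEVEL_SCALE,
    "PerformanceRating": RATING_SCALE,
    "WorkLifeBalance": RATING_SCALE,
}


def decode(column: str, code: int) -> str:
    """Return the label for an ordinal code; fall back to the code itself as text."""
    table = CODE_TABLES.get(column, {})  # columns without a table get an empty mapping
    return table.get(code, str(code))
```

For example, `decode("Education", 3)` gives `"Bachelor"`, while a non-ordinal column such as Age simply comes back as text.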

Download the Data Set and Python Jupyter Notebook

HR_Employee_Attrition_Dataset (csv)
Employee_Attrition_.ipynb (txt)

View and download the full model on GitHub.

Below are some examples of the code and output from the Jupyter Notebook above. The notebook contains more code blocks than are shown here, so there is plenty more to explore.

Import Libraries and .csv file
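The notebook's import and loading cell is not reproduced on this page. As a dependency-free sketch of the loading step, the function below parses the CSV with Python's standard `csv` module and turns integer-looking fields into `int`. The function name is hypothetical; in a notebook, `pandas.read_csv` would typically do the same in one line with automatic type inference.

```python
import csv
import io
from typing import Dict, List, Union

Value = Union[int, str]


def parse_attrition_csv(text: str) -> List[Dict[str, Value]]:
    """Hypothetical loader: one dict per employee, integer-looking fields converted to int."""
    rows: List[Dict[str, Value]] = []
    for record in csv.DictReader(io.StringIO(text)):
        row: Dict[str, Value] = {}
        for key, raw in record.items():
            value = (raw or "").strip()
            # Assumes numeric columns (Age, MonthlyIncome, ...) hold whole numbers;
            # categorical columns such as Attrition or OverTime stay as strings.
            row[key] = int(value) if value.lstrip("-").isdigit() else value
        rows.append(row)
    return rows
```

For the downloaded file, reading it with `open(...)` and passing `f.read()` to `parse_attrition_csv` should work, assuming the file is saved as plain CSV text.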

Univariate Analysis of Numerical Columns

Observations

 

  • The average employee age is around 37 years. Ages range widely, from 18 to 60, indicating good age diversity in the organization.
  • At least 50% of employees live within 7 km of the organization. However, there are some extreme values: the maximum distance is 29 km.
  • The average monthly income of an employee is USD 6,500. Incomes range widely, from about USD 1K to 20K, which is to be expected of any organization's income distribution. There is a big gap between the 3rd quartile (around USD 8,400) and the maximum (nearly USD 20,000), showing that the company's highest earners have disproportionately large incomes compared to the rest of the employees. Again, this is fairly common in most organizations.
  • The average salary hike is around 15%. At least 50% of employees received a salary hike of 14% or less, and the maximum salary hike is 25%.
  • On average, an employee has been with the company for 7 years.
  • On average, 2.18 years have passed since an employee's last promotion, and the majority of employees were promoted within about the last year.
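The figures above are the kind of summary that pandas' `DataFrame.describe()` prints for numeric columns: count, mean, standard deviation, minimum, quartiles and maximum. A statement such as "at least 50% of employees live within 7 km" is read off the 50% row, which is the median. A minimal standard-library sketch of that summary for one column, using linear interpolation between ranks for the quartiles (pandas' default); the function names are hypothetical:

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear interpolation between closest ranks; q is a fraction between 0 and 1."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: describe()-style summary statistics for one numeric column."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1 denominator)
        "min": data[0],
        "25%": percentile(data, 0.25),
        "50%": percentile(data, 0.50),  # the median
        "75%": percentile(data, 0.75),
        "max": data[-1],
    }
```

A large gap between the "75%" and "max" entries, as with MonthlyIncome, is what signals the long upper tail described above.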

Histogram

Exploring Variable Distributions with Histograms

Observations

  • The age distribution is close to a normal distribution with the majority of employees between the ages of 25 and 50.
  • The percentage salary hike is right-skewed, which means most employees receive smaller percentage salary increases.
  • MonthlyIncome and TotalWorkingYears are right-skewed, indicating that the majority of workers are in entry- or mid-level positions in the organization.
  • DistanceFromHome also has a right-skewed distribution, meaning most employees live close to work while a few live farther away.
  • On average, an employee has worked at 2.5 companies. Most employees have worked at only 1 company.
  • The YearsAtCompany distribution shows a good proportion of workers with 10+ years at the company, indicating a significant number of loyal employees.
  • The YearsInCurrentRole distribution has three peaks, at 0, 2, and 7 years. A few employees have stayed in the same role for 15 years or more.
  • The YearsSinceLastPromotion distribution indicates that some employees have not received a promotion in 10-15 years and are still working in the organization. We assume these are highly experienced employees in upper-management roles, such as co-founders, C-suite employees and the like.
  • The distributions of DailyRate, HourlyRate and MonthlyRate appear to be uniform and do not provide much information. Daily rate may refer to the income earned per extra day worked, while hourly rate could refer to the same concept applied to extra hours worked per day. If these rates are broadly similar for multiple employees in the same department, that would explain their uniform distributions.
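Whether a histogram is "skewed to the right" can be checked numerically with the sample skewness: positive values indicate a long right tail (as with MonthlyIncome or DistanceFromHome), and values near zero a roughly symmetric shape. A minimal sketch of the bias-adjusted Fisher-Pearson coefficient, the same kind of statistic pandas' `Series.skew()` reports; the function name is hypothetical:

```python
import math
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Hypothetical helper: bias-adjusted Fisher-Pearson skewness (> 0 means a longer right tail)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant column")
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor sqrt(n(n-1)) / (n-2).
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)
```

A few large values (for example, one employee living much farther away than the rest) dominate the cubed deviations, which is why a long right tail yields a positive result.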

Multivariate Analysis


Observations

  • Employees working overtime have more than a 30% chance of attrition, which is very high compared to the 10% chance of attrition for employees who do not work extra hours.
  • As seen earlier, the majority of employees work in the R&D department, where the chance of attrition is ~15%.
  • Employees working as sales representatives have an attrition rate of around 40%, while HR staff and technicians have an attrition rate of around 25%. The Sales and HR departments have higher attrition rates than a research department like Research & Development, an observation that makes intuitive sense given the differences in those job profiles. The high-pressure, incentive-based nature of sales and marketing roles may be contributing to their higher attrition rates.
  • The lower an employee's job involvement, the higher their chance of attrition. This could be because employees with lower job involvement may feel left out or less valued and have already started to explore new options.
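Each percentage above is an attrition rate within a group: the number of employees in that group who left, divided by the group's size. A minimal sketch, assuming rows are dicts as produced by `csv.DictReader` and that the Attrition column holds "Yes"/"No"; the function name is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def attrition_rate_by(rows: Iterable[Mapping[str, str]], column: str,
                      target: str = "Attrition", positive: str = "Yes") -> Dict[str, float]:
    """Hypothetical helper: share of each category of `column` whose target equals `positive`."""
    totals: Dict[str, int] = defaultdict(int)
    leavers: Dict[str, int] = defaultdict(int)
    for row in rows:
        group = row[column]
        totals[group] += 1
        if row[target] == positive:  # assumes "Yes" marks an employee who left
            leavers[group] += 1
    return {group: leavers[group] / totals[group] for group in totals}
```

With pandas, a comparable table is usually `pd.crosstab(df["OverTime"], df["Attrition"], normalize="index")`, which divides each row's counts by that row's total.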


HR Attrition Correlation Matrix

Observations

  • Total work experience, monthly income, years at company and years with current manager are highly correlated with each other and with employee age, which is easy to understand since these variables tend to increase with age for most employees.
  • Years at company and years in current role are correlated with years since last promotion, which may indicate that the company is not giving promotions at the right time.
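Each cell of the correlation matrix is a Pearson correlation coefficient between two numeric columns (pandas' `DataFrame.corr()` uses Pearson by default). A minimal sketch of how such a matrix is built, with hypothetical function names; a coefficient only shows that two columns move together, not why:

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length columns with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: pairwise Pearson correlations, keyed by column name twice."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Values near +1 (such as Age against TotalWorkingYears) appear as the darkest positive cells of a heatmap, and the diagonal is always 1.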

Seaborn Output Bar Plot

Observations

The feature importance plots for the base model and the tuned model are quite similar.

  • The model seems to suggest that OverTime, MonthlyIncome, Age, TotalWorkingYears and DailyRate are the most important features.
  • Other important features are DistanceFromHome, StockOptionLevel, YearsAtCompany and NumCompaniesWorked.
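The page does not say how the importances in the bar plot were computed; scikit-learn's tree-based models expose impurity-based `feature_importances_`. As a different, model-agnostic technique, permutation importance measures how much accuracy drops when one feature's values are shuffled, which breaks that feature's link to the label. A minimal sketch with hypothetical function names; scikit-learn offers a full implementation as `sklearn.inspection.permutation_importance`:

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]


def accuracy(predict: Predictor, X: List[List[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Predictor, X: List[List[float]], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Hypothetical helper: mean accuracy drop when each feature column is shuffled."""
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = accuracy(predict, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between feature j and the label
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, because shuffling it never changes a prediction; ideally the scores are computed on held-out data rather than the training set.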

Decision Trees and Random Forest

Observations

 Note: Blue leaves represent class 1 (y[1], employees who attrite) and orange leaves represent class 0 (y[0], employees who stay). The darker a leaf's color, the purer it is, i.e. the larger the share of its observations that belong to its majority class.



  • Employees who work overtime, have a low salary and are young are more likely to leave the company, as they might feel overworked and underpaid and might be looking for better opportunities.
  • Employees who work overtime with a low salary and are not research scientists have a high chance of attriting.
  • Even with an income over 3751.5 units, employees who work as sales executives and live far from home have a high chance of attriting.
  • Another segment consists of employees who work overtime, are younger than 33.5 years and are not junior research scientists; they have a greater chance of attrition. This suggests that, except for the junior research scientist role, young employees who work overtime have a high tendency to attrite.
  • Employees with over 2.5 years of work experience but low work-life balance and a low percentage salary hike also tend to attrite, probably because they are seeking a more balanced life.
  • Employees who do not work overtime, have little experience and work as junior research scientists have a small chance of attriting. These employees appear to be comfortable with or loyal to the organization.
  • NumCompaniesWorked also seems to be an important variable in predicting whether an employee is likely to attrite.
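Thresholds such as 33.5 (age) and 3751.5 (income) look odd because a CART-style tree, such as scikit-learn's `DecisionTreeClassifier`, places each candidate cut halfway between two adjacent observed values and keeps the one that most reduces Gini impurity; that is likely where these numbers come from. The single-feature sketch below illustrates the search under that assumption; it is not the notebook's code, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of 0/1 labels: 0 for a pure node, 0.5 for a 50/50 mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of class 1 (attrition)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values: Sequence[float],
                   labels: Sequence[int]) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: (threshold, weighted Gini) of the best `value <= threshold` split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint, e.g. 33.5 between 33 and 34
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best
```

For example, with made-up ages `[25, 28, 31, 33, 34, 40]` and labels `[1, 1, 1, 1, 0, 0]`, `best_threshold` returns `(33.5, 0.0)`: the cut at 33.5 separates the two classes perfectly. A full tree repeats this search over every feature at every node.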

Copyright © 2021 Kinetic Data Science - All Rights Reserved.


Powered by ✟ Data Science
