0

In my table I have data from employee contracts like this: enter image description here

And I want to get job position in time periods, something like this

login ValidFrom ValidTo JobPosition
bcde 2019-07-01 2019-09-30 Project Manager
bcde 2020-01-09 2020-06-16 Head of Center of Excellence
bcde 2020-06-17 2021-07-31 Team Leader
bcde 2021-08-01 2099-12-31 Head of Center of Excellence

So I write query:

select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, login

But It doesn't work in this case (if someone has the same JobPosition), so I decided to use dense_rank, like this:

select login, ValidFrom, ValidTo, JobPosition
,dense_rank() OVER (Partition BY JobPosition ORDER BY login, ValidFrom, ValidTo, JobPosition) as no
from employeeContracts

and after that

select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login

but problem is dense_rank doesn't work as I need ;) I want to get something like this:

login ValidFrom ValidTo JobPosition no
bcde 2019-07-01 2019-09-30 Project Manager 1
bcde 2020-01-09 2020-06-16 Head of Center of Excellence 2
bcde 2020-06-17 2020-07-31 Team Leader 3
bcde 2020-08-01 2021-03-31 Team Leader 3
bcde 2021-04-01 2021-06-30 Team Leader 3
bcde 2021-07-01 2021-07-31 Team Leader 3
bcde 2021-08-01 2021-12-31 Head of Center of Excellence 4
bcde 2022-01-01 2022-05-09 Head of Center of Excellence 4
bcde 2022-02-01 2022-05-09 Head of Center of Excellence 4
bcde 2022-05-09 2099-12-31 Head of Center of Excellence 4

and after that use query to get final result:

select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login

Example Data Scheme

CREATE TABLE employeeContracts (
  login text,
  ValidFrom datetime,
  ValidTo datetime,
  JobPosition text
);

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2019-07-01', '2019-09-30', 'Project Manager');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2020-01-09', '2020-06-16', 'Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2020-06-17', '2020-07-31', 'Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2020-08-01', '2021-03-31', '   Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2021-04-01', '2021-06-30', '   Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2021-07-01', '2021-07-31', '   Team Leader');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2021-08-01', '2021-12-31', '   Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2022-01-01', ' 2022-05-09', 'Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2022-02-01', '2022-05-09', 'Head of Center of Excellence');

INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition) 
VALUES ('bcde', '2022-05-09', '2099-12-31', 'Head of Center of Excellence');

This can be tested Here

3
  • I use Sql dedicated pool in Azure Synapse
    – l_obr
    Commented May 13, 2022 at 13:48
  • Thanks but the fiddle is missing values for DimEmployeeId. Need that value to understand what results you're currently getting.
    – SOS
    Commented May 13, 2022 at 15:03
  • 1
    Sorry my mistake ;) We can skip DimEmployeeId
    – l_obr
    Commented May 13, 2022 at 19:51

1 Answer 1

1

This sounds like a "Gaps and Islands" problem. One way of approaching it is to identify and assign rankings to each "island" (or date ranges for each JobPosition).

login ValidFrom ValidTo JobPosition PrevPosition JobPositionGroup
bcde 2019-07-01 00:00:00 2019-09-30 00:00:00 Project Manager null 1
bcde 2020-01-09 00:00:00 2020-06-16 00:00:00 Head of Center of Excellence Project Manager 2
bcde 2020-06-17 00:00:00 2020-07-31 00:00:00 Team Leader Head of Center of Excellence 3
bcde 2020-08-01 00:00:00 2021-03-31 00:00:00 Team Leader Team Leader 3
bcde 2021-04-01 00:00:00 2021-06-30 00:00:00 Team Leader Team Leader 3
bcde 2021-07-01 00:00:00 2021-07-31 00:00:00 Team Leader Team Leader 3
bcde 2021-08-01 00:00:00 2021-12-31 00:00:00 Head of Center of Excellence Team Leader 4
bcde 2022-01-01 00:00:00 2022-05-09 00:00:00 Head of Center of Excellence Head of Center of Excellence 4
bcde 2022-02-01 00:00:00 2022-05-09 00:00:00 Head of Center of Excellence Head of Center of Excellence 4
bcde 2022-05-09 00:00:00 2099-12-31 00:00:00 Head of Center of Excellence Head of Center of Excellence 4

Then apply your min/max logic, grouping by the ranking number:

 SELECT grp.JobPositionGroup
        , grp.JobPosition
        , grp.Login
        , MIN(grp.ValidFrom) AS ValidFrom
        , MAX(grp.ValidTo) AS ValidTo
 FROM (
         SELECT  cron.*
                 , SUM( IF( JobPosition = PrevPosition, 0, 1) ) OVER(
                            ORDER BY ValidFrom, ValidTo
                 )  AS JobPositionGroup
         FROM   (
                    SELECT ec.*
                          , LAG(JobPosition, 1) OVER (
                               ORDER BY ValidFrom, ValidTo
                          ) AS PrevPosition
                    FROM   employeeContracts ec
                 )
                 cron
         ) grp
 GROUP BY grp.JobPositionGroup
          , grp.JobPosition
          , grp.Login

Results:

JobPositionGroup JobPosition Login ValidFrom ValidTo
1 Project Manager bcde 2019-07-01 00:00:00 2019-09-30 00:00:00
2 Head of Center of Excellence bcde 2020-01-09 00:00:00 2020-06-16 00:00:00
3 Team Leader bcde 2020-06-17 00:00:00 2021-07-31 00:00:00
4 Head of Center of Excellence bcde 2021-08-01 00:00:00 2099-12-31 00:00:00

db<>fiddle here

2
  • thank you, it works! ;) but I had to change IF construction to CASE WHEN
    – l_obr
    Commented May 15, 2022 at 18:37
  • Gotcha. I went with MySQL syntax based on the fiddle, but CASE is probably better as it's more standard and works in a wider variety of db's.
    – SOS
    Commented May 16, 2022 at 2:13

Not the answer you're looking for? Browse other questions tagged or ask your own question.