TidierDates.jl
is a 100% Julia implementation of the R lubridate package.
TidierDates.jl
has one main goal: to implement lubridate's straightforward syntax and of ease of use while working with dates for Julia users. While this package was developed to work seamlessly with Tidier.jl
functions and macros, it can also work as a independently as a standalone package. This package is powered by Dates.jl.
For the development version:
using Pkg
Pkg.add(url = "https://github.com/TidierOrg/TidierDates.jl.git")
ymd()
,ymd_hms()
,ymd_h()
,ymd_hm()
dmy()
,dmy_hms()
,dmy_h()
,dmy_hm()
mdy()
,mdy_hms()
,mdy_h()
,mdy_hm()
floor_date()
round_date()
timediff()
now()
,today()
am()
,pm()
leap_year()
days_in_month()
These functions parse dates represented as strings into a DateTime format in Julia. The input should be a string month-day-year, day-month-year, or year-month-day format respectively. They are relatively robust in their ability to take non-uniform strings of dates.
using TidierData
using TidierDates
df = DataFrame(date = ["today is the 4th July, 2000",
"ayer fue 13th Oct 2001",
"3 of Mar, 2002 was a fun day",
"23rd Apr 2003",
"23/7/2043",
"03/02/1932",
"23-08-1932",
"4th of July, 2005",
"08092019" ,
missing])
@chain df begin
@mutate(date = dmy(date))
end
10×1 DataFrame
Row │ date
│ Date?
─────┼────────────
1 │ 2000-07-04
2 │ 2001-10-13
3 │ 2002-03-03
4 │ 2003-04-23
5 │ 2043-07-23
6 │ 1932-03-02
7 │ 1932-08-23
8 │ 2005-07-04
9 │ 2019-09-08
10 │ missing
Similar to the previous group, these functions parse date-time strings in month-day-year, day-month-year, or year-month-day format respectively. The input should include both date and time information.
floor_date()
: This function rounds a date down to the nearest specified unit (e.g., hour, minute, day, month, year). It takes two arguments - a Date or DateTime object and a string indicating the unit of time to which the date should be floored.
round_date()
: This function rounds a date to the nearest specified unit (e.g., hour, minute, month, year). Like
df2 = DataFrame(date = ["20190330120141", "2008-04-05 16-23-07", "2010.06.07 19:45:00",
"2011-2-8 14-3-7", "2012-3, 9 09:2, 37", "201305-15 0302-09",
"2013 arbitrary 2 non-decimal 7 chars 13 in between 2 !!! 7",
"OR collapsed formats: 20140618 181608 (as long as prefixed with zeros)",
missing ])
@chain df2 begin
@mutate(date = ymd_hms(date))
@mutate(floor_byhr = floor_date(date, "hour"))
@mutate(round_bymin = round_date(date, "minute"))
@mutate(rounded_bymo = round_date(date, "month"))
end
9×4 DataFrame
Row │ date floor_byhr round_bymin rounded_bymo
│ DateTime? DateTime? DateTime? Date?
─────┼─────────────────────────────────────────────────────────────────────────────
1 │ 2019-03-30T12:01:41 2019-03-30T12:00:00 2019-03-30T12:02:00 2019-04-01
2 │ 2008-04-05T16:23:07 2008-04-05T16:00:00 2008-04-05T16:23:00 2008-04-01
3 │ 2010-06-07T19:45:00 2010-06-07T19:00:00 2010-06-07T19:45:00 2010-06-01
4 │ 2011-02-08T14:03:07 2011-02-08T14:00:00 2011-02-08T14:03:00 2011-02-01
5 │ 2012-03-09T09:02:37 2012-03-09T09:00:00 2012-03-09T09:03:00 2012-03-01
6 │ 2013-05-15T03:02:09 2013-05-15T03:00:00 2013-05-15T03:02:00 2013-05-01
7 │ 2013-02-07T13:02:07 2013-02-07T13:00:00 2013-02-07T13:02:00 2013-02-01
8 │ 2014-06-18T18:16:08 2014-06-18T18:00:00 2014-06-18T18:16:00 2014-07-01
9 │ missing missing missing missing
This function computes the difference between two DateTime or Date objects. It returns the result in the unit specified by the second argument, which can be "seconds", "minutes", "hours", "days", or "weeks". It returns this value as a float.
times = DataFrame(
start_time = [
"06-27-2023 15:20:00",
"06-26-2023 12:45:15",
"06-26-2023 16:30:30",
"06-25-2023 10:11:35",
"06-24-2023 09:00:24",
"06-26-2023 09:30:00",
"06-25-2023 11:00:15",
"06-24-2023 01:34:45",
"06-26-2023 14:20:00",
"06-25-2023 10:45:30"
],
end_time = [
"06-27-2023 14:53:53",
"06-25-2023 10:50:30",
"06-28-2023 16:32:30",
"06-24-2023 10:20:30",
"06-24-2023 10:05:00",
missing,
"10-25-2023 11:55:13",
"06-24-2023 11:35:45",
"07-26-2023 15:15:45",
"06-24-2023 12:50:15"
]
)
10×2 DataFrame
Row │ start_time end_time
│ String String?
─────┼──────────────────────────────────────────
1 │ 06-27-2023 15:20:00 06-27-2023 14:53:53
2 │ 06-26-2023 12:45:15 06-25-2023 10:50:30
3 │ 06-26-2023 16:30:30 06-28-2023 16:32:30
4 │ 06-25-2023 10:11:35 06-24-2023 10:20:30
5 │ 06-24-2023 09:00:24 06-24-2023 10:05:00
6 │ 06-26-2023 09:30:00 missing
7 │ 06-25-2023 11:00:15 10-25-2023 11:55:13
8 │ 06-24-2023 01:34:45 06-24-2023 11:35:45
9 │ 06-26-2023 14:20:00 07-26-2023 15:15:45
10 │ 06-25-2023 10:45:30 06-24-2023 12:50:15
after a string is converted into a datetime format, Date.jl functions such as hour(), year(), etc can be applied in Tidier chains as well.
@chain times begin
@mutate(start_time = mdy_hms(start_time))
@mutate(end_time = mdy_hms(end_time))
@mutate(timedifmins = difftime(end_time, start_time, "minutes"))
@mutate(timedifmins = difftime(end_time, start_time, "hours"))
@mutate(year= year(start_time))
@mutate(second = second(start_time))
end
10×5 DataFrame
Row │ start_time end_time timedifmins year second
│ DateTime DateTime? Float64? Int64 Int64
─────┼─────────────────────────────────────────────────────────────────────────
1 │ 2023-06-27T15:20:00 2023-06-27T14:53:53 -0.435278 2023 0
2 │ 2023-06-26T12:45:15 2023-06-25T10:50:30 -25.9125 2023 15
3 │ 2023-06-26T16:30:30 2023-06-28T16:32:30 48.0333 2023 30
4 │ 2023-06-25T10:11:35 2023-06-24T10:20:30 -23.8514 2023 35
5 │ 2023-06-24T09:00:24 2023-06-24T10:05:00 1.07667 2023 24
6 │ 2023-06-26T09:30:00 missing missing 2023 0
7 │ 2023-06-25T11:00:15 2023-10-25T11:55:13 2928.92 2023 15
8 │ 2023-06-24T01:34:45 2023-06-24T11:35:45 10.0167 2023 45
9 │ 2023-06-26T14:20:00 2023-07-26T15:15:45 720.929 2023 0
10 │ 2023-06-25T10:45:30 2023-06-24T12:50:15 -21.9208 2023 30
This function parses time strings (e.g., "12:34:56") into a Time format in Julia. It takes a string or an array of strings with the time information and doesn't require additional arguments.
df3 = DataFrame(
Time = [
"09:00:24",
"10:11:35",
"01:34:45",
"12:45:15",
"09:30:00",
"10:45:30",
"11:00:15",
"14:20:00",
"15:10:45",
missing
]
)
@chain df3 begin
@mutate(Time = hms(Time))
end
10×1 DataFrame
Row │ Time
│ Time?
─────┼──────────
1 │ 09:00:24
2 │ 10:11:35
3 │ 01:34:45
4 │ 12:45:15
5 │ 09:30:00
6 │ 10:45:30
7 │ 11:00:15
8 │ 14:20:00
9 │ 15:10:45
10 │ missing