Examples
This page shows some examples of how to use tsp for data exploration.
Loading data
The easiest way to load data into a TSP object is with one of the read_* functions (described in Readers).
from tsp.readers import read_gtnp, read_geotop, read_csv, read_ntgs
import pkg_resources
# Read the example GEOtop file bundled with tsp
data = read_geotop(pkg_resources.resource_filename('tsp', 'data/example_geotop.csv'))
Manipulating data
A TSP object has several features that make it easy to work with the data.
Accessing data
Use the .long and .wide attributes to get the data in the “long” (tidy) or “wide” format.
data.long[1:9]
| | time | depth | temperature_in_ground |
|---|---|---|---|
| 1 | 2021-05-06 16:04:33.770647 | 1 | -0.5 |
| 2 | 2020-05-06 16:04:33.770647 | 1 | -0.4 |
| 3 | 2022-05-06 16:04:33.770647 | 5 | -3 |
| 4 | 2021-05-06 16:04:33.770647 | 5 | -2 |
| 5 | 2020-05-06 16:04:33.770647 | 5 | -1.8 |
| 6 | 2022-05-06 16:04:33.770647 | 10 | -5 |
| 7 | 2021-05-06 16:04:33.770647 | 10 | -4.9 |
| 8 | 2020-05-06 16:04:33.770647 | 10 | -4.8 |
data.wide[1:3]
| | time | 1 | 5 | 10 |
|---|---|---|---|---|
| 2021-05-06 16:04:33.770647 | 2021-05-06 16:04:33.770647 | -0.5 | -2 | -4.9 |
| 2020-05-06 16:04:33.770647 | 2020-05-06 16:04:33.770647 | -0.4 | -1.8 | -4.8 |
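The long and wide layouts hold the same observations arranged differently. The sketch below uses plain pandas with values copied from the tables above to show how one converts into the other. It is not how TSP builds these tables internally, and it leaves out the extra time column that TSP's wide table carries.

```python
import pandas as pd

# Long ("tidy") layout: one row per (time, depth) observation.
# Values are copied from the example output above.
t2021 = pd.Timestamp("2021-05-06 16:04:33.770647")
t2020 = pd.Timestamp("2020-05-06 16:04:33.770647")
long = pd.DataFrame({
    "time": [t2021, t2020, t2021, t2020, t2021, t2020],
    "depth": [1, 1, 5, 5, 10, 10],
    "temperature_in_ground": [-0.5, -0.4, -2.0, -1.8, -4.9, -4.8],
})

# Wide layout: one row per time, one column per depth.
wide = long.pivot(index="time", columns="depth", values="temperature_in_ground")
print(wide)

# Melting the wide table gives back the long layout (row order may differ).
back_to_long = wide.reset_index().melt(
    id_vars="time", var_name="depth", value_name="temperature_in_ground"
)
```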
Alternatively, you can access the underlying data (.depths, .times, and .values) as arrays:
>>> data.depths
array([ 0.1, 1. , 5. , 10. ])
>>> data.times
array(['2016-05-31T06:00:00.000000000', '2016-05-31T12:00:00.000000000',
'2016-05-31T18:00:00.000000000', ...,
'2019-12-31T12:00:00.000000000', '2019-12-31T18:00:00.000000000',
'2019-12-31T23:00:00.000000000'], dtype='datetime64[ns]')
>>> data.values
array([[-16.473, -9.27 , -0.123, 1.289],
[-13.206, -9.289, -0.123, 1.289],
[ -6.795, -9.308, -0.123, 1.289],
...,
[-18.027, -9.295, -0.11 , 1.142],
[-17.095, -9.315, -0.11 , 1.142],
[-16.501, -9.33 , -0.11 , 1.142]])
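The output above suggests that .values has one row per entry in .times and one column per entry in .depths. The NumPy sketch below works under that assumption, using toy arrays rounded from the first rows of the output. It shows how to pull out the time series at one depth or the profile at one time.

```python
import numpy as np

# Toy arrays shaped like the TSP attributes above (rounded from the output shown).
depths = np.array([0.1, 1.0, 5.0, 10.0])
times = np.array(["2016-05-31T06:00", "2016-05-31T12:00", "2016-05-31T18:00"],
                 dtype="datetime64[ns]")
values = np.array([[-16.5, -9.3, -0.12, 1.29],
                   [-13.2, -9.3, -0.12, 1.29],
                   [ -6.8, -9.3, -0.12, 1.29]])

# Assumed layout: rows follow `times`, columns follow `depths`.
assert values.shape == (times.size, depths.size)

# Time series at 1 m depth: pick the matching column.
j = np.flatnonzero(np.isclose(depths, 1.0))[0]
series_1m = values[:, j]

# Temperature profile (all depths) at one timestamp: pick the matching row.
i = np.flatnonzero(times == np.datetime64("2016-05-31T12:00"))[0]
profile = values[i, :]
```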
Averages
You can compute daily or monthly averages using the .daily() and .monthly() methods. Keep in mind that these return TSP objects, not DataFrames! Use .wide or .long to access the averaged data.
>>> data.monthly().wide
| | time | 0.1 | 1.0 | 5.0 | 10.0 |
|---|---|---|---|---|---|
| 2016-06-01 | 2016-06-01 | 3.711758 | -6.167192 | -0.118442 | 1.287075 |
| 2016-07-01 | 2016-07-01 | 8.930806 | -0.795556 | -0.131282 | 1.282887 |
| 2016-08-01 | 2016-08-01 | 7.942976 | 1.774726 | -0.154524 | 1.278685 |
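For comparison, monthly means of a wide table can be computed in plain pandas with resample(). The sketch below uses made-up daily data and is not TSP's implementation. TSP's .monthly() may treat gaps or incomplete months differently, so results need not match exactly. Using "D" instead of "MS" would give daily means.

```python
import pandas as pd

# Hypothetical daily wide-format data: rows are timestamps, columns are depths (m).
times = pd.date_range("2016-06-01", "2016-08-31", freq="D")
wide = pd.DataFrame(
    {
        0.1: times.month.to_numpy(dtype=float),  # 6.0 in June, 7.0 in July, ...
        1.0: -1.0,                               # constant temperature
    },
    index=times,
)

# "MS" (month start) bins label each mean with the first day of its month,
# like the 2016-06-01, 2016-07-01, ... rows in the output above.
monthly = wide.resample("MS").mean()
print(monthly)
```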
Plotting data
Each TSP object can be visualized with a single command, which makes data exploration quick. The most common visualizations are provided. Each plotting method name begins with plot_, and each one uses a function described in Plotting functions; those functions can also be called directly.
# v Only needed for Jupyter notebook v
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
# ^ Only needed for Jupyter notebook ^
Plotting with the TSP class
The TSP class has several methods that generate plots directly:
import pkg_resources as pr
import tsp
data = tsp.read_geotop(pr.resource_filename("tsp", "data/example_geotop.csv"))
data.plot_trumpet(year=2017, max_depth=10)
data.plot_contour(year=2017, contour=[0], max_depth=5, colours='dynamic')
data.plot_timeseries(title="What a great plot!", depths=[1, 5]);
Because the averaging methods return TSP objects, they combine easily with the plotting methods. For instance, to plot a trumpet curve of monthly averages:
data.monthly().plot_trumpet(year=2017, title="Trumpet plot of monthly means")
Trumpet plot
from tsp.plots import trumpet_curve
import numpy as np
# Make up some data
# Columns: depth, maximum, minimum and mean temperature
dat = np.array([[0, 35, -30, -5],
                [-1, 20, -25, -3],
                [-3, 10, -15, -2],
                [-6, -1, -5, -1.5],
                [-10, -1.1, -5, -1.2]])
# Make plot with plotting function
fig = trumpet_curve(depth=dat[:,0], t_max=dat[:,1],
t_min=dat[:,2], t_mean=dat[:,3],
title="Example of trumpet plot", max_depth=-5)
fig.show()
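The t_max, t_min and t_mean arrays passed to trumpet_curve are per-depth summaries over a period. Below is a NumPy sketch of computing them for one calendar year from a times × depths array. The helper trumpet_stats is hypothetical. Whether plot_trumpet(year=...) builds its envelope exactly this way is an assumption.

```python
import numpy as np

def trumpet_stats(times, values, year):
    """Hypothetical helper: per-depth (t_max, t_min, t_mean) for one calendar year.

    times:  datetime64 array, one entry per row of `values`
    values: 2-D array shaped (len(times), number of depths)
    Raises ValueError if no timestamps fall in `year`.
    """
    # datetime64[Y] counts years since 1970
    years = times.astype("datetime64[Y]").astype(int) + 1970
    subset = values[years == year, :]
    # nan-aware reductions so missing readings don't spoil a depth's statistics
    return (np.nanmax(subset, axis=0),
            np.nanmin(subset, axis=0),
            np.nanmean(subset, axis=0))

times = np.array(["2016-12-31", "2017-01-15", "2017-07-15", "2018-01-01"],
                 dtype="datetime64[D]")
values = np.array([[99.0, 99.0],    # 2016: excluded
                   [-10.0, -3.0],   # 2017 winter
                   [12.0, -1.0],    # 2017 summer
                   [99.0, 99.0]])   # 2018: excluded
t_max, t_min, t_mean = trumpet_stats(times, values, 2017)
```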
Time series plot
The time series plot has a clickable legend: clicking a legend entry toggles that item on or off.
import pkg_resources as pr
from tsp import read_geotop
from tsp.plots import time_series
# Get some data
data = read_geotop(pr.resource_filename("tsp", "data/example_geotop.csv"))
t = data.times
d = data.depths
T = data.values
# Make plot with plotting function
fig = time_series(depths=d, times=t, values=T,
title="Example of time series plot")
fig.show()
Colour contour plot
import pkg_resources as pr
from tsp import read_geotop
from tsp.plots import colour_contour
# Get some data
data = read_geotop(pr.resource_filename("tsp", "data/example_geotop.csv"))
t = data.times
d = data.depths
T = data.values
# Make plot with plotting function
fig = colour_contour(depths=d, times=t, values=T,
title="Example of colour contour plot",
max_depth=7,
colours="symmetric", contour=[-3, 0], label_contour=True)
fig.show()
Handling Time Zones (UTC offset)
TSP provides simple support for time zones. When a TSP object is created from times that are not timezone-aware, or when synthetic data is generated (for example with TSP.synthetic()), the resulting timestamps carry no timezone information.
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
from tsp import TSP
t_naive = TSP.synthetic(depths=[0.5, 1, 3])
t_naive.times[0:5]
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04', '2000-01-05'], dtype='datetime64[ns]', name='time', freq=None)
Time zones are specified as UTC offsets with set_utc_offset(). Once set, the UTC offset is shown whenever the timestamps are displayed.
from copy import copy
t_specified = copy(t_naive)
t_specified.set_utc_offset("-07:00")
print(t_specified.times[0:5])
print(t_specified.long)
DatetimeIndex(['2000-01-01 00:00:00-07:00', '2000-01-02 00:00:00-07:00', '2000-01-03 00:00:00-07:00', '2000-01-04 00:00:00-07:00', '2000-01-05 00:00:00-07:00'], dtype='datetime64[ns, UTC-07:00]', name='time', freq=None)
                          time  depth  temperature_in_ground  count
0    2000-01-01 00:00:00-07:00    0.5               4.786277      1
1    2000-01-02 00:00:00-07:00    0.5               4.796961      1
2    2000-01-03 00:00:00-07:00    0.5               4.806088      1
3    2000-01-04 00:00:00-07:00    0.5               4.813655      1
4    2000-01-05 00:00:00-07:00    0.5               4.819659      1
...                        ...    ...                    ...    ...
3286 2002-12-28 00:00:00-07:00    3.0               1.681319      1
3287 2002-12-29 00:00:00-07:00    3.0               1.716074      1
3288 2002-12-30 00:00:00-07:00    3.0               1.750243      1
3289 2002-12-31 00:00:00-07:00    3.0               1.783817      1
3290 2003-01-01 00:00:00-07:00    3.0               1.816785      1

[3291 rows x 4 columns]
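The output above shows midnight timestamps that keep their wall-clock time and gain a -07:00 label. The standard-library sketch below illustrates that kind of "attach an offset" step. The parse_utc_offset helper is hypothetical, and this is not a claim about how set_utc_offset() works internally; pandas offers a comparable operation as DatetimeIndex.tz_localize.

```python
from datetime import datetime, timedelta, timezone

def parse_utc_offset(text):
    """Hypothetical helper: turn "-07:00" or "+02:00" into a fixed-offset tzinfo."""
    sign = -1 if text.startswith("-") else 1
    hours, minutes = (int(part) for part in text.lstrip("+-").split(":"))
    return timezone(sign * timedelta(hours=hours, minutes=minutes))

naive = datetime(2000, 1, 1)
print(naive.tzinfo)  # None: no time zone information

# Attaching an offset keeps the wall-clock time and only adds the offset.
aware = naive.replace(tzinfo=parse_utc_offset("-07:00"))
print(aware.isoformat())  # 2000-01-01T00:00:00-07:00
print(aware.tzinfo)       # UTC-07:00
```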
If timezone-aware dates are passed to the TSP constructor, the timezone information is preserved.
import pandas as pd
import numpy as np
times = pd.to_datetime(["2014-01-04T00:00:00Z",
"2014-01-04T06:00:00Z",
"2014-01-04T12:00:00Z",
"2014-01-04T18:00:00Z",
"2014-01-05T00:00:00Z"])
t_aware = TSP(times, depths=[0.5, 1, 3], values=np.random.rand(5,3))
t_aware.times[0:5]
DatetimeIndex(['2014-01-04 00:00:00+00:00', '2014-01-04 06:00:00+00:00', '2014-01-04 12:00:00+00:00', '2014-01-04 18:00:00+00:00', '2014-01-05 00:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
If you want times to be displayed in a different time zone, use the set_output_utc_offset() method. This helps when interpreting data: for instance, a datalogger may record in UTC while you want to see the data in the local time of the site where it was collected.
t_aware.set_output_utc_offset("+02:00")
t_aware.wide.iloc[0:5,:]
| | time | 0.5 | 1.0 | 3.0 |
|---|---|---|---|---|
| 2014-01-04 02:00:00+02:00 | 2014-01-04 02:00:00+02:00 | 0.338733 | 0.475442 | 0.163989 |
| 2014-01-04 08:00:00+02:00 | 2014-01-04 08:00:00+02:00 | 0.212410 | 0.389677 | 0.402952 |
| 2014-01-04 14:00:00+02:00 | 2014-01-04 14:00:00+02:00 | 0.747591 | 0.321167 | 0.623523 |
| 2014-01-04 20:00:00+02:00 | 2014-01-04 20:00:00+02:00 | 0.203651 | 0.275205 | 0.320933 |
| 2014-01-05 02:00:00+02:00 | 2014-01-05 02:00:00+02:00 | 0.901037 | 0.781435 | 0.428011 |
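In the table above, 00:00 UTC is displayed as 02:00+02:00: the instant is unchanged and only its presentation moves. The standard-library sketch below shows this conversion and contrasts it with simply relabelling the offset, which would shift the instant. It illustrates the idea only and does not describe TSP's internals; pandas offers a comparable conversion as DatetimeIndex.tz_convert.

```python
from datetime import datetime, timedelta, timezone

plus_two = timezone(timedelta(hours=2))
recorded = datetime(2014, 1, 4, 0, 0, tzinfo=timezone.utc)  # logger clock in UTC

# Converting keeps the same instant and changes only how it is displayed.
shown = recorded.astimezone(plus_two)
print(shown.isoformat())  # 2014-01-04T02:00:00+02:00

# By contrast, relabelling the offset without converting moves the instant.
relabelled = recorded.replace(tzinfo=plus_two)
print(relabelled.isoformat())  # 2014-01-04T00:00:00+02:00
```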
If the data file read by teaspoon includes UTC offset information, the offset is set automatically when the TSP object is created.
%%capture
from tsp import read_hoboware
from pkg_resources import resource_filename
logger_file = resource_filename('tsp', "dataloggers/test_files/hobo_1_AB_classic.csv");
timezone_tsp = read_hoboware(logger_file)
timezone_tsp.times[0:5]
DatetimeIndex(['2010-08-18 14:00:00-07:00', '2010-08-18 15:00:00-07:00', '2010-08-18 16:00:00-07:00', '2010-08-18 17:00:00-07:00', '2010-08-18 18:00:00-07:00'], dtype='datetime64[ns, pytz.FixedOffset(-420)]', name='time', freq=None)
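The dtype above reports pytz.FixedOffset(-420). pytz expresses fixed offsets in minutes, so -420 minutes is UTC-07:00, matching the -07:00 suffix on each timestamp. The sketch below shows that arithmetic with a hypothetical format_offset_minutes helper and checks it against the standard library's datetime.timezone.

```python
from datetime import timedelta, timezone

def format_offset_minutes(minutes):
    """Hypothetical helper: format an offset in minutes as "+HH:MM" or "-HH:MM"."""
    sign = "-" if minutes < 0 else "+"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}:{mins:02d}"

print(format_offset_minutes(-420))          # -07:00
print(timezone(timedelta(minutes=-420)))    # UTC-07:00
```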
Plots generated from TSP objects automatically adjust the time-axis ticks and label to match the UTC offset set by set_output_utc_offset().
print(f"output time zone is {timezone_tsp.output_utc_offset}")
figure = timezone_tsp.plot_timeseries()
output time zone is pytz.FixedOffset(-420)
timezone_tsp.set_output_utc_offset("+00:00")
print(f"output time zone is {timezone_tsp.output_utc_offset}")
figure = timezone_tsp.plot_timeseries()
output time zone is UTC
To check which UTC offset is set for the data and which is used for output, read the utc_offset and output_utc_offset attributes.
print(f"Time zone of t_aware data is '{t_aware.utc_offset}'")
print(f"output time zone of t_aware is '{t_aware.output_utc_offset}'")
print("")
print(f"Time zone of t_naive data is '{t_naive.utc_offset}'")
print(f"output time zone of t_naive is '{t_naive.output_utc_offset}'")
Time zone of t_aware data is 'UTC'
output time zone of t_aware is 'UTC+02:00'

Time zone of t_naive data is 'None'
output time zone of t_naive is 'None'