Nebu Blog

Collecting, analyzing and visualizing time-series data with Nebu and R

Whether you wish to predict the trend in upcoming elections, the impact of the weather on shopping decisions, or electricity consumption, time is an important factor that must be considered in collecting, processing and modeling data.

Csanad Kolcsei
Posted on 23 February 2021 in Big Data

In this blog post, I will dive deeper into time-series data. Time-series data helps to determine whether a time-based trend exists, and that knowledge can then be used to predict future consumer behavior. I would like to share with you some of Nebu's good practices and tips on collecting and analyzing such data. I will also show you some ready-to-use visualizations that you can apply using Nebu.


What is time-series data?

The easiest way to define time-series data is to say that it’s data collected over a certain period of time. The main reason to collect this kind of information is not only to observe the individual data points, but also to track progress and change over time.

For this reason, the time attribute is a crucial part of the time-series data. In the simplest case we’re looking at a univariate time-series, which means that a single variable is observed over time, so we’re dealing with (at least) two attributes: time and the observed value itself.

Nowadays, with our lives so deeply rooted in hi-tech and online services, you could say that data surrounds us. If we had the chance to take a closer look at any of these datasets, there’s a pretty good chance that its records would contain some kind of timestamp. But is a timestamp alone sufficient for the data to be considered a time-series? Not quite. Generally, in a time-series dataset, the records are arranged by their timestamps. Another crucial aspect is that new records cannot overwrite previous ones. Otherwise, we'd lose the ability to track changes.
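To make these two properties concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of Nebu or of the R examples mentioned later). The class name `AppendOnlySeries` and its methods are hypothetical. It keeps records ordered by timestamp and refuses to replace an existing record.

```python
from bisect import insort
from datetime import datetime


class AppendOnlySeries:
    """Hypothetical container: records stay sorted by time and are never overwritten."""

    def __init__(self):
        self._records = []  # (timestamp, value) tuples, kept sorted by timestamp

    def add(self, timestamp, value):
        # Refuse to replace an existing record: history must stay intact.
        if any(ts == timestamp for ts, _ in self._records):
            raise ValueError(f"a record for {timestamp} already exists")
        # insort keeps the list ordered even when records arrive out of order.
        insort(self._records, (timestamp, value))

    def records(self):
        return list(self._records)


series = AppendOnlySeries()
series.add(datetime(2021, 2, 2), 5.0)
series.add(datetime(2021, 2, 1), 3.0)  # arrives late, still ends up first
```

Rejecting duplicates is just one possible policy. The point is only that an earlier observation is never silently lost.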


What is time-series data used for?

Much of the data we encounter daily - in the news, in weather or stock market forecasts - can be considered time-series data.

Without a doubt, the main (and often, only) goal of time-series data collection is to track change. People are interested in observing trends and tendencies spread over longer periods of time. It can be insightful to see, for example, how the behavior of brand X's customers has fluctuated over the last couple of years. Such information can be used in multiple ways: brand X's managers can tell whether the observed process has been heading in the right direction, or they can make predictions about the future by looking at the trends.

Many time-series related techniques are becoming part of our lives, for example in managing complex electronic systems like smart homes and self-driving cars. They have also been of great help in recent months in tracking, analyzing, and making predictions about the COVID-19 pandemic.


Collecting time-series data

It won't be a surprise to anyone if I say that, as with any other type of data, collecting time-series data can be very valuable for performing analysis and drawing sharp insights.

To create a proper time-series dataset that is easy to work with, there are a few factors to consider before the project starts. First of all, such data should ideally be collected regularly, at intervals short enough to guarantee that new data is added to the dataset between any two data processing phases. This ensures that we don’t have to deal with missing data.
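A quick way to verify that collection really was regular is to look for gaps. The sketch below is written in Python purely for illustration and assumes one record per calendar day. The function name `missing_days` is hypothetical.

```python
from datetime import date, timedelta


def missing_days(collected, start, end):
    """Return every day in [start, end] that has no record in `collected`."""
    present = set(collected)
    gaps = []
    day = start
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)  # assumption: a daily collection interval
    return gaps


collected = [date(2021, 2, 1), date(2021, 2, 2), date(2021, 2, 4)]
gaps = missing_days(collected, date(2021, 2, 1), date(2021, 2, 4))
```

For other collection frequencies, only the `timedelta` step would need to change.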

It’s also worth mentioning that it’s usually more effective for a time-series analysis if the data collection is performed for a longer period of time. What do I mean by long? Well, truth be told, it is relative to the frequency of data collection and can vary depending on the project specifics, so I cannot give you an exact number here. However, generally speaking, longer time frames usually lead to more accurate results than shorter ones.

If it’s essential to visualize the collected data as soon as possible, consider using smaller aggregation time frames in the beginning than in the end result. For example, presenting data for each month is not possible after two weeks of data collection, but by summarizing the available data for each week, we can already start creating outputs.
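As an illustration of weekly summarizing, here is a small Python sketch (Python is used only as an example language). It assumes the daily values should be summed and that weeks follow the ISO calendar. The function name `weekly_totals` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily):
    """Sum daily values per ISO week; `daily` maps a date to a number."""
    totals = defaultdict(float)
    for day, value in daily.items():
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += value  # key: (ISO year, ISO week number)
    return dict(sorted(totals.items()))


daily = {
    date(2021, 2, 1): 1.0,  # Monday of ISO week 5
    date(2021, 2, 2): 2.0,
    date(2021, 2, 8): 4.0,  # Monday of ISO week 6
}
weekly = weekly_totals(daily)
```

Once enough data is available, the same idea works with a (year, month) key for monthly figures.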


Analyzing time-series data

The R programming language offers excellent tools here. There are multiple libraries available that make the analysis of time-series data pretty straightforward.

It’s worth mentioning that some of these techniques require the dataset to meet certain criteria, and some are stricter than others. Therefore, I’d suggest looking into these requirements and the analysis techniques you plan to use before beginning the data collection itself. For example, to detect anomalies (also known as outlying values) in the way described below, we need the time-series to follow a regular, predictable pattern; in other words, it has to contain a seasonality component.

There are so many techniques available that just listing them all would be a real challenge. But don't worry. We are here to help. Anticipating the needs of Nebu customers for easy-to-apply time-series analysis, we have created several ready-to-use examples of how to analyze time-series data using R. You can look into them on this time-series dashboard.

One of these techniques is the well-known linear regression. It’s a relatively simple yet very effective indication of progress: a straight trend line is fitted through the observations, and the slope of that line shows the direction and rate of change.
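For readers who want to see the arithmetic behind a trend line, here is a minimal ordinary least-squares sketch in Python. It is an illustration only, not the R code behind the dashboard. It assumes equally spaced observations, so the index 0, 1, 2, ... stands in for time. The function name `trend_line` is hypothetical.

```python
def trend_line(values):
    """Least-squares fit of `values` against their index 0..n-1.

    Returns (slope, intercept) of the fitted line.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Covariance of index and value, and variance of the index (both unscaled).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = trend_line([1, 3, 5, 7])
```

A positive slope means the observed value grows over time and a negative one means it declines.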

Another technique that you can easily get familiar with and apply to your projects with our support is STL decomposition (Seasonal and Trend decomposition using Loess). In short, it decomposes the time series into several components: a trend, a seasonal component (the previously mentioned seasonality) and a remainder. If you’re interested in this technique, I strongly advise you to check out this Nebu Reporter dashboard with our time-series visualizations.
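To give a feel for what "decomposing" means, the sketch below implements a much simpler classical additive decomposition in Python. It is not STL: STL uses Loess smoothing, while this stand-in uses a plain moving average. It also assumes an odd period to stay short. The function name `decompose_additive` is hypothetical.

```python
def decompose_additive(values, period):
    """Split a series into trend, seasonal and remainder parts (additive model).

    Simplified stand-in for STL: the trend is a centered moving average over
    one period, and the seasonal part is the mean detrended value per position
    in the cycle. Trend and remainder are None at the edges of the series.
    """
    if period < 3 or period % 2 == 0:
        raise ValueError("this sketch supports odd periods >= 3 only")
    n, half = len(values), period // 2
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")

    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(values[i - half:i + half + 1]) / period

    # Collect detrended values by their position within the cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(values[i] - t)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # shift so the seasonal part sums to zero
    seasonal = [means[i % period] - offset for i in range(n)]

    remainder = [None if t is None else values[i] - t - seasonal[i]
                 for i, t in enumerate(trend)]
    return trend, seasonal, remainder


# A rising line (0, 1, 2, ...) plus a repeating pattern (+1, 0, -1).
trend, seasonal, remainder = decompose_additive([1, 1, 1, 4, 4, 4, 7, 7, 7], period=3)
```

On this toy input the rising line ends up in the trend, the repeating pattern in the seasonal part, and the remainder is close to zero.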

I have used STL decomposition to detect anomalies and to replace these outlying data points with corrected values, so that we get a more regular graph. The dashboard contains an interactive example.
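The general idea of "detect and replace" can be sketched without STL. The Python example below is a simplified stand-in, not the method used on the dashboard. It assumes a series with a stable level (no trend) and a known period, and it uses the seasonal median as both the baseline and the corrected value. The function name `replace_anomalies` and the default threshold are hypothetical choices.

```python
from statistics import median


def replace_anomalies(values, period, threshold=3.0):
    """Flag points far from their seasonal baseline and replace them.

    Baseline: median of all values sharing the same position in the cycle.
    A point is anomalous when its absolute residual exceeds `threshold`
    times the median absolute residual of the whole series.
    Returns (cleaned_values, anomaly_indices).
    """
    if period < 1 or period > len(values):
        raise ValueError("period must be between 1 and len(values)")
    baseline = [median(values[pos::period]) for pos in range(period)]
    residuals = [v - baseline[i % period] for i, v in enumerate(values)]
    spread = median(abs(r) for r in residuals)  # robust "typical" deviation

    cleaned, anomalies = list(values), []
    for i, r in enumerate(residuals):
        if abs(r) > threshold * spread:
            anomalies.append(i)
            cleaned[i] = baseline[i % period]  # corrected value
    return cleaned, anomalies


values = [10, 20, 30, 10, 80, 30, 10, 20, 30, 10, 20, 30]
cleaned, anomalies = replace_anomalies(values, period=3)
```

Here the spike of 80 is flagged and replaced by the usual value for that position in the cycle. With real, noisy data the threshold would need tuning.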


Time-series visualizations

As we already know, time-series data must have a time attribute, so it is no surprise that this attribute has to be displayed in every visualization as well.

The most commonly used way is probably putting the time attribute on the horizontal axis, while the vertical axis represents the observed value(s). The result is a two-dimensional line chart, which is very easy to interpret, and I’m sure that when I say “time-series” it’s the first picture that comes to everybody’s mind.

Using line charts is probably the best overall way to visualize time-series. But it is not the only one! We can do a little more to impress our boss and clients ;). Instead of drawing time on an axis, we can let time itself carry the time attribute: we create individual snapshots of the observed process and show them one after another, which results in an animation. This approach lets us take a closer look at each snapshot and also enables us to display other attributes on the chart. Since time no longer needs to be represented directly, one of the axes gets "freed" to display other variables.
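Whatever charting tool is used, the data preparation for such an animation boils down to grouping the records into one snapshot per time step. Here is a small Python sketch of that step (illustrative only). The function name `build_frames` and the `(time, x, y)` record layout are assumptions.

```python
from collections import defaultdict


def build_frames(records):
    """Group (time, x, y) records into one snapshot per time step.

    Returns a chronologically ordered list of (time, points) pairs, where
    `points` holds the (x, y) pairs to draw in that frame.
    """
    by_time = defaultdict(list)
    for time, x, y in records:
        by_time[time].append((x, y))
    return sorted(by_time.items())


records = [(2021, 1.0, 2.0), (2020, 0.5, 1.5), (2021, 3.0, 4.0)]
frames = build_frames(records)
```

Each frame can then be drawn as an ordinary scatter or bar chart, with both axes free for non-time variables.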

Choosing animations over more traditional visualization formats of course takes us a little away from a strictly usability-driven approach, as it almost always results in a chart that is slightly harder to understand at a glance. But at the same time, it adds a lot of marketing value, as it grabs everyone's attention, simply because we all love to look at and play with interactive and moving charts.


