
# My Life Stats: I Tracked My Habits for a Year, and This Is What I Learned | by Pau Blasco i Roca | Nov, 2023

I first looked at the individual time series for four variables: Sleep, Studying, Socializing and Mood. I used Microsoft Excel to quickly draw some plots. They represent the daily number of hours spent (blue) and the 5-day moving average¹ MA(5) (red), which I considered a good measure for my situation. The mood variable was rated from 10 (the best!) to 0 (terrible!).
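The MA(5) curve in the plots can be reproduced with a pandas rolling window. A minimal sketch, using made-up hours rather than the article's actual log:

```python
import pandas as pd

# hypothetical daily sleep hours -- illustrative values, not the article's data
sleep = pd.Series([7.5, 6.0, 8.0, 7.0, 6.5, 9.0, 7.5])

# MA(5): mean of the current day and the four days before it;
# the first four entries are NaN because the window is still incomplete
ma5 = sleep.rolling(window=5).mean()

print(ma5.round(2).tolist())
```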

Regarding the data contained in the footnote of each plot: the total is the sum of the values of the series, the mean is the arithmetic mean of the series, the STD is the standard deviation, and the relative deviation is the STD divided by the mean. Total: 2361h. Mean: 7.1h. STD: 1.1h. Relative deviation: 15.5% (image by author).
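These four footnote metrics are easy to reproduce with NumPy. A sketch with a toy series (the values and the choice of sample standard deviation are my assumptions; the article doesn't specify which estimator it uses):

```python
import numpy as np

# toy daily-hours series -- illustrative values only
hours = np.array([7.0, 8.0, 6.0, 7.5, 6.5])

total = hours.sum()          # total: sum of the series
mean = hours.mean()          # mean: arithmetic mean
std = hours.std(ddof=1)      # STD: sample standard deviation (assumed)
rel_dev = std / mean         # relative deviation: STD divided by the mean

print(f"Total: {total}h  Mean: {mean}h  Relative deviation: {rel_dev:.1%}")
```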

All things accounted for, I did well enough with sleep. I had rough days, like everyone else, but I think the trend is pretty stable. In fact, it is one of the least-varying variables of my study. Total: 589.1h. Mean: 1.8h. STD: 2.2h. Relative deviation: 122% (image by author).

These are the hours I dedicated to my academic career. It fluctuates a lot — finding balance between work and studying often means having to cram projects on the weekends — but still, I consider myself satisfied with it. Total: 1440.9h. Mean: 4.3h. STD: 4.7h. Relative deviation: 107% (image by author).

Regarding this table, all I can say is that I'm shocked. The grand total is larger than I expected, given that I'm an introvert. Of course, hours with my colleagues at school also count. In terms of variability, the STD is really high, which makes sense given the difficulty of keeping an established routine when it comes to socializing.

This is the least variable series — the relative deviation is the lowest among my studied variables. A priori, I'm happy with the observed trend. I think it's positive to keep a fairly stable mood — and even better if it's a good one.

After looking at the trends for the main variables, I decided to dive deeper and study the potential correlations² between them. Since my goal was being able to mathematically model and predict (or at least explain) "Mood", correlations were an important metric to consider. From them, I could extract relationships like the following: "the days that I study the most are the ones that I sleep the least", "I usually study languages and music together", and so on.

Before we do anything else, let's open up a Python file and import some key libraries for series analysis. I usually use aliases for them, as it's a common practice and makes things less verbose in the actual code.

```python
import pandas as pd               #1.4.4
import numpy as np                #1.22.4
import seaborn as sns             #0.12.0
import matplotlib.pyplot as plt   #3.5.2
from pmdarima import arima        #2.0.4
```

We'll make two different studies regarding correlation. We'll look into the Pearson Correlation Coefficient³ (for linear relationships between variables) and the Spearman Correlation Coefficient⁴ (which studies monotonic relationships between variables). We will be using their implementation⁵ in pandas.

## Pearson Correlation matrix

The Pearson correlation coefficient between two variables X and Y is computed as follows:

ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y)

where cov is the covariance, σ_X is std(X) and σ_Y is std(Y).
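The formula maps directly onto NumPy. A quick sanity check (with toy arrays and my own variable names) that the definition matches the library's built-in correlation:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Pearson by the definition: cov(X, Y) / (std(X) * std(Y))
cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / (x.std() * y.std())

# compare against NumPy's built-in correlation matrix
assert np.isclose(rho, np.corrcoef(x, y)[0, 1])
print(rho)
```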

We can quickly calculate a correlation matrix, where every possible pairwise correlation is computed.

```python
#read, select and normalize the data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')
#compute the correlation matrix
corr = numerics.corr(method='pearson')
#generate the heatmap
sns.heatmap(corr, annot=True)
#draw the plot
plt.show()
```

This is the raw Pearson correlation matrix obtained from my data:

And these are the significant values⁶ — the ones that are, with 95% confidence, different from zero. We perform a t-test⁷ with the following criterion: for each correlation value ρ, we discard it if

|ρ| < 2/√n

where n is the sample size. We can recycle the code from before and add in this filter.
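With this study's sample size of n = 332, the cutoff works out to roughly 0.11. A quick check:

```python
import numpy as np

n = 332  # sample size from the article
threshold = 2 / np.sqrt(n)

# correlations with |rho| below this value are treated as noise
print(round(threshold, 3))
```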

```python
#constants
N = 332  #number of samples
STEST = 2/np.sqrt(N)

def significance_pearson(val):
    if np.abs(val) < STEST:
        return True
    return False

#read data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')
#calculate correlation
corr = numerics.corr(method='pearson')
#prepare masks
mask = corr.copy().applymap(significance_pearson)
mask2 = np.triu(np.ones_like(corr, dtype=bool))  #remove upper triangle
mask_comb = np.logical_or(mask, mask2)
#plot the results
c = sns.heatmap(corr, annot=True, mask=mask_comb)
c.set_xticklabels(c.get_xticklabels(), rotation=-45)
plt.show()
```

The values that have been discarded could easily be noise that wrongfully represents trends or relationships. In any case, it's better to assume a true relationship is meaningless than to consider meaningful a relationship that isn't (that is, we favor type II errors over type I errors). This is especially true in a study with rather subjective measurements.

Filtered Pearson correlation matrix. Non-significant values (and the upper triangle) have been filtered out. (image by author)

## Spearman’s rank correlation coefficient

The Spearman correlation coefficient can be calculated as follows:

r_s = cov(R(X), R(Y)) / (σ_R(X) · σ_R(Y))

where R indicates the rank variable⁸ — the rest of the variables are the same ones described for the Pearson coefficient.
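Since Spearman's coefficient is just Pearson applied to the rank variables, this is easy to verify with pandas' `rank`. A sketch on toy data (values and names are mine):

```python
import pandas as pd

x = pd.Series([10.0, 20.0, 5.0, 40.0, 30.0])
y = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])

# Spearman = Pearson correlation of the rank variables R(X), R(Y)
manual = x.rank().corr(y.rank(), method="pearson")
builtin = x.corr(y, method="spearman")

print(round(manual, 6), round(builtin, 6))
```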

As we did before, we can quickly compute the correlation matrix:

```python
#read, select and normalize the data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')
#compute the correlation matrix
corr = numerics.corr(method='spearman')  #pay attention to this change!
#generate the heatmap
sns.heatmap(corr, annot=True)
#draw the plot
plt.show()
```

This is the raw Spearman's rank correlation matrix obtained from my data:

Let's see which values are actually significant. The formula to check for significance is the following:

t = r · √((n − 2) / (1 − r²))

where r is Spearman's coefficient. Here, t follows a Student's t distribution with n − 2 degrees of freedom.
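As an illustration of the test, take the article's n = 332 and a hypothetical coefficient of r = 0.15:

```python
import numpy as np

n = 332   # sample size from the article
r = 0.15  # hypothetical Spearman coefficient (illustrative)

# t-statistic: t = r * sqrt((n - 2) / (1 - r**2))
t = r * np.sqrt((n - 2) / (1 - r**2))

# |t| > 1.96 means significantly different from zero at the 95% level
print(round(t, 2), np.abs(t) > 1.96)
```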

Here, we'll filter out all correlation values whose t-statistic is smaller (in absolute value) than 1.96. Again, the reason they are discarded is that we can't be sure whether they are noise — random chance — or an actual trend. Let's code it up:

```python
#constants
N = 332  #number of samples
TTEST = 1.96

def significance_spearman(val):
    if val == 1:
        return True
    t = val * np.sqrt((N-2)/(1-val*val))
    if np.abs(t) < TTEST:
        return True
    return False

#read data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')
#calculate correlation
corr = numerics.corr(method='spearman')
#prepare masks
mask = corr.copy().applymap(significance_spearman)
mask2 = np.triu(np.ones_like(corr, dtype=bool))  #remove upper triangle
mask_comb = np.logical_or(mask, mask2)
#plot the results
c = sns.heatmap(corr, annot=True, mask=mask_comb)
c.set_xticklabels(c.get_xticklabels(), rotation=-45)
plt.show()
```

These are the significant values:

I believe this chart better explains the apparent relationships between variables, as its criterion is more "natural" (it considers monotonic⁹, and not only linear, functions and relationships). It is also not as impacted by outliers as the other one (a couple of very bad days related to a certain variable won't impact the overall correlation coefficient).

Still, I'll leave both charts for the reader to judge and to extract their own conclusions from.
