To exercise my data skills as a Data Analysis and Visualization MSc candidate, I spent today analyzing the Citibike usage dataset for Dec. 2021🚲 (visualization in the making!), which contains approx. 1.8 million rides (rows!). While trying to find and eliminate duplicate entries, I was wholeheartedly amazed to find that there really are two people in the whole wide world (of NYC) who rent a bike at the same time (to the second), ride for exactly the same damn length of time (to the second), and return it at the same time (again, to the second), just at a different location. WOW, just WOW. It happens in about 0.1% of this dataset - literally 1,000 in a million. Wow, that is 1,000 times more often than "one in a million"??? That frequent???
So folks, believe it when they say, "you are not alone, there is always someone in the whole world doing the same thing as you," and that someone may be a lot closer than you think (like within the same city)!
source: Index of bucket "tripdata"
*just so you know... the data was perfectly clean and there were no duplicate entries in the original.
#gradlife #dataanalysis #rprogramming #rstudio #coding #datavisualization #citibike #nyc #newyorkdata #statistics #dataset #data
How I found it ↓
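library(tidyverse)

# Load the December 2021 trip data
bike_nyc_21dec <- read_csv("[your path]/202112-baywheels-tripdata.csv")

# Start/end time pairs shared by more than one ride
bike_nyc_21dec_dup <- bike_nyc_21dec %>%
  count(started_at, ended_at) %>%
  filter(n > 1)

# Keep only rides whose (started_at, ended_at) pair is shared,
# matching on the pair rather than on each column separately
bike_nyc_21dec %>%
  semi_join(bike_nyc_21dec_dup, by = c("started_at", "ended_at")) %>%
  arrange(started_at)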
*replace [your path] with your file location
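If you want to double-check the two claims in this post (how big the share of "twin" rides is, and that they are not literal duplicates), here is a minimal sketch on made-up data. The toy data frame `rides` and its values are purely hypothetical. The column names `ride_id` and `start_station_name` are assumptions about the file layout, so adjust them to whatever your CSV actually contains.

```r
library(dplyr)

# Toy stand-in for the trip data (made-up values, not real rides)
rides <- data.frame(
  ride_id = c("a", "b", "c", "d", "e"),
  started_at = as.POSIXct(c("2021-12-01 08:00:00", "2021-12-01 08:00:00",
                            "2021-12-01 09:15:30", "2021-12-01 10:00:00",
                            "2021-12-01 10:00:00"), tz = "UTC"),
  ended_at = as.POSIXct(c("2021-12-01 08:12:05", "2021-12-01 08:12:05",
                          "2021-12-01 09:40:00", "2021-12-01 10:20:00",
                          "2021-12-01 10:25:00"), tz = "UTC"),
  start_station_name = c("Station A", "Station B", "Station C",
                         "Station D", "Station D")
)

# Rides that share BOTH start and end time with at least one other ride
twins <- rides %>%
  add_count(started_at, ended_at, name = "n_same") %>%
  filter(n_same > 1)

# Share of all rides that have a "twin"
twin_share <- nrow(twins) / nrow(rides)

# Rows identical in every column except the ID would be true duplicates
n_exact_dupes <- sum(duplicated(select(rides, -ride_id)))

print(twins)
print(twin_share)
print(n_exact_dupes)
```

In the toy data, rides "d" and "e" start at the same second but end at different times, so they do not count as twins. Only pairs that match on both columns do.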
Hope you enjoy this incredible fact as much as I do!