
Project details:
This project builds a time-series model that forecasts hourly taxi orders for a service operating in a major city. After resampling the data to hourly intervals, creating lag features and adding rolling statistics, several models were evaluated. The final solution delivers accurate short-term forecasts and helps the company plan driver allocation during high-demand periods.
Description
Business Context & Problem
Taxi services must balance supply and demand. Too few drivers lead to long wait times and cancelled rides; too many increase idle time and operational costs. Forecasting upcoming demand helps the company schedule drivers effectively. This project focuses on predicting hourly order volume using historical data.
Data & Analytical Approach
The dataset consisted of timestamped order counts. These were resampled into a clean, regular hourly time series, whose seasonal patterns and weekly trends were then explored. Feature engineering introduced lag features (demand in previous hours), rolling averages and other time-dependent variables that help capture temporal structure. Finally, the data was split chronologically, so that models are evaluated only on hours that come after the training period, just as they would be in production.
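A minimal sketch of these preprocessing steps with pandas. The data is synthetic, and the column name `num_orders`, the 10-minute raw frequency, the hypothetical helper `make_features`, the lag and window sizes, and the 10% test share are all illustrative assumptions rather than the project's actual settings.

```python
import numpy as np
import pandas as pd


def make_features(hourly: pd.DataFrame, max_lag: int, rolling_window: int) -> pd.DataFrame:
    """Hypothetical helper: add calendar, lag and rolling-mean features to an hourly 'num_orders' frame."""
    data = hourly.copy()
    # Calendar features capture daily and weekly patterns
    data["hour"] = data.index.hour
    data["dayofweek"] = data.index.dayofweek
    # Lag features: demand observed 1..max_lag hours earlier
    for lag in range(1, max_lag + 1):
        data[f"lag_{lag}"] = data["num_orders"].shift(lag)
    # Shift before rolling so the mean covers only past hours (no target leakage)
    data["rolling_mean"] = data["num_orders"].shift(1).rolling(rolling_window).mean()
    # The first rows lack a full history, so drop them
    return data.dropna()


# Synthetic stand-in for the raw data: 30 days of order counts at assumed 10-minute intervals
rng = np.random.default_rng(0)
raw_index = pd.date_range("2024-01-01", periods=6 * 24 * 30, freq="10min")
raw = pd.DataFrame({"num_orders": rng.poisson(15, size=len(raw_index))}, index=raw_index)

# Resample to a regular hourly series by summing the orders within each hour
hourly = raw.resample("1h").sum()
features = make_features(hourly, max_lag=24, rolling_window=24)

# Chronological split: no shuffling, the test set is the most recent 10% of hours
split = int(len(features) * 0.9)
train, test = features.iloc[:split], features.iloc[split:]
print(len(hourly), len(train), len(test))  # prints: 720 626 70
```

Shifting before `rolling` is the important design choice here: a plain `rolling(24).mean()` would include the current hour, which is exactly the value the model is supposed to predict.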
Statistical / ML Analysis
Several regression models were tested, including linear models, tree-based models and gradient boosting. Time-series models overfit easily, and shuffled splits would let future observations leak into training, so cross-validation used time-aware splits in which each validation fold follows its training data. LightGBM showed the strongest performance, handling non-linear patterns and feature interactions well. Model quality was measured with RMSE and checked against the project's required accuracy threshold.
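Time-aware cross-validation of this kind can be sketched with scikit-learn's `TimeSeriesSplit`, where every validation fold lies strictly after the hours used for training. This is an illustrative sketch on synthetic data: the feature set, the five splits and the `neg_root_mean_squared_error` scoring are assumptions, and the hyperparameters are borrowed from the results table rather than from the project's code. LightGBM is left out to avoid an extra dependency; `lightgbm.LGBMRegressor` could be added to the `models` dictionary in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic hourly series with a daily cycle (stand-in for the real order counts)
rng = np.random.default_rng(42)
index = pd.date_range("2024-01-01", periods=24 * 60, freq="1h")
hours = index.hour.to_numpy()
series = pd.Series(rng.poisson(20 + 10 * np.sin(2 * np.pi * hours / 24)), index=index)

# Features use only past values (lags) plus the calendar hour
data = pd.DataFrame({f"lag_{k}": series.shift(k) for k in (1, 2, 24)})
data["hour"] = hours
data["target"] = series
data = data.dropna()
X, y = data.drop(columns="target"), data["target"]

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(max_depth=11, random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=95, max_depth=19, random_state=0),
}

# Each fold trains on an initial stretch of hours and validates on the block that follows it
tscv = TimeSeriesSplit(n_splits=5)
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=tscv, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()  # scikit-learn negates errors so that higher is better
    print(f"{name}: mean CV RMSE = {cv_rmse[name]:.2f}")
```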
The final results for the evaluated models are summarized in the table below (RMSE in orders per hour; lower is better):
| Model | RMSE (test) | RMSE (train) | max_depth | n_estimators |
|---|---|---|---|---|
| LinearRegression | 53.52 | 30.22 | n/a | n/a |
| DecisionTreeRegressor | 48.09 | 13.85 | 11 | n/a |
| RandomForestRegressor | 43.80 | 8.66 | 19 | 95 |
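The depth and estimator-count columns suggest tuned `max_depth` and `n_estimators` values (that mapping is an assumption). One way to tune them without leaking future data is a grid search over `TimeSeriesSplit` folds, followed by computing train and test RMSE on a chronological hold-out, mirroring the table's two RMSE columns. The grid, the synthetic series and the lag features below are illustrative only and are not the project's search space.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic hourly series with a daily cycle (stand-in for the real order counts)
rng = np.random.default_rng(7)
index = pd.date_range("2024-01-01", periods=24 * 60, freq="1h")
hours = index.hour.to_numpy()
series = pd.Series(rng.poisson(20 + 10 * np.sin(2 * np.pi * hours / 24)), index=index)

# Lag features built from past values only
data = pd.DataFrame({f"lag_{k}": series.shift(k) for k in (1, 2, 24)})
data["hour"] = hours
data["target"] = series
data = data.dropna()
X, y = data.drop(columns="target"), data["target"]

# Chronological hold-out: the last 10% of hours form the test set
split = int(len(data) * 0.9)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Grid search over both hyperparameters, validated on forward-chaining folds
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"max_depth": [5, 10, 20], "n_estimators": [25, 50]},
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
best = search.best_estimator_  # refit on all of X_train with the best parameters

# RMSE computed as the square root of MSE, on training and held-out hours
rmse_train = np.sqrt(mean_squared_error(y_train, best.predict(X_train)))
rmse_test = np.sqrt(mean_squared_error(y_test, best.predict(X_test)))
print(search.best_params_, f"train RMSE={rmse_train:.2f}", f"test RMSE={rmse_test:.2f}")
```

A large gap between train and test RMSE, like the one in the tree-based rows of the table, can indicate that deep trees are memorizing noise in the training hours.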
Key Insights & Final Recommendations
The analysis revealed strong weekly seasonality and noticeable peaks during specific hours. Incorporating lag features and rolling statistics greatly improved forecast accuracy.
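Peak hours and weekday differences of this kind can be surfaced by averaging demand per hour of the day and per day of the week. A minimal pandas sketch on synthetic data; the 18:00 peak and the busier Friday are invented solely to make the output predictable and say nothing about the project's actual findings.

```python
import pandas as pd

# Synthetic hourly orders (hypothetical): an 18:00 peak and busier Fridays
index = pd.date_range("2024-01-01", periods=24 * 7 * 8, freq="1h")  # 8 full weeks, starting on a Monday
hours = index.hour.to_numpy()
weekdays = index.dayofweek.to_numpy()
orders = 50 + 30 * (hours == 18) + 15 * (weekdays == 4)
series = pd.Series(orders, index=index, name="num_orders")

# Average demand for each hour of the day and each day of the week
hourly_profile = series.groupby(series.index.hour).mean()
weekly_profile = series.groupby(series.index.dayofweek).mean()

peak_hour = int(hourly_profile.idxmax())
busiest_day = int(weekly_profile.idxmax())
print(f"Peak hour: {peak_hour}:00, busiest weekday (0=Monday): {busiest_day}")
# prints: Peak hour: 18:00, busiest weekday (0=Monday): 4
```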
The final model enables the company to anticipate upcoming spikes in demand and allocate the right number of drivers at the right time, improving service quality and reducing unnecessary costs.
