
Project details:
In this project, an apartment-listings dataset was analysed to identify pricing outliers and anomalies that may point to fraudulent listings. Key attributes such as area, ceiling height, distance from the city centre and listing date were explored using visualisations and engineered features. The findings give real-estate platforms concrete criteria for flagging irregular listings and a clearer view of pricing patterns.
Description
Business Context & Problem
Real-estate platforms face significant risk when listings either misstate property features or deviate sharply from market norms. For a service handling apartment sales in a major city, detecting such listings early can reduce costs, maintain user trust and guide corrective action. This analysis aims to surface anomalies and understand the broader pricing dynamics in the apartment market.
Data & Analytical Approach
The dataset contained apartment listings with features like room count, area in m², ceiling height, distance to the city centre, district, listing date and price. After loading the data, missing values and duplicates were handled, numeric distributions were reviewed and extreme values flagged. Feature engineering added derived variables (for example price per m², adjusted listing age). Visual exploration (histograms, boxplots, scatterplots) helped reveal patterns and potential outliers across key dimensions.
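
The two derived variables mentioned above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the `Listing` class and its field names are hypothetical, and because the source does not say how listing age was "adjusted", the sketch computes plain age in days.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Listing:
    # Hypothetical field names; the original dataset's column names are not given.
    price: float                # total asking price
    area_m2: Optional[float]    # area in square metres, may be missing
    listed_on: date             # date the listing was published


def price_per_m2(listing: Listing) -> Optional[float]:
    """Return price per square metre, or None when area is missing or non-positive."""
    if listing.area_m2 is None or listing.area_m2 <= 0:
        return None
    return listing.price / listing.area_m2


def listing_age_days(listing: Listing, as_of: date) -> int:
    """Return the number of days between publication and a reference date."""
    return (as_of - listing.listed_on).days


# Made-up example values, for illustration only.
example = Listing(price=6_000_000, area_m2=60.0, listed_on=date(2024, 1, 1))
ppm2 = price_per_m2(example)
age = listing_age_days(example, as_of=date(2024, 1, 31))
```

Returning `None` for a missing or zero area keeps invalid rows visible for later review instead of raising an error or producing an infinite value.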


Statistical / ML Analysis
Using the cleaned and enriched dataset, pairwise relationships were explored, for instance how price per m² varies with ceiling height or district. Outlier detection techniques (such as boxplot thresholds or z-scores) were applied to identify listings with abnormal values relative to market expectations. The goal was not to build a prediction model, but to flag unusual entries and highlight factors that move price beyond typical ranges.
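
The two outlier rules named above could look roughly like the following, using only Python's standard library. The function names, the multiplier `k=1.5` and the z-score threshold are conventional defaults assumed for illustration; the source does not state which cut-offs were used, and the sample values are made up.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def z_score_outliers(values: Sequence[float], threshold: float = 3.0) -> List[float]:
    """Return values whose absolute z-score (sample standard deviation) exceeds threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / sd > threshold]


# Made-up price-per-m² values with one obviously extreme entry.
sample = [95, 100, 102, 98, 105, 99, 101, 97, 103, 400]

by_iqr = iqr_outliers(sample)
# Small samples cap the attainable z-score, so the demo uses a lower threshold.
by_z = z_score_outliers(sample, threshold=2.5)
```

The IQR rule relies on quartiles, so a single extreme value barely moves the fences. The z-score rule uses the mean and standard deviation, both of which the extreme value itself inflates, which is why it can miss outliers in small or heavily skewed samples.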


Key Insights & Final Recommendations
The analysis revealed that listings with unusually high ceilings, or located in certain peripheral districts, were priced significantly above the median market rate. Properties listed very recently also tended to show a higher price per m². For the platform, this suggests two key actions: establish threshold-based flags for listings with extreme attributes (e.g., ceiling height > 3.5 m or price per m² > X) and focus monitoring on the districts that deviate most from the market norm. With these checks in place, the platform can prioritise manual review of high-risk listings and improve pricing fairness and credibility.
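
Both recommendations can be sketched as small helper functions. This is only an illustration under stated assumptions: the function names and reason labels are hypothetical, the 3.5 m ceiling cut-off is the example given above, and the price-per-m² limit (the unspecified X) is left as a required argument. The numbers in the demo are made up.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def flag_listing(
    ceiling_height_m: Optional[float],
    price_per_m2: Optional[float],
    max_price_per_m2: float,            # the "X" threshold; must be chosen by the platform
    max_ceiling_height_m: float = 3.5,  # example cut-off from the text
) -> List[str]:
    """Return the reasons a listing should be sent to manual review."""
    reasons = []
    if ceiling_height_m is not None and ceiling_height_m > max_ceiling_height_m:
        reasons.append("ceiling_height")
    if price_per_m2 is not None and price_per_m2 > max_price_per_m2:
        reasons.append("price_per_m2")
    return reasons


def district_deviation(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Relative deviation of each district's median price per m² from the overall median."""
    by_district = defaultdict(list)
    for district, ppm2 in rows:
        by_district[district].append(ppm2)
    overall = statistics.median(ppm2 for _, ppm2 in rows)
    return {
        district: (statistics.median(values) - overall) / overall
        for district, values in by_district.items()
    }


# Made-up demo data; 150.0 is a placeholder limit, not a value from the analysis.
reasons = flag_listing(ceiling_height_m=4.0, price_per_m2=100.0, max_price_per_m2=150.0)

rows = [("A", 100), ("A", 110), ("B", 200), ("B", 220), ("C", 105)]
deviation = district_deviation(rows)
most_deviant = max(deviation, key=lambda d: abs(deviation[d]))
```

Returning a list of reasons rather than a single boolean lets reviewers see why a listing was flagged. Ranking districts by absolute deviation gives a simple way to decide where to focus monitoring first.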
