BnB Project

#python #pandas #tableau #stakeholder


Goal and takeaways

The goal in this project is to develop my Python/Pandas, Tableau and analysis abilities.

This first stakeholder project allowed me to deepen my understanding of these tools considerably. If I started again however, I would do it with an in depth exploratory data analysis. In retrospect, I should have filtered columns from listings.csv early on.

Environment

I chose to work with Pandas instead of SQL as in this use case, the stakeholders didn’t need live data.

Data source https://insideairbnb.com/get-the-data/

  • Montreal, Quebec, Canada > 06 March, 2025
  • Montreal, Quebec, Canada > show archived data > 23 March, 2024

Background

The client, affiliated with Airbnb, wants to help hosts kick start their spring season. The marketing team has been tasked to deliver customized data by newsletter to the hosts with a market analysis. This analysis will help them identify opportunities in their neighbourhood and optimize their bookings to attract customers.

The data extracted needs to be sensible to the type of renting the hosts offer. They also want the data to be filtered by neighbourhood so that they can send a tailored analysis.

They want the following data:

  • An overview of the renting prospect of this year vs last year
  • The average price of BnBs
  • The demand per number of bedrooms and the booking capacity of the actual market.
  • They also want to help BnB hosts facilitate the renting experience by giving them data insights with the “Instant bookable” option available to the customers.

Steps

Image Description

yoy_bnb.csv

calendar.csv and calendar_2024.csv

Image Description

Data cleaning

  • Formatting floats or converting to integers
  • Formatting currency
  • Verifying if there are Nulls and duplicates
  • Replacing strings for Boolean
	df1 = df1.replace({'available': {'t': True, 'f': False}})
	```

#### Data wrangling for each calendar

**New DataFrame with aggregated columns**

* We are looking for the AVG of unavailable bookings per day. Optional: Formatted in percentage.
* Grouped by date. *This also helps to reduce the amount of data so that Tableau Public handles.* 

```Python
#group bookings by date 
bookings1 = df1.groupby('date')['available'].agg(['sum', 'count'])

#To obtain unavailable bookings (booked bnbs), we need to subtract: total (count) - sum ( amount of "True"; available bookings)
bookings1['sum_booked25'] = bookings1['count'] - bookings1['sum']

# Perc. calculation for both true and false 
bookings1['perc_avail25'] = (bookings1['sum'] / bookings1['count']) * 100
bookings1['perc_booked25'] = (bookings1['sum_booked25'] / bookings1['count']) * 100

#Reset_index to replace date as an index
bookings1.reset_index(inplace = True)

print(bookings1)

Rename and reorder columns

#rename columns
bookings1.rename(columns={'sum': 'sum_avail25', 'count': 'total_count25'}, inplace=True)
#Re-order columns
bookings1 = bookings1[['date','total_count25','sum_avail25','perc_avail25','sum_booked25','perc_booked25']]

Repeat the same operations for the second calendar dataset

Merging tables

Creating new DataFrame to join onto

#Create new DataFrame

start_date = '2024-03-23'
end_date = '2026-03-11'

#Generate date range
date_range = pd.date_range(start = start_date, end = end_date)

#Create dataframe

dfDates = pd.DataFrame(date_range, columns = ['date'])

Removing overlapping dates

bookings1_365 = bookings1.loc[(bookings1['date'] >= '2025-03-23')]

#Reset_index
bookings1_365.reset_index(drop = True)

bookings1_365

Merging the two calendars


df5 = pd.merge(dfDates, bookings2, on = 'date', how = 'left')
df5 = pd.merge(df5, bookings1_365, on = 'date', how = 'left')

df5

Image Description

df5.to_csv('YoY_bnb.csv', index=False)

Listings_clean.csv

Data cleaning

  • Verifying if there are any Nulls and duplicates

Data wrangling

Removing inactive listings

  • by dropping listings either fully available or unavailable for the upcoming 365 days.
df = df.drop(df[(df['availability_365'] == 365) | (df['availability_365'] == 0)].index)

It removed around a 10th of all listings


Filtering columns

df2 = df.filter(['id','price','neighbourhood_cleansed','property_type','room_type','accommodates','bedrooms','availability_365','estimated_occupancy_l365d', 'review_scores_rating', 'instant_bookable'], axis=1)

df2

Image Description

df2.to_csv('listings_clean.csv', index=False)

Analysis

Executive Summary

The year on year growth shows that Spring 2025 knows a significantly lower rate of bookings. However, they then slowly pick up from Fall to the beginning of next year. In total, there is a 15% drop this year compared to the last.

The instant bookable feature has the potential to increase occupancy by 6% and number of reviews by 4%, while not markedly affect the reviews rating scores. This feature may present a quick way to increase bookings for hosts.

Airbnb rentals with 1 to 3 bedrooms represent 94% of the market. There is a 1% increase of occupancy for 3 bedrooms and four times less offerings compared to single bedroom listings. Opening up more bedrooms would represent a market opportunity for 1 or 2 bedrooms listings.

Lastly, I recommend offering comparative data on Airbnb pricings and value per bedroom. It can be a great tool for hosts to understand the current market and optimize their listings.

Image Description

Insights Deep-Dive

Instant bookable feature

Instant bookable listing have an estimated increase of 6% in occupancy, 4% in number of reviews, and keeps a negligible impact on the average review ratings. The average review rating has a difference of only 0.2/5. This means that there are many ways to give a great and personalized experience to renters, either it is with instant bookings or not. There is a higher ratio of hosts that don’t use the feature at 1.24 to 1.

Image Description

Year on year growth

We can observe a drop in bookings from March to May of 18% in comparison to last year. The month of April knows the lowest monthly performance of this year with a drop of almost 10%.

Comparing March 2024 to 2025, we observe a significant drop of 15% of bookings.

On a positive note, from fall to the end of the year and the beginning of 2026, bookings pick back up. More so, there is a 4-5% increase in September and December.

Image Description

Number of bedrooms

BnBs with 1 to 3 bedrooms is where the offer is at its highest. Together, they represent 94% of the all offerings. We can also observe that the distribution of the average occupancy per Bnb is more homogenous.

There is a slight increase in demand and decrease in offer in the range of 1 to 3 bedrooms. Whilst 1 bedroom BnBs are the most common with more than double the amount of listings from the runner up, 2 bedrooms listings know only a 0.5% increase of occupancy. The BnBs with 3 bedrooms have 4 times less offering and 1% increase of demand compared to single bedroom.

BnBs with 6 bedrooms and more make up for less than 1% of the total amount of offering. Due to the small number of listings, they are left out of my analysis because factors like location, pricing, and overall quality have a bigger impact on occupancy.

Image Description

Pricing

Listings with single bedrooms offer an average renting price of 99$. From there, the price per bedroom bottoms out to around 30% at the 3 bedrooms mark. It then picks back up progressively up to about 20% higher then a single bedroom from 3 to 7.

Once again, 8 bedrooms and more constitute a small number of listings overall, which makes the entries too heterogenous and unreliable for analysis.

Image Description

Image Description

Recommendations

  • Help hosts quickly boost their occupancy: The instant bookable feature is a great tip to recommend to hosts so their can quickly increase their turnover and number of reviews.

  • Highlight the listings most in demand : Showcase the market opportunity for hosts to rent 3 bedrooms. This will help them increase their listings occupancy and may help them take steps to diversify their offer. Note: The main target audience in the Greater Montreal are hosts with 1 to 3 bedrooms. I recommend personalizing the data with this range in mind.

  • Refine key financial data to empower hosts: Offer comparative data in terms of pricing per number of bedroom in their neighborhood and Great Montreal. This will enable hosts further their knowledge of the current market and optimize their listings in terms of bookings and profits.

Clarifying Questions

In the original listings table:

  • How do review_scores_rating, review_scores_value and review_scores_accuracy differentiate themselves from each other?

Defining the next steps of the analysis:

  • Are there more relevant insights they would like to have at hand?
  • How do they intent to relay the neighbourhood and Great Montreal comparison in the newsletters?

Assumption and caveats

Assumptions

calendar.csv + calendar24.csv

  • Snapshots are less than a year apart:
    • The dates of the two snapshots overlap by 16 days, which makes the data less accurate for comparison.
    • From 2025 March 06 to March 22, the data overlaps. I removed these dates from the 2025-2026 dataset to increase accuracy.
    • the month of March 2026 is missing the last 11 days for a complete two year comparison. Due to this missing data, The YoY comparison is potentially skewed.
  • More than 365 days of data scraping:

    • The 2025-2026 snapshot originally contained more than 365 days of data. The removal of the overlapping days resulted in 353 days left of for analysis.
  • Missing data:

    • From March 6th to 11th for both 2025 and 2026, the count of listing per day is decreasing, which implies that there are dates missing per listing in the snapshot.

listings.csv

  • Because a listing does not differentiate between a booked vs an unavailable night ( they both have been counted as “unavailable”, I filtered the listings to remove inactive ones.

Tableau Public

  • Because of the massive calendars dataset, I grouped the data before using Tableau to enable the program to process it.