Coursera IBM Data Science Capstone Project
Opening a new Indian Restaurant in Toronto, Canada










INTRODUCTION

For this project, I am creating a scenario for an Indian restaurateur who wants to explore opening an Indian restaurant in the Toronto area.
The idea behind this project is that there may not be enough Indian restaurants in Toronto, which might present a good opportunity for an entrepreneur based in Canada. Indian food is similar to other Asian cuisines.
So this entrepreneur wants to open the restaurant in locations where Asian food is popular.
With this purpose, I am creating this project to help the entrepreneur find a suitable location.

Business Issues

The objective of this capstone project is to find the most suitable location for the entrepreneur to open a new Indian restaurant in Toronto, Canada. By using data science and machine learning methods such as clustering, this project aims to answer the business question: In Toronto, if an entrepreneur wants to open an Indian restaurant, where should they consider opening it?

Audience

The entrepreneur who wants to find a location to open an authentic Indian restaurant.

Data Extraction

● Scraping the list of Toronto neighborhoods from Wikipedia
● Getting latitude and longitude data for these neighborhoods via the Geocoder package
● Using the Foursquare API to get venue data related to these neighborhoods

Data

To solve this problem, I will need the following data:
● List of neighborhoods in Toronto, Canada.
● Latitude and Longitude of these neighborhoods.
● Venue data related to Asian restaurants.
This will help us find the neighborhoods that are most suitable to open an Indian restaurant.

Methodology

First, I need to get the list of neighborhoods in Toronto, Canada. This is possible by extracting the list of neighborhoods from the Wikipedia page (“https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M”).

I scraped the page and saved the result in a CSV file. However, it is only a list of neighborhood names and postal codes. I need their coordinates in order to use Foursquare to pull the list of venues near these neighborhoods. To get the coordinates, I tried the Geocoder package, but it was not working, so I used the CSV file provided by the IBM team to match coordinates to the Toronto neighborhoods. After gathering all the coordinates, I plotted the neighborhoods on a map of Toronto using the Folium package to verify that the coordinates are correct.
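The matching step described above can be sketched as a join on postal code. This is a minimal illustration using only the standard library, not the notebook's actual code; the column names and the three sample rows are assumptions made for the example, and the real inputs would be the scraped Wikipedia table and the coordinates CSV provided with the course.

```python
import csv
import io

# Illustrative rows only (column names are assumed for this sketch).
NEIGHBORHOODS_CSV = """PostalCode,Borough,Neighborhood
M5A,Downtown Toronto,"Regent Park, Harbourfront"
M4K,East Toronto,"The Danforth West, Riverdale"
M5R,Central Toronto,"The Annex, North Midtown, Yorkville"
"""

COORDINATES_CSV = """Postal Code,Latitude,Longitude
M5A,43.6543,-79.3606
M4K,43.6796,-79.3522
M5R,43.6727,-79.4057
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_coordinates(neighborhood_rows, coordinate_rows):
    """Left-join latitude and longitude onto neighborhoods by postal code."""
    lookup = {
        row["Postal Code"]: (float(row["Latitude"]), float(row["Longitude"]))
        for row in coordinate_rows
    }
    merged = []
    for row in neighborhood_rows:
        # Postal codes without a match keep None so they are easy to spot.
        lat, lon = lookup.get(row["PostalCode"], (None, None))
        merged.append({**row, "Latitude": lat, "Longitude": lon})
    return merged


toronto = attach_coordinates(read_rows(NEIGHBORHOODS_CSV), read_rows(COORDINATES_CSV))
missing = [row["PostalCode"] for row in toronto if row["Latitude"] is None]
```

Checking that `missing` is empty is a cheap sanity test to run before plotting the points on a map.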

Next, I use the Foursquare API to pull the list of the top 100 venues within a 500-meter radius of each neighborhood. I created a Foursquare developer account in order to obtain the account ID and API key needed to pull the data. From Foursquare, I am able to pull the names, categories, latitudes, and longitudes of the venues. With this data, I can also check how many unique categories these venues cover. Then, I analyze each neighborhood by grouping the rows by neighborhood and taking the mean of the frequency of occurrence of each venue category. This prepares the data for the clustering done later.
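As a rough sketch of the venue request described above, the function below only builds the request URL and does not call the network. It assumes Foursquare's legacy v2 "explore" endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` parameters; the version date and the placeholder credentials are illustrative, and the notebook's actual request may differ.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a URL asking for up to `limit` venues within `radius` meters."""
    params = {
        "client_id": client_id,          # credentials from the developer account
        "client_secret": client_secret,
        "v": version,                    # API version date (illustrative value)
        "ll": f"{lat},{lon}",            # neighborhood coordinates
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials; substitute real ones before sending the request.
url = build_explore_url(43.6543, -79.3606, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
```

The same function would be called once per neighborhood, with the returned JSON flattened into rows of venue name, category, latitude, and longitude.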
Here, I chose to look specifically for “Indian restaurants”. Previously, when I ran the model, I was looking for “Asian restaurants”, but there were very few results (maybe due to Foursquare categorization), so I looked for the restaurants closest in taste to Indian cuisine.
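The per-neighborhood frequency can be illustrated with a small standard-library sketch. Taking the mean of one-hot encoded category columns within a neighborhood is the same as computing each category's share of that neighborhood's venues, which is what the function below does. The venue records and the category label "Indian Restaurant" are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical venue records: (neighborhood, venue category).
venues = [
    ("Neighborhood A", "Indian Restaurant"),
    ("Neighborhood A", "Coffee Shop"),
    ("Neighborhood A", "Indian Restaurant"),
    ("Neighborhood A", "Park"),
    ("Neighborhood B", "Coffee Shop"),
    ("Neighborhood B", "Bakery"),
]


def category_frequencies(records):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in records:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {
            category: n / sum(counter.values()) for category, n in counter.items()
        }
        for neighborhood, counter in counts.items()
    }


frequencies = category_frequencies(venues)

# Keep only the column of interest; neighborhoods without the category get 0.0.
indian_share = {
    neighborhood: shares.get("Indian Restaurant", 0.0)
    for neighborhood, shares in frequencies.items()
}
```

With this sample, two of Neighborhood A's four venues are Indian restaurants, so its share is 0.5, while Neighborhood B's share is 0.0.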

Lastly, I clustered the neighborhoods using k-means clustering. The k-means algorithm identifies k centroids and allocates every data point to the nearest centroid, while keeping the within-cluster distances (the squared distances between points and their centroid) as small as possible. It is one of the simplest and most popular unsupervised machine learning algorithms, and it is well suited to this project. I clustered the neighborhoods in Toronto into 3 clusters based on their frequency of occurrence for “Indian food”. Based on the results (the concentration of clusters), I will be able to recommend the ideal location to open the restaurant.
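To make the clustering step concrete, here is a from-scratch sketch of k-means (Lloyd's algorithm) on a single feature, the Indian-restaurant frequency. It is an illustration under stated assumptions rather than the project's code, which would more typically rely on a library implementation; the frequency values are hypothetical, and the deterministic initialization is a simplification chosen so the example is reproducible.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Cluster scalar values with Lloyd's algorithm; return (labels, centroids).

    Assumes k >= 2 and at least k distinct values.
    """
    # Deterministic start: spread initial centroids across the distinct values.
    distinct = sorted(set(values))
    centroids = [distinct[i * (len(distinct) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return labels, centroids


# Hypothetical Indian-restaurant frequencies, one per neighborhood.
shares = [0.0, 0.0, 0.0, 0.02, 0.03, 0.0, 0.10, 0.12]
labels, centroids = kmeans_1d(shares, k=3)
```

Because the initial centroids are in ascending order, label 0 ends up with the lowest frequencies and label 2 with the highest in this sketch. A library implementation with random initialization gives no such guarantee, so cluster numbers should be interpreted by inspecting the cluster contents.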






RESULTS



The results from k-means clustering show that we can categorize Toronto neighborhoods into 3 clusters based on how many Indian restaurants are in each neighborhood:
● Cluster 0: Neighborhoods with no Indian restaurants
● Cluster 1: Neighborhoods with few or no Indian restaurants
● Cluster 2: Neighborhoods with a high number of Indian restaurants
The results are visualized in the map above, with Cluster 0 in red, Cluster 1 in purple, and Cluster 2 in light green.
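One way to arrive at cluster descriptions like those above is to summarize each cluster by its size and its average Indian-restaurant frequency. The sketch below assumes hypothetical rows of (neighborhood, frequency, cluster label); only the color names come from the text, and the color identifiers are an assumed spelling of them.

```python
from collections import defaultdict

# Marker colors as described in the text (identifier spellings are assumed).
CLUSTER_COLORS = {0: "red", 1: "purple", 2: "lightgreen"}

# Hypothetical (neighborhood, Indian-restaurant frequency, cluster label) rows.
rows = [
    ("Neighborhood A", 0.00, 0),
    ("Neighborhood B", 0.00, 0),
    ("Neighborhood C", 0.02, 1),
    ("Neighborhood D", 0.03, 1),
    ("Neighborhood E", 0.10, 2),
]


def summarize_clusters(rows):
    """Return {label: {"neighborhoods": count, "mean_share": average frequency}}."""
    grouped = defaultdict(list)
    for _name, share, label in rows:
        grouped[label].append(share)
    return {
        label: {
            "neighborhoods": len(shares),
            "mean_share": sum(shares) / len(shares),
        }
        for label, shares in sorted(grouped.items())
    }


summary = summarize_clusters(rows)
for label, stats in summary.items():
    print(label, CLUSTER_COLORS[label], stats)
```

A cluster whose mean frequency is zero corresponds to "no Indian restaurants", and the cluster with the largest mean corresponds to "a high number of Indian restaurants".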

Recommendations

Most of the Indian restaurants are in Cluster 2, which is around The Annex, North Midtown, and Yorkville, and in Cluster 1, around The Danforth West and Riverdale.
The number is lowest (close to zero) in Cluster 0 areas, which are around Commerce Court and Victoria Hotel. There are also good opportunities to open near Harbourfront, Regent Park, Adelaide, King, and Richmond.
Looking at nearby venues, Cluster 0 might be a good location, as there are not many Asian restaurants in these areas.

Therefore, this project recommends that the entrepreneur open an authentic Indian restaurant in these locations, where there is little to no competition.

Nonetheless, if the food is authentic, affordable, and tastes good, I am confident that it will have a great following anywhere.

Conclusion

In this project, we have gone through the process of identifying the business problem, specifying the data required, extracting and preparing the data, applying machine learning with k-means clustering, and providing a recommendation to the stakeholder.

All code for this project can be found here:

References
List of neighborhoods in Toronto: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
Foursquare Developer Documentation: https://developer.foursquare.com/docs


