An Analysis of Lisbon Neighborhoods to Help a Family Moving In, Using Data

A journey to find the best neighborhood for a family moving to Lisbon.

Lisbon, Portugal

A Little Introduction

Lisbon is the capital of Portugal, with a population of approximately 504,718 inhabitants. It has a mild climate, with less severe winters than other countries in Europe.

The city has been investing continuously to become a technology hub and to attract startups and top information technology professionals, with many professionals migrating from Brazil and other parts of the world.

Against this background, people regularly accept job offers that relocate them from their home country to Lisbon with their families, and they need to find a neighborhood in the city where they can settle. A family with children needs a neighborhood that has schools, pharmacies, hospitals and parks, preferably within a radius of 1 km so that they are easy to reach.

Data Gathering

To address this problem, it was necessary to collect the neighborhoods of Lisbon, the dangerous neighborhoods and the establishments in each neighborhood, using the following sources and APIs:

  • To collect the names of Lisbon's neighborhoods, web scraping was carried out on Wikipedia at the URL https://pt.wikipedia.org/wiki/Bairros_de_Lisboa.
  • The Google Geocoding API was used to capture the latitude and longitude of the neighborhoods, and the Google Places API was used to collect information, such as ratings and coordinates, about schools, pharmacies, hospitals and parks within a radius of 1,000 m from the location of each neighborhood.
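The 1,000 m radius can be made concrete with a small distance check. The sketch below is only an illustration and not the code used in the study, which relied on the API's own radius search; the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, place, radius_m=1000):
    """True if `place` lies within `radius_m` meters of `center`."""
    return haversine_m(*center, *place) <= radius_m


# Illustrative coordinates near central Lisbon (not taken from the study's data).
center = (38.7223, -9.1393)
near = (38.7293, -9.1393)  # about 0.007 degrees of latitude to the north
far = (38.7323, -9.1393)   # about 0.010 degrees of latitude to the north

print(within_radius(center, near), within_radius(center, far))
```

A check like this is mainly useful for validating the results returned by a radius search, or for recomputing distances after the data has been stored.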

Methodology

In the data collection phase, BeautifulSoup was used to assist in the web scraping of the source pages, and the selected data was loaded into DataFrames using our beloved Pandas library, giving us the lists of all the neighborhoods in Lisbon.
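As a rough sketch of this step, the snippet below parses a small HTML fragment and loads the names into a DataFrame. The fragment is a hypothetical stand-in for the downloaded page, so the tag structure and the column name are assumptions rather than the layout of the real Wikipedia article.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down HTML standing in for the downloaded page.
html = """
<div id="content">
  <ul>
    <li><a href="/wiki/Alfama">Alfama</a></li>
    <li><a href="/wiki/Bairro_Alto">Bairro Alto</a></li>
    <li><a href="/wiki/Chiado">Chiado</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the link text of every list item inside the content block.
content = soup.find("div", id="content")
names = [li.a.get_text(strip=True) for li in content.find_all("li") if li.a]

neighborhoods = pd.DataFrame({"Neighborhood": names})
print(neighborhoods.head(10))
```

On the real page the selectors would need to be adapted to the actual markup, and the HTML would come from an HTTP request instead of a string.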

First 10 Neighborhoods in the Dataset

To add the latitude and longitude of each neighborhood to the DataFrame, it was necessary to query the Google Maps Geocoding API and store the coordinates returned for each neighborhood.
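A minimal sketch of reading the coordinates out of a geocoding result is shown below. It assumes the JSON shape returned by the Geocoding web service (a `status` field and a `results` list whose entries carry `geometry.location`); the helper name and the sample values are hypothetical, and no request is actually sent.

```python
from typing import Optional, Tuple


def extract_lat_lng(response: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first result, or None if nothing was found."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Trimmed, illustrative responses (the coordinates are made up for the example).
sample_response = {
    "status": "OK",
    "results": [
        {
            "formatted_address": "Alfama, Lisbon, Portugal",
            "geometry": {"location": {"lat": 38.7115, "lng": -9.1300}},
        }
    ],
}
empty_response = {"status": "ZERO_RESULTS", "results": []}

print(extract_lat_lng(sample_response))
print(extract_lat_lng(empty_response))
```

Returning `None` for an empty result keeps missing coordinates visible, so they can be caught by the null check in the data preparation phase.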

After this phase, it was necessary to prepare the data obtained:

  • The data was checked for null values.
  • Some of the latitudes and longitudes returned belonged to Greater Lisbon, so a filter was created to exclude them from this study.

Lisbon Neighborhoods and Their Coordinates
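The two preparation steps could look roughly like the sketch below. The bounding box is an assumed, approximate rectangle around the Lisbon municipality and the sample rows are invented; the filter used in the study may have been defined differently.

```python
import pandas as pd

# Assumed bounding box roughly covering the Lisbon municipality (illustrative values).
LAT_MIN, LAT_MAX = 38.69, 38.80
LNG_MIN, LNG_MAX = -9.23, -9.09

# Invented sample rows: "C" has a missing latitude, "D" falls outside the box.
df = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Latitude": [38.7115, 38.7139, None, 38.6500],
    "Longitude": [-9.1300, -9.1450, -9.1400, -9.1500],
})

# Step 1: check for null values, then drop rows without coordinates.
print(df.isnull().sum())
df = df.dropna(subset=["Latitude", "Longitude"])

# Step 2: keep only the points that fall inside the bounding box.
inside = df["Latitude"].between(LAT_MIN, LAT_MAX) & df["Longitude"].between(LNG_MIN, LNG_MAX)
lisbon = df[inside].reset_index(drop=True)
print(lisbon)
```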

Using the Folium library, it was possible to draw a first map of Lisbon with all its neighborhoods placed at their latitude and longitude:

Lisbon Neighborhoods

The establishments of interest (schools, parks, pharmacies and hospitals) were collected through the Google Places API, which returned the establishments along with the location (latitude and longitude) and rating of each one.

Locations rated less than 4 out of 5 were discarded, leaving only the best establishments in each neighborhood. Grouping them and counting each type makes it possible to see their distribution in each neighborhood.

Distribution of the Quantity of Schools, Hospitals, Parks and Drugstores per Neighborhood
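The filtering and counting can be sketched with Pandas as follows. The column names and the sample rows are assumptions made for the example, not the study's actual data.

```python
import pandas as pd

# Invented sample of establishments with their ratings.
places = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
    "Category": ["school", "school", "park", "school", "pharmacy", "hospital"],
    "Rating": [4.5, 3.2, 4.0, 4.8, 3.9, 4.1],
})

# Keep only establishments rated 4 or higher (out of 5).
best = places[places["Rating"] >= 4.0]

# Count the remaining establishments per neighborhood and category.
counts = (
    best.groupby(["Neighborhood", "Category"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

The resulting table has one row per neighborhood and one column per category, which is a convenient shape for the clustering step.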

To classify the data and group the neighborhoods, a k-means clustering model was used, a machine learning method that groups the data according to common characteristics. The greatest difficulty with this method is choosing the number of groups into which the data should be divided. For that, the Yellowbrick library was used, which helps indicate a suitable number of groups for the model.
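The idea behind choosing the number of groups can be sketched with scikit-learn alone: fit k-means for several values of k and look for the "elbow" where the within-cluster sum of squares stops dropping sharply. Yellowbrick's `KElbowVisualizer` automates a similar scan and plots it; the synthetic data below is an assumption for illustration and does not reproduce the study's features.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated groups of 20 points with 4 features each.
rng = np.random.default_rng(0)
centers = np.array([[2, 1, 1, 2], [10, 3, 2, 8], [5, 8, 6, 3]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 4)) for c in centers])

# Fit k-means for k = 1..6 and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

With data like this, the values fall steeply up to k = 3 and then flatten, which is the pattern the elbow method looks for.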

Results

After the model was applied, the data was divided into 5 groups, and these groups were assigned to the neighborhoods, creating the final classification of the Lisbon neighborhoods according to the quantity and quality of schools, pharmacies, hospitals and parks.
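Assigning the groups back to the neighborhoods and summarizing each group might look like the sketch below. It uses a toy table with two clusters to stay readable, whereas the study used 5; the counts and names are invented.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Invented counts of well-rated establishments per neighborhood.
counts = pd.DataFrame(
    {
        "schools": [12, 11, 2, 3, 1, 13],
        "parks": [6, 5, 1, 0, 1, 7],
        "hospitals": [2, 2, 1, 1, 0, 2],
        "pharmacies": [4, 5, 1, 2, 1, 4],
    },
    index=["A", "B", "C", "D", "E", "F"],
)
features = ["schools", "parks", "hospitals", "pharmacies"]

# Fit the model and attach each neighborhood's cluster label.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
counts["Cluster"] = model.fit_predict(counts[features])

# Average counts per cluster, useful for describing what each group offers.
profile = counts.groupby("Cluster")[features].mean()
print(counts)
print(profile)
```

The per-cluster averages are the kind of summary that supports describing one group as rich in schools and parks and another as poorly served.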

Neighborhoods Classified by Clusters

Using the Folium library, the neighborhoods were plotted on the map of Lisbon, colored according to their group.

Neighborhoods Plotted by Cluster

Discussion

Analyzing the formed clusters, it was possible to categorize them according to the following criteria:

Group 1: neighborhoods with the largest supply of schools and parks within a radius of 1,000 m, and a balanced supply of hospitals and pharmacies.

Group 2: neighborhoods with an intermediate supply of schools, a reduced supply of parks, and a balanced number of hospitals and pharmacies.

Group 3: a balanced number of parks and schools relative to the previous groups, but with a stronger supply of hospitals and a larger number of pharmacies.

Group 4: the largest number of parks of all groups and a reasonable number of schools.

Group 5: the lowest number of quality establishments of all study groups.

Conclusion

For a family with children looking for balance and good education, the Group 1 neighborhoods are the best places to settle. These regions are shown on the map of Lisbon below:

Neighborhoods of Group 1 on the Map

List of the Group 1 neighborhoods and their quantities of schools, hospitals, parks and drugstores:

Future versions of the study should improve on some aspects: adding exclusion zones that take the most dangerous neighborhoods in Lisbon into account, including other places that matter to a family, such as shopping centers and supermarkets, and analyzing property prices in each group.

References

Google API Documentation: https://developers.google.com/maps/

Lisbon Neighborhoods: https://pt.wikipedia.org/wiki/Bairros_de_Lisboa

A Data Scientist eager to better understand how and why the outcomes are data related!