Project 3
Project 3 – Covid Data Exploration

Overview
Covid had a huge impact over the last 3 years. I’ve seen a lot written about it but the numbers presented are never the same! I want to know what the published numbers are.
The Covid dataset was downloaded from ‘Our World in Data’ website.
I split the downloaded owid-covid-data.csv file into two Excel spreadsheets. This split the data into Covid Cases and Covid Data. This will minimise the number of SQL joins during the early queries and will enable more complex SQL joins in later queries.
Process
1. Prepare
– CovidCases.csv contains – columns A to Y – ‘population’ column – all other columns deleted
– CovidData.csv conatins – columns A to D – columns E to Z deleted – all other columns are retained
– create new database ‘CovidProject’
– Import files using the SQL Server 2019 Import and Export Data tool.
– Tables are called dbo.CovidCases & dbo.CovidData
Queries
1. Total Cases vs Total Deaths
SELECT location, date, total_cases, total_deaths, (total_deaths/total_cases)*100 AS DeathPercentage
FROM CovidCases
–WHERE location LIKE ‘%australia%’
AND continent IS NOT NULL
ORDER BY location, date
2. % Population Infacted
SELECT Location, date, Population, total_cases, (total_cases/population)*100 AS PercentPopulationInfected
FROM CovidCases
–WHERE location LIKE ‘%australia%’
ORDER BY location, date
3. Infaction Rate by Country
SELECT Location, Population, MAX(total_cases) AS HighestInfectionCount, MAX((total_cases/population))*100 AS PercentPopulationInfected
FROM CovidCases
–WHERE location LIKE ‘%australia%’
GROUP BY location, population
ORDER BY PercentPopulationInfected DESC
4. Death Rate by Country
SELECT location, MAX(Total_deaths) AS TotalDeathCount, MAX((total_deaths/population))*100 AS PercentPopulationDied
FROM CovidCases
–WHERE location LIKE ‘%australia%’
— when continent IS NULL then only countries will be displayed
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY PercentPopulationDied DESC
5. Death Rate by Continent
SELECT continent, MAX(Total_deaths) AS TotalDeathCount
FROM CovidCases
–WHERE location LIKE ‘%australia%’
WHERE continent IS NOT NULL
GROUP BY continent
ORDER BY TotalDeathCount DESC
6. Global Total Deaths
SELECT continent, MAX(Total_deaths) AS TotalDeathCount
FROM CovidCases
–WHERE location LIKE ‘%australia%’
WHERE continent IS NOT NULL
GROUP BY continent
ORDER BY TotalDeathCount DESC
7. Percent Population with one or more Vaccinations
SELECT case.continent, case.location, case.date, case.population, data.new_vaccinations, SUM(data.new_vaccinations) OVER (PARTITION BY case.Location ORDER BY case.location,
case.Date) AS RollingVaccinatedCount
FROM CovidCases AS case
JOIN CovidData AS data
ON case.location = data.location
AND case.date = data.date
WHERE case.continent IS NOT NULL AND case.location LIKE ‘%australia%’
ORDER BY case.location, case.date
8. Percent Population with one or more Vaccinations (using CTE)
WITH POPvsVAC (Continent, Location, Date, Population, New_Vaccinations, RollingVaccinatedCount)
AS
(
SELECT case.continent, case.location, case.date, case.population, data.new_vaccinations
, SUM(data.new_vaccinations) OVER (PARTITION BY case.continent ORDER BY case.location, case.Date) AS RollingVaccinatedCount
FROM CovidCases AS case
JOIN CovidData AS data
ON case.location = data.location
AND case.date = data.date
WHERE case.continent IS NOT NULL
)
SELECT *, (RollingVaccinatedCount/Population)*100 AS RollingVaccinatedPercent
FROM POPvsVAC
9. Percent Population with one or more Vaccinations (using Temp Table)
10. Create View for % Population Vaccinated
CREATE VIEW PercentPopulationVaccinated AS
SELECT case.continent, case.location, case.date, case.population, data.new_vaccinations,
SUM(data.new_vaccinations) OVER (PARTITION BY case.continent ORDER BY case.location, case.date) AS RollingPeopleVaccinated
FROM CovidCases AS case
JOIN CovidData AS data
ON case.location = data.location
AND case.date = data.date
WHERE case.continent IS NOT NULL;
Summary
The queries have returned a huge amount of data over time. As an extension of this project the data could be imported into PowerBi or Tableau and a time elapsed map of the data could be revealing.
Although interesting, this project was quite confronting.