%%capture
%load_ext sql
%sql sqlite:///chinook.db
%sql SELECT * FROM sqlite_master WHERE type='table';
The four most popular genres of tracks in terms of sales in the US are Rock, Alternative & Punk, Metal and R&B/Soul, but each of these are only 3-4% of the total number of sales in the respective genre. The Pop, Blues and Hip-hop genres are the fifth, seventh and ninth most popular genres in terms of US sales. I would therefore recommend that the store purchase the following artists:
%%sql
WITH
country_sales AS
(
SELECT invoice_id, billing_country
FROM invoice
),
tracks_sold AS
(
SELECT track_id, invoice_id, quantity
FROM invoice_line
),
tracks_genre AS
(
SELECT t.track_id, t.genre_id, g.name name
FROM track t
INNER JOIN genre g ON g.genre_id = t.genre_id
),
sold_in_USA AS
(
SELECT tg1.name genre, SUM(ts1.quantity) usa_sales
FROM tracks_sold ts1
INNER JOIN country_sales cs1 ON cs1.invoice_id = ts1.invoice_id
INNER JOIN tracks_genre tg1 ON tg1.track_id = ts1.track_id
WHERE cs1.billing_country = "USA"
GROUP BY tg1.name
)
SELECT tg.name genre,
SUM(ts.quantity) "tracks sold",
s.usa_sales,
SUM(ts.quantity) / s.usa_sales usa_sales_pct
FROM tracks_sold ts
INNER JOIN tracks_genre tg ON tg.track_id = ts.track_id
INNER JOIN country_sales cs ON cs.invoice_id = ts.invoice_id
INNER JOIN sold_in_usa s ON s.genre = tg.name
GROUP BY tg.name
ORDER BY s.usa_sales DESC
In terms of dollar sales, Jane Peacock has the highest performance with $1,731.51, followed by Margaret Park with $1,584.00 and then Steve Johnson with $1,393.92. The order of performance reflects exactly the order in which each sales support agent joined the company - Jane joined first, so she has more time to accrue sales, compared to Margaret and Steve.
%%sql
SELECT (e.first_name || " " || e.last_name) employee,
e.hire_date,
e.birthdate,
ROUND(SUM(i.total),2) sales
FROM employee e
LEFT JOIN customer c ON c.support_rep_id = e.employee_id
LEFT JOIN invoice i ON i.customer_id = c.customer_id
WHERE e.title = "Sales Support Agent"
GROUP BY employee
%%sql
WITH
avg_order_value_country AS
(
SELECT c.country, AVG(i.total) avg_total
FROM invoice i
INNER JOIN customer c ON c.customer_id = i.customer_id
GROUP BY country
)
SELECT c.country,
COUNT(c.customer_id) "number of customers",
ROUND(SUM(i.total),2) "total sales",
CAST(SUM(i.total) / COUNT(c.customer_id) AS INTEGER) "average sales per customer",
ROUND(aovc.avg_total,2) "Average order value"
FROM customer c
INNER JOIN invoice i ON i.customer_id = c.customer_id
INNER JOIN avg_order_value_country aovc ON aovc.country = c.country
GROUP BY c.country
ORDER BY "total sales" DESC
Out of a total of 689 invoices, the vast majority (70%) have been for compilations of individual tracks, whilst just 30% have been for album purchases. So, on the face of it, it would appear to make more sense to mainly cater for only individual tracks being added manually to a purchase. Since the store charges the same for an album as it does for each individual track being added manually, there would be no real financial benefit to purchasing whole albums, instead of the individual tracks of an album individually. But, 30% is still a sizeable minority in terms of customer demand (208 out of 689), so only purchasing the most popular tracks from an album could alienate this group. I would therefore recommend that the store continue to buy full albums, and not just the popular tracks from albums.
%%sql
WITH
invoice_track_category AS
(
SELECT
il.invoice_id,
t.album_id,
COUNT(il.track_id) number_of_tracks,
CASE
WHEN COUNT(il.track_id) = 1 THEN "No"
ELSE "Yes"
END AS is_album
FROM invoice_line il
INNER JOIN track t ON t.track_id = il.track_id
GROUP BY il.invoice_id, t.album_id
),
album_purchase AS
(
SELECT invoice_id, is_album
FROM invoice_track_category
GROUP BY invoice_id, number_of_tracks
)
SELECT COUNT(invoice_id) total_invoices,
(SELECT COUNT(is_album) FROM album_purchase WHERE is_album = "Yes") yes,
(SELECT COUNT(is_album) FROM album_purchase WHERE is_album = "No") no,
ROUND((CAST(COUNT(CASE WHEN is_album = "Yes" THEN 1 ELSE NULL END) AS FLOAT) / COUNT(invoice_id)),2) * 100 pct_yes,
ROUND((CAST(COUNT(CASE WHEN is_album = "No" THEN 1 ELSE NULL END) AS FLOAT) / COUNT(invoice_id)),2) * 100 pct_no
FROM album_purchase
Iron Maiden is the artist that appears in the most number of playlists
%%sql
WITH
artist_album_track AS
(
SELECT artist.artist_id, artist.name artist, track.track_id track
FROM artist
INNER JOIN album ON album.artist_id = artist.artist_id
INNER JOIN track ON track.album_id = album.album_id
),
track_in_playlist AS
(
SELECT p.playlist_id, p.name, pt.track_id
FROM playlist p
INNER JOIN playlist_track pt ON pt.playlist_id = p.playlist_id
)
SELECT aat.artist, COUNT(tip.name) number_of_playlists
FROM artist_album_track aat
INNER JOIN track_in_playlist tip ON tip.track_id = aat.track
GROUP BY aat.artist
ORDER BY number_of_playlists DESC
Of the 6,454 tracks available, 4,757 have been purchased and 1,697 have not been purchased.
%%sql
WITH
purchase_status AS
(
SELECT t.name, il.quantity
FROM track t
LEFT JOIN invoice_line il ON il.track_id = t.track_id
)
SELECT COUNT(name) total_tracks,
COUNT(CASE WHEN quantity > 0 THEN 1 ELSE NULL END) purchased,
COUNT(name) - COUNT(CASE WHEN quantity > 0 THEN 1 ELSE NULL END) not_purchased
FROM purchase_status ps
I interpreted the range of tracks to mean the number of tracks by individual artists. For example, if for each artist, there was equal number of tracks, the range was pretty evenly spread out. On the other hand, there were a lot of artists with a handful of tracks and few of artists with a lot of tracks, then the range would be skewed towards the artists with the most tracks. So, for me, the question was, is the number of tracks by a particular artist reflected in the sales popularity of those tracks? (i.e. Are the most sold tracks by artists with the most number of tracks).
The first stage was to create of table showing the number of tracks in the shop's catalogue per artist.The second stage was to create another table that showed the number of sold tracks per artist. From these two tables, it was possible to identify the artists where the quantity of tracks reflected their sales popularity and the artists where the quantity of the tracks did not reflect their sales popularity.
What I found was that, for the vast majority of the artists, the number of tracks available for sale was not reflected in the number of sales of those tracks.
%%sql
WITH
range_of_tracks AS
(
SELECT art.name, COUNT(t.track_id) number_tracks
FROM track t
INNER JOIN album alb ON alb.album_id = t.album_id
INNER JOIN artist art ON art.artist_id = alb.artist_id
GROUP BY art.name
),
track_popularity AS
(
SELECT art.name, COUNT(il.track_id) number_tracks
FROM invoice_line il
INNER JOIN track t ON t.track_id = il.track_id
INNER JOIN album alb ON alb.album_id = t.album_id
INNER JOIN artist art ON art.artist_id = alb.artist_id
GROUP BY art.name
),
range_and_popularity AS
(
SELECT * FROM range_of_tracks INTERSECT SELECT * FROM track_popularity
),
range_not_popularity AS
(
SELECT * FROM track_popularity EXCEPT SELECT * FROM range_of_tracks
)
SELECT COUNT(*) FROM range_and_popularity
UNION
SELECT COUNT(*) FROM range_not_popularity