Grouping
Here we finally start using donorschoose.org data.
As you know already, donorschoose.org is a crowdfunding website public school teachers can create projects with a monetary goal that help teach their students. Donors make donations to those projects, some of which meet the goal, and some don't.
From now on, you will pretend that donorschoose.org is your business, and that you want to improve the business in any sense. How do you want to improve donorschoose.org? What information will you need from the data to achieve the improvement? That's the first question you have to answer to complete Assignment 2 (in group).
GROUP BY
Let's assume that I want to have projects for high-poverty areas more funded. To do so, I probably need to know the characteristics of the projects for high-poverty areas. How do you know this? One good way of doing this is comparing projects for high-poverty areas with other projects. To do so, we should group the data by the poverty_level
of the areas for which projects are created. Do we have poverty_level
column in our projects
table? Yes:
How do you group rows into poverty_level
? You should do this:
Now, say that we are interested in averages of the size of donations, the number of projects, the number of donors. You are doing this:
Let's say we are also interested in the completion rate:
Conditions with GROUP BY
Above might be wrong because there are projects that are still alive:
Say you are not interested in the groups whose cases are less than 500:
Last updated
Was this helpful?