{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 9 (guest) - Data Visualization with Seaborn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Seaborn\n",
    "\n",
    "**`Seaborn`** is a data visualization library built on the top of `matplotlib`. It was created by [Micheal Waskon at the Center for Neural Science, New York University](https://joss.theoj.org/papers/10.21105/joss.03021).\n",
    "\n",
    "**`Seaborn`** has all the attributes of the `matplotlib` library (it is a child class), making it considerably easy to plot data using Python.\n",
    "\n",
    "We will learn some of these plots in this class and a few customizations. More about `Seaborn` can be found in [here](https://seaborn.pydata.org).\n",
    "\n",
    "Below you can find a list of functions that we can use to plot data on `Seaborn`.\n",
    "\n",
    "![alt image](https://seaborn.pydata.org/_images/function_overview_8_0.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Importing libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns # This is how you import seaborn\n",
    "\n",
    "# Datasets\n",
    "\n",
    "## Political and Economic Risk Dataset\n",
    "# Info on investment risks in 62 countries in 1992\n",
    "# courts  : 0 = not independent; 1 = independent\n",
    "# barb2   : Informal Markets Benefits\n",
    "# prsexp2 : 0 = very high expropriation risk; 5 = very low\n",
    "# prscorr2: 0 = very high bribing risk; 5 = very low\n",
    "# gdpw2   : Log of GDP per capita\n",
    "perisk = pd.read_csv('https://raw.githubusercontent.com/umbertomig/seabornClass/main/data/perisk.csv')\n",
    "perisk = perisk.set_index('country')\n",
    "\n",
    "## Tips Dataset\n",
    "# Info about tips in a given pub\n",
    "# totbill : Total Bill\n",
    "# tip     : Tip\n",
    "# sex     : F = female; M = male\n",
    "# smoker  : Yes or No\n",
    "# day     : Weekday\n",
    "# time    : Time of the day\n",
    "# size    : Number of people\n",
    "tips = pd.read_csv('https://raw.githubusercontent.com/umbertomig/seabornClass/main/data/tips.csv')\n",
    "tips = tips.set_index('obs')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here is what we have in these datasets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>courts</th>\n",
       "      <th>barb2</th>\n",
       "      <th>prsexp2</th>\n",
       "      <th>prscorr2</th>\n",
       "      <th>gdpw2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.720775</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>9.690170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>1</td>\n",
       "      <td>-6.907755</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>10.304840</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Austria</th>\n",
       "      <td>1</td>\n",
       "      <td>-4.910337</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>10.100940</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bangladesh</th>\n",
       "      <td>0</td>\n",
       "      <td>0.775975</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>8.379768</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Belgium</th>\n",
       "      <td>1</td>\n",
       "      <td>-4.617344</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>10.250120</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            courts     barb2  prsexp2  prscorr2      gdpw2\n",
       "country                                                   \n",
       "Argentina        0 -0.720775        1         3   9.690170\n",
       "Australia        1 -6.907755        5         4  10.304840\n",
       "Austria          1 -4.910337        5         4  10.100940\n",
       "Bangladesh       0  0.775975        1         0   8.379768\n",
       "Belgium          1 -4.617344        5         4  10.250120"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "perisk.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>totbill</th>\n",
       "      <th>tip</th>\n",
       "      <th>sex</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>obs</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>16.99</td>\n",
       "      <td>1.01</td>\n",
       "      <td>F</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Night</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10.34</td>\n",
       "      <td>1.66</td>\n",
       "      <td>M</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Night</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>21.01</td>\n",
       "      <td>3.50</td>\n",
       "      <td>M</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Night</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23.68</td>\n",
       "      <td>3.31</td>\n",
       "      <td>M</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Night</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>24.59</td>\n",
       "      <td>3.61</td>\n",
       "      <td>F</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Night</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     totbill   tip sex smoker  day   time  size\n",
       "obs                                            \n",
       "1      16.99  1.01   F     No  Sun  Night     2\n",
       "2      10.34  1.66   M     No  Sun  Night     3\n",
       "3      21.01  3.50   M     No  Sun  Night     3\n",
       "4      23.68  3.31   M     No  Sun  Night     2\n",
       "5      24.59  3.61   F     No  Sun  Night     4"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting Data 101\n",
    "\n",
    "The best way to explore the data is to plot it. However, not all plots are suitable for the variables we want to describe. Starting with a single variable, the first question is what type of variable we are talking about?\n",
    "\n",
    "Types of variables:\n",
    "\n",
    "- `Quantitative` variables: represent measurement.\n",
    "    \n",
    "    + `Discrete`: number of children, age in years, etc.\n",
    "    \n",
    "    + `Continuous`: income, height, GDP per capita, etc.\n",
    "\n",
    "- `Categorical` variables: represent discrete variation\n",
    "\n",
    "    + `Binary`: voted for Trump, smokes or not, etc.\n",
    "    \n",
    "    + `Nominal`: species names, a candidate supported in the primaries, etc.\n",
    "    \n",
    "    + `Ordinal`: schooling, grade, risk, etc.\n",
    "\n",
    "For each variable type, there are specific descriptive stats and plots. Below, see an example of the difference between using the `right` and `wrong` descriptive stats for continuous and binary variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    62.000000\n",
       "mean      9.041875\n",
       "std       0.970264\n",
       "min       7.029973\n",
       "25%       8.381027\n",
       "50%       9.185412\n",
       "75%       9.889280\n",
       "max      10.410180\n",
       "Name: gdpw2, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Summary stats for a continuous variable (good)\n",
    "perisk['gdpw2'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7.096721     1\n",
       "9.661735     1\n",
       "9.375601     1\n",
       "9.414342     1\n",
       "9.178953     1\n",
       "            ..\n",
       "10.127270    1\n",
       "9.690170     1\n",
       "7.029973     1\n",
       "8.548692     1\n",
       "9.891465     1\n",
       "Name: gdpw2, Length: 62, dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Frequency table for a continuous variable (bad)\n",
    "perisk['gdpw2'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    62.000000\n",
       "mean      0.451613\n",
       "std       0.501716\n",
       "min       0.000000\n",
       "25%       0.000000\n",
       "50%       0.000000\n",
       "75%       1.000000\n",
       "max       1.000000\n",
       "Name: courts, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Summary stats for a binary variable (bad)\n",
    "perisk['courts'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    34\n",
       "1    28\n",
       "Name: courts, dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Frequency table for a binary variable (good)\n",
    "perisk['courts'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Univariate Plots\n",
    "\n",
    "*Univariate plots* are plots for single variables.\n",
    "\n",
    "### Quantitative Variables: Histograms\n",
    "\n",
    "Starting with numerical variables, one suitable plot is the *histogram*. It breaks the numerical values into brackets and counts how many values are within each bracket.\n",
    "\n",
    "The syntax is:\n",
    "\n",
    "```\n",
    "sns.displot(data = the_data_frame,\n",
    "    x = 'the_variable',\n",
    "    kind = 'hist',\n",
    "    kde = [..True or False..], \n",
    "    rug = [..True or False..],\n",
    "    bins = [..number of bins..], \n",
    "    stat : [..{\"count\", \"density\", \"probability\"}..],\n",
    "    [..among others..])\n",
    "```\n",
    "\n",
    "Let's plot a histogram for the Log of GDP per capita (`gdpw2`)?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Customizations\n",
    "\n",
    "We can easily customize the entire plot:\n",
    "\n",
    "1. **Main title**: `plt.title('title here')`\n",
    "\n",
    "2. **X-axis title**: `g.set_xlabels('text')` or `plt.xlabel('text')`\n",
    "\n",
    "3. **Y-axis title**: `g.set_ylabels('text')` or `plt.ylabel('text')`\n",
    "\n",
    "4. **Style**: 'white', 'dark', 'whitegrid', 'darkgrid', and 'ticks'. Usage: `sns.set_style('stylename')`\n",
    "\n",
    "5. Remove the spine: `g.despine(left = True)`\n",
    "\n",
    "6. **Current Palette + display the palette**: `sns.palplot(sns.color_palette())`\n",
    "\n",
    "7. **Which palettes**: `sns.palettes.SEABORN_PALETTES` and to change, use `set_palette('palette')`\n",
    "\n",
    "8. **Save figure**: instead of `plt.show()` use `plt.savefig('figname.png', transparent = False)`.\n",
    "\n",
    "9. **Context**: set the context between 'paper', 'notebook', 'talk', and 'poster'. Use `sns.set_context('context here')`\n",
    "\n",
    "There are even more customization that we can do. Please check the [seaborn documentation](https://seaborn.pydata.org/tutorial/function_overview.html) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**: Using the histogram, describe the variables `totbill` and `tip` in the `tips` dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Your answers here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Categorical Variables: Countplot\n",
    "\n",
    "Countplots are suitable for displaying categorical variables. \n",
    "\n",
    "The syntax is:\n",
    "\n",
    "```\n",
    "sns.catplot(\n",
    "    data = the_data_frame,\n",
    "    x = 'the_variable', \n",
    "    kind = 'count')\n",
    "```\n",
    "\n",
    "Let's check the risk of expropriation in each of the countries in 1992."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All the customizations that we learn apply here as well. We can use them to prettify this plot. \n",
    "\n",
    "However, since the scale is out of order, we can change the order of the x-axis values using the `order` parameter. \n",
    "\n",
    "Even more, for `ordinal` data, it is customary to use a sequential color scheme, i.e., it gets darker as we increase the categories. \n",
    "\n",
    "We can use several palettes:\n",
    "\n",
    "1. `Blues`\n",
    "2. `Greys`\n",
    "3. `PuRd`: Light Purple to Dark Red\n",
    "4. `GnBu`: Light Green to Dark Blue\n",
    "\n",
    "Among others. The syntax to create the color scheme is:\n",
    "\n",
    "```\n",
    "sns.set_palette(\n",
    "    sns.color_palette(\"color_scheme\", # If want revert add '_r'\n",
    "                      [..number_of_colors or as_cmap=True..])\n",
    ")\n",
    "```\n",
    "\n",
    "For more about color palettes, please check [here](https://seaborn.pydata.org/tutorial/color_palettes.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**: Do a countplot for the days (`day`) in the `tips` dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Your answers here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bivariate Plots\n",
    "\n",
    "Univariate plots are excellent. But in reality, most of the exciting questions in science come from combinations of multiple variables (e.g., cause and effect, correlations, relationships, etc). \n",
    "\n",
    "For two variables' plots there are three combinations:\n",
    "\n",
    "- **discrete x discrete**: mosaic plot\n",
    "\n",
    "- **discrete x continuous**: several useful types\n",
    "\n",
    "- **continuous x continuous**: scatterplots\n",
    "\n",
    "### Discrete x Discrete Variables: Mosaicplot\n",
    "\n",
    "The mosaic plot gives an idea of how the ratio of one variable changes when we change another variable. For instance, one empirical question that we can ask about the `perisk` dataset is:\n",
    "\n",
    "**Do countries with independent courts have less corruption than countries without independent courts?**\n",
    "\n",
    "The code to test this idea takes two steps. First, we need to prep the data. Then, we plot the data using the `kind = 'bar'` in the `catplot` function.\n",
    "\n",
    "We need to create a table with cumulative values for the two variables we want to study to prep the data. Here is an example of how to do that:\n",
    "\n",
    "```\n",
    "tab = pd.crosstab(df.v1, df.v2, normalize='index') # 1: Crosstab\n",
    "tab = tab.cumsum(axis = 1).\\     # 2: Cummulative sum\n",
    "      stack().\\                  # 3: Stack the results\n",
    "      reset_index(name = 'dist') # 4: Reset the indexes\n",
    "tab\n",
    "```\n",
    "\n",
    "Then, we need to plot the results using `catplot`:\n",
    "\n",
    "```\n",
    "sns.catplot(data = tab,\n",
    "            x = 'v1', # More variation here\n",
    "            y = 'dist', # Proportions\n",
    "            hue = 'v2', # Less variation here\n",
    "            # Comment hue_order if not displaying well\n",
    "            hue_order = tab.v2.unique()[::-1], \n",
    "            dodge = False,\n",
    "            kind = 'bar')\n",
    "plt.show()\n",
    "```\n",
    "\n",
    "*Full disclosure*: A function exists that builds mosaic plots in one line of code. However, I find the results very ugly. You can Google `mosaic plot in python` and check that yourself."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here: prepping the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here: doing the plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**: Do the number of smokers (variable `smoker`) vary by the weekday (`day`)?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Your answers here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Discrete x Continuous Variables: Boxplots, Swarmplots, Violinplots\n",
    "\n",
    "Suppose we want to test whether the data distribution varies based on a categorical variable. For example:\n",
    "\n",
    "**Do you think that having an independent judiciary affects the GDP per capita of a country?**\n",
    "\n",
    "We can check if this hypothesis makes sense by looking into the distribution of GDP per capita and segmenting it by the type of judicial institution.\n",
    "\n",
    "The syntax for building these plots is almost the same as making a single boxplot. The difference is that you add the categorical variable to one of the axes:\n",
    "\n",
    "```\n",
    "sns.catplot(\n",
    "    data = data_set, \n",
    "    x = 'categorical_variable',\n",
    "    y = 'continuous_variable',\n",
    "    kind = 'box') # Or 'violin', 'swarm', 'boxen', 'bar'..\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**: Are the tips from smokers higher than tips from non-smokers? (the idea is that smokers would compensate non-smokers for the externality caused) Check that in the `tips` dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Your answers here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Continuous x Continuous Variables: Scatterplots and Regplots\n",
    "\n",
    "To plot two continuous variables, one against the other, we can use two functions. First, we can use the `relplot` function if we want to explore the relationship without fitting any trend line. The syntax is the following:\n",
    "\n",
    "```\n",
    "sns.relplot(data = data_set,\n",
    "            x = 'independent_axis_continuous_variable',\n",
    "            y = 'dependent_axis_continuous_variable',\n",
    "            hue = 'optional_categorical_to_color',\n",
    "            kind = 'scatter')\n",
    "```\n",
    "\n",
    "And an excellent version of it, with distribution plots on the top and the left, can be built using the `jointplot` function:\n",
    "\n",
    "```\n",
    "sns.jointplot(data = data_set,\n",
    "              x = 'independent_axis_continuous_variable',\n",
    "              y = 'dependent_axis_continuous_variable',\n",
    "              hue = 'optional_categorical_to_color',\n",
    "              kind = 'scatter') # Or 'scatter', 'kde', \n",
    "                                  'hist', 'hex', 'reg', \n",
    "                                  'resid'\n",
    "```\n",
    "\n",
    "If you want to add a trend line, it is better to use `lmplot` (instead of 'reg' in the plot above). The syntax is the following:\n",
    "\n",
    "```\n",
    "sns.lmplot(data = data_set,\n",
    "    x = \"total_bill\", \n",
    "    y = \"tip\", \n",
    "    hue = \"smoker\",\n",
    "    logistic = ..False or True.., # Logistic fit for discrete y\n",
    "    order = ..polynomial order.., # Polynomial degree\n",
    "    lowess = ..False or True..,   # Lowess fit\n",
    "    ci = ..None..)                # Remove conf. int.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# My code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**: Are the tips related with total bill in the `tips` dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Your answers here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Great job!!!**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## After-class extras\n",
    "\n",
    "Excellent job learning `seaborn`! It is an easy-to-use yet powerful package to generate lovely plots.\n",
    "\n",
    "Next, you should take a look at the following packages to keep developing your skills:\n",
    "\n",
    "- [`plotnine`](https://plotnine.readthedocs.io/en/stable/index.html#): Implements the ggplot *grammar of graphs* in python\n",
    "\n",
    "- [`cartopy`](https://github.com/SciTools/cartopy): Package to make maps in python.\n",
    "\n",
    "- [`plotly`](https://plotly.com): Builds interactive graphs in python (and other languages). Check also the [`dash`](https://dash.plotly.com/introduction) for plotly in python.\n",
    "\n",
    "Now, try the extra exercises below to sharpen your learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Extra Datasets\n",
    "\n",
    "## Political Information Dataset\n",
    "# ANES 2000 Political Information based on interviews\n",
    "# polInf          : Political Information\n",
    "# collegeDegree   : College Degree\n",
    "# female          : Female\n",
    "# age             : Age in years\n",
    "# homeOwn         : Own house\n",
    "# others...\n",
    "polinf = pd.read_csv('https://raw.githubusercontent.com/umbertomig/seabornClass/main/data/pinf.csv')\n",
    "pinf_order = ['Very Low', 'Fairly Low', 'Average', 'Fairly High', 'Very High']\n",
    "polinf['polInf'] = pd.Categorical(polinf.polInf, \n",
    "                                  ordered=True, \n",
    "                                  categories=pinf_order)\n",
    "\n",
    "## US Crime data in the 1970's\n",
    "# Data on violent crime in the US\n",
    "# Muder: number of murders in the state\n",
    "# Assault: number of assaults in the state\n",
    "# others...\n",
    "usarrests = pd.read_csv('https://raw.githubusercontent.com/umbertomig/seabornClass/main/data/usarrests.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>polInf</th>\n",
       "      <th>collegeDegree</th>\n",
       "      <th>female</th>\n",
       "      <th>age</th>\n",
       "      <th>homeOwn</th>\n",
       "      <th>govt</th>\n",
       "      <th>length</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Fairly High</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>49.0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>58.400002</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Average</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>35.0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>46.150002</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Very High</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>57.0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>89.519997</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Average</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>63.0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>92.629997</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Fairly High</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>40.0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>58.849998</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        polInf collegeDegree female   age homeOwn govt     length  id\n",
       "0  Fairly High           Yes     No  49.0     Yes   No  58.400002   1\n",
       "1      Average            No    Yes  35.0     Yes   No  46.150002   2\n",
       "2    Very High            No    Yes  57.0     Yes   No  89.519997   3\n",
       "3      Average            No     No  63.0     Yes   No  92.629997   4\n",
       "4  Fairly High           Yes    Yes  40.0     Yes   No  58.849998   4"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "polinf.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Murder</th>\n",
       "      <th>Assault</th>\n",
       "      <th>UrbanPop</th>\n",
       "      <th>Rape</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>13.2</td>\n",
       "      <td>236</td>\n",
       "      <td>58</td>\n",
       "      <td>21.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.0</td>\n",
       "      <td>263</td>\n",
       "      <td>48</td>\n",
       "      <td>44.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8.1</td>\n",
       "      <td>294</td>\n",
       "      <td>80</td>\n",
       "      <td>31.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8.8</td>\n",
       "      <td>190</td>\n",
       "      <td>50</td>\n",
       "      <td>19.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9.0</td>\n",
       "      <td>276</td>\n",
       "      <td>91</td>\n",
       "      <td>40.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Murder  Assault  UrbanPop  Rape\n",
       "0    13.2      236        58  21.2\n",
       "1    10.0      263        48  44.5\n",
       "2     8.1      294        80  31.0\n",
       "3     8.8      190        50  19.5\n",
       "4     9.0      276        91  40.6"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "usarrests.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercises\n",
    "\n",
    "1. (Univariate) In the `polinf` dataset, make a count plot of the variable `polInf`. Imagine you want to use this for a talk, so adjust the context. Change the x-axis label and title to appropriate descriptions of the data. (Hint: to rotate the axis tick labels, use `plt.xticks(rotation=number_degree_of_your_choice)`)\n",
    "\n",
    "2. (Univariate) In the `polinf` dataset, make a histogram of the variable `age`. (Hint: set the context back to `notebook` before starting)\n",
    "\n",
    "3. (Bivariate) Do you think political information varies with a college degree? Check that using the `polinf` dataset!\n",
    "\n",
    "4. (Bivariate) Do you think political information varies with age? Check that using the `polinf` dataset!\n",
    "\n",
    "5. (Bivariate) Do you think there is a correlation between `Murder` and `Assault`? Check that using the `usarrests` dataset!\n",
    "\n",
    "6. (Challenge: Multivariate) There are four continuous indicators in the `usarrests` dataset: `Murder`, `Assault`, `UrbanPop`, and `Rape`. Do you think you can build a scatterplot matrix? The documentation is in [here](https://seaborn.pydata.org/examples/scatterplot_matrix.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Your answers here"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}