Syllabus¶
This syllabus may change over the course of the quarter.
You’ll be notified of any major changes but be sure to check here first for any questions!
Course overview and goals¶
This course will introduce students with basic Python and Statistics experience to standard data science methods and procedures commonly used in Computational Social Science.
Goals¶
At the end of the course you should have a working understanding of how to work with datasets in Python, and of the basic goals, implementation, and usage of standard data science approaches. In particular, you should be able to:
Read in, clean, wrangle, process, and tidy data of different types
Understand how model fitting and prediction works
Evaluate classification and regression models
Know when and how to cluster data and reduce dimensionality
Strategy¶
This course is structured to help you get there. The basic premise we start from is that programming is a skill, and to acquire a skill you need lots and lots of practice (see Expectations). In lecture, we will introduce new basic elements, and will try them together as a class. In labs, we will work in groups to combine these elements into small programs with instructor help. In problem sets, you will apply these skills on your own to solve specific problems. As much as possible, we will use interesting Computational Social Science problems as motivating problems, but we are constrained first and foremost by finding problems appropriate for your developing Python skills.
Course Information¶
Spring, 2022
Lecture: M/W/F 8-8:50am PDT (CENTR 222)
Labs: M 11-11:50am PDT (ERCA 117) or 9-9:50am PDT (ERCA 117)
Attendance
Attendance at lecture and your assigned lab is required.
Please email the instructor if you are not able to make it to a lecture or lab.
Instructors¶
Role | Name | Office hours | Location | Zoom |
---|---|---|---|---|
Instructor | Erik Brockbank | Fridays 9:00-10:00AM | 2588 Mandler Hall | |
TA | Purva Kothari | Wednesdays 10:00-11:00AM (after lab) | ERCA 117 | |
All the links¶
Main Class Website: contains all the materials and links
Canvas (UCSD): used to post grades
Campuswire (CODE: 2109): used for all communication, including discussion, Q&A, and announcements. Email the instructor if you cannot access it.
Datahub (UCSD): used to submit completed labs and problem sets
Materials¶
All materials will be provided via this website and the links above.
No textbook is required beyond the lecture notes here, but we provide recommendations for some paid and free extracurricular resources.
No local software is required (as we will use remotely hosted Jupyter notebooks). If you want to install a local copy, we recommend the bundled Anaconda distribution of Python 3.
Schedule¶
Each week, you’ll be responsible for: (a) coming to lectures and participating in class, (b) attending your assigned lab, (c) turning in lab work, and (d) turning in a weekly problem set (starting in week 2).
Below is the (rough) schedule for each of these, but this may change as the quarter goes on!
Week | Date | Topics | Assignment due |
---|---|---|---|
Week 1 | Mon. 3/28 | Welcome! | |
Week 1 | Weds. 3/30 | Python review | |
Week 1 | Fri. 4/1 | Python review | |
Week 2 | Mon. 4/4 | Python for data science: numpy | |
Week 2 | Weds. 4/6 | Python for data science: pandas basics | |
Week 2 | Fri. 4/8 | Python for data science: pandas advanced | |
Week 2 DEADLINES | 4/4: LAB 1 (Mon. section) | | |
Week 3 | Mon. 4/11 | Graphing with python (Guest) | |
Week 3 | Weds. 4/13 | Graphing best practices (Remote) | |
Week 3 | Fri. 4/15 | Graphing with python (Guest) | |
Week 3 DEADLINES | 4/11: PSET 1 | | |
Week 4 | Mon. 4/18 | Getting started: data cleaning | |
Week 4 | Weds. 4/20 | Getting started: tidy data | |
Week 4 | Fri. 4/22 | Getting started: data transformations | |
Week 4 DEADLINES | 4/18: PSET 2 | | |
Week 5 | Mon. 4/25 | Prediction: intro (linear regression) | |
Week 5 | Weds. 4/27 | Prediction: evaluating solutions | |
Week 5 | Fri. 4/29 | Prediction: advanced methods | |
Week 5 DEADLINES | 4/25: LAB 4 (Mon. section) | | |
Week 6 | Mon. 5/2 | Prediction: best practices | |
Week 6 | Weds. 5/4 | Classification: intro (k neighbors) | |
Week 6 | Fri. 5/6 | Classification: evaluating solutions (confusion matrix) | |
Week 6 DEADLINES | 5/2: LAB 5 (Mon. section) | | |
Week 7 | Mon. 5/9 | Classification: evaluating solutions cont’d. (ROC curves) | |
Week 7 | Weds. 5/11 | Classification: other classifiers | |
Week 7 | Fri. 5/13 | Ethics in data science | |
Week 7 DEADLINES | 5/9: LAB 6 (Mon. section) | | |
Week 8 | Mon. 5/16 | Clustering: intro (k means) | |
Week 8 | Weds. 5/18 | Clustering: evaluating solutions | |
Week 8 | Fri. 5/20 | Clustering: advanced (Gaussian mixture models) | |
Week 8 DEADLINES | 5/16: LAB 7 (Mon. section) | | |
Week 9 | Mon. 5/23 | Dimensionality reduction: intro | |
Week 9 | Weds. 5/25 | Dimensionality reduction: PCA | |
Week 9 | Fri. 5/27 | Dimensionality reduction: advanced solutions | |
Week 9 DEADLINES | 5/23: PSET 6 | | |
Week 10 | Mon. 5/30 | HOLIDAY (no class) | |
Week 10 | Weds. 6/1 | Final project presentations: CLASS IN ERCA 117 | |
Week 10 | Fri. 6/3 | Review | |
Week 10 DEADLINES | 6/1: LAB 10: final project presentations (All sections) | | |
Finals week DEADLINES | 6/6: Late assignments due | | |
Grading¶
Basis¶
Course grades will be calculated with the following (approximate) breakdown:
35% 6 weekly problem sets (weeks 2-9)
40% 10 weekly labs
15% final project (see Final Project)
10% participation
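To make the arithmetic concrete, here is a minimal sketch of how a weighted course percentage could be computed from the breakdown above. The per-category scores are made-up numbers for illustration, and the names `WEIGHTS` and `course_percent` are hypothetical. Since the breakdown is approximate, the actual grade calculation may differ.

```python
# Weights taken from the (approximate) breakdown above; they sum to 1.0.
WEIGHTS = {
    "problem_sets": 0.35,
    "labs": 0.40,
    "final_project": 0.15,
    "participation": 0.10,
}


def course_percent(scores):
    """Return the weighted course percentage from per-category scores in [0, 100]."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)


# Hypothetical student scores (illustrative only, not real data).
example_scores = {
    "problem_sets": 80,
    "labs": 90,
    "final_project": 100,
    "participation": 100,
}

percent = course_percent(example_scores)
print(round(percent, 2))  # 89.0
```

Because labs carry the largest weight in this breakdown, a missed lab moves the total more than a missed point of participation.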
Labs: Labs are short, simple exercises designed to be completed during the scheduled lab time, with interactive help from instructors and other students. This is a chance for you to practice things we learn and discuss in lecture. Labs are completed by turning them in on datahub. Labs are due by the end of the day one week after the lab meeting. This window is wide so that people who cannot attend lab, or otherwise do not complete the work during lab, can submit on their own schedule. That said, it is very much advised that you attend lab to complete the activities and get interactive help!
Problem Sets: Problem sets are longer, weekly assignments. They are released on Mondays (usually) and due by 11:59PM the following Monday unless otherwise noted. You are to complete each problem set on your own. These will often involve material learned during the previous week, so we strongly recommend starting early.
Final: The final is an open-ended data science analysis that you’ll complete with a group. You can read more about it on the Final Project page.
Participation: This component of your grade is based on doing things that help the instructors and other students, and generally creating a positive class environment. This includes things like: showing up and participating during lectures and labs, participating in campuswire discussion (asking good questions, answering others’ questions), demonstrating an interest in learning, not just maximizing your grade, etc.
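As a rough illustration of the lab and problem set deadlines described above, the sketch below computes due dates with Python's `datetime` module. The function names are hypothetical, the times are timezone-naive, and the example dates assume that the Monday section's Lab 1 met on Mon. 3/28 and that PSET 1 was released on Mon. 4/4. Always defer to the schedule if it differs.

```python
from datetime import date, datetime, time, timedelta


def lab_due_date(lab_meeting):
    """Labs are due by the end of the day one week after the lab meeting."""
    return lab_meeting + timedelta(days=7)


def pset_due(release_day):
    """Problem sets are due by 11:59PM one week after release (usually a Monday)."""
    return datetime.combine(release_day + timedelta(days=7), time(23, 59))


# Assumption: the Monday section's Lab 1 met on Mon. 3/28/2022.
print(lab_due_date(date(2022, 3, 28)))  # 2022-04-04

# Assumption: PSET 1 was released on Mon. 4/4/2022.
print(pset_due(date(2022, 4, 4)))  # 2022-04-11 23:59:00
```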
Letter grades¶
Letter grades will be based on the percentage of total points earned across the items above.
Having encoded the percentage in the variable percent, we can obtain the grade as follows:
# Pick the letter and how far the score sits above that letter's cutoff.
if percent >= 90:
    letter = 'A'
    remainder = percent - 90
elif percent >= 80:
    letter = 'B'
    remainder = percent - 80
elif percent >= 70:
    letter = 'C'
    remainder = percent - 70
elif percent >= 60:
    letter = 'D'
    remainder = percent - 60
else:
    letter = 'F'
    remainder = 5  # falls in the no-modifier band, so F gets no +/-
# 7 or more points above the cutoff earns '+'; fewer than 3 earns '-'.
if remainder >= 7:
    modifier = '+'
elif remainder < 3:
    modifier = '-'
else:
    modifier = ''
grade = letter + modifier
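To see how the rule above plays out on specific scores, here is a sketch that wraps the same logic in a function and applies it to a few sample percentages. The function name `letter_grade` and the sample values are assumptions made for illustration. The cutoffs and modifier bands are the ones given above.

```python
def letter_grade(percent):
    """Map a course percentage to a letter grade with an optional +/- modifier."""
    if percent >= 90:
        letter, remainder = "A", percent - 90
    elif percent >= 80:
        letter, remainder = "B", percent - 80
    elif percent >= 70:
        letter, remainder = "C", percent - 70
    elif percent >= 60:
        letter, remainder = "D", percent - 60
    else:
        # A remainder of 5 sits in the no-modifier band, so F never gets +/-.
        letter, remainder = "F", 5

    if remainder >= 7:
        modifier = "+"
    elif remainder < 3:
        modifier = "-"
    else:
        modifier = ""
    return letter + modifier


# Sample percentages (illustrative only).
for p in (98, 91, 85, 72, 67, 45):
    print(p, letter_grade(p))
# 98 A+
# 91 A-
# 85 B
# 72 C-
# 67 D+
# 45 F
```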
Assignment scores¶
Assignment scores will be made available via assignment feedback on datahub (where they are submitted).
Manual Regrades¶
Problem sets and labs are scored using nbgrader on UCSD datahub. Some parts are graded automatically by computer, and some parts are graded manually by a human.
We will work hard to grade everyone fairly and return assignments quickly. And, we know you also work hard and want you to receive the grade you’ve earned. Occasionally, grading mistakes do happen, and it’s important to us to correct them. If you think there is a mistake in your grade on an assignment, post privately on Campuswire to “Instructors & TAs” using the “regrades” tag within 72 hours. This post should include evidence of why you think your answer was correct and should point to the specific part of the assignment in question.
Note that points will not be awarded if you fail to follow instructions. For example, if the instructions say to name the variable orange and you name it ornage (misspelled), you will not be awarded credit upon regrade. This is because following instructions, being detail-oriented, referring to things by their correct names, and getting other minor technicalities right are essential to programming.
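As a hypothetical illustration of why a misspelled name costs points, an automated check typically looks up the exact name given in the instructions. The sketch below is not the course's actual nbgrader test. The helper `check_variable` and the value `5` are assumptions made purely for illustration.

```python
def check_variable(namespace, name, expected):
    """Return True only if `name` exists in `namespace` and equals `expected`."""
    return name in namespace and namespace[name] == expected


# A submission that follows the instructions.
correct_submission = {"orange": 5}
# A submission with the misspelled name from the example above.
misspelled_submission = {"ornage": 5}

print(check_variable(correct_submission, "orange", 5))     # True
print(check_variable(misspelled_submission, "orange", 5))  # False
```

The value is right in both submissions, but the check can only find it under the name it was told to look for.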
Questions, feedback, and communication¶
The instructors can be reached in the following ways:
Drop in during scheduled office hours (see above for scheduled times and locations).
Public message on Campuswire.
Private “Instructors & TAs” message on Campuswire
Direct message to specific instructor on Campuswire
Outside of office hours, all communication should happen over Campuswire. Email is reserved for unanticipated circumstances in which Campuswire is down or you cannot access it. In that case, email the instructor about your inability to access Campuswire.
Specific types of questions / comments¶
Questions about course logistics: First, check the syllabus and the detailed how-to pages on the Class Website. If you can’t find the answer there, first ask a classmate. If still unsure, post on Campuswire in the General tag.
Questions about course content: these are awesome! We want everyone to see them, be able to answer them, and have their questions answered too, so post these to Campuswire with an appropriate tag!
My code produces an error that I cannot fix: follow the debugging instructions to find a minimal reproducible example and fill out the debugging question checklist, then post on Campuswire in the “Python” category or the relevant “Problem Set” category.
Assignment clarification questions: Ask in the appropriate “Problem Set” or “Labs” category.
Technical assignment questions: Come to office hours (or post to Campuswire). Answering technical questions is often best accomplished ‘in person’ where we can discuss the question and talk through ideas. However, if that is not possible, post your question to Campuswire. Be as specific as you can in the question you ask. And, for those answering, help your classmates as much as you can without just giving the answer. Help guide them, point them in a direction, provide pseudo code, but do not provide code that answers assignment questions.
Stuck on something for a while (>30min) and aren’t even really sure where to start: Programming can be frustrating and it may not always be obvious what is going wrong or why something isn’t working. That’s OK - we’ve all been there! If you are stuck, you can and should reach out for help, even if you aren’t exactly sure what your specific question is. To determine when to reach out, consider the 2-hour rule. This rule states that if you are stuck, work on that problem for an hour. Then, take a 30 minute break and do something else. When you come back after your break, try for another 30 minutes or so to solve your problem while working through our debugging checklist. If you are still completely stuck, stop and contact us (office hours, post on Campuswire). If you don’t have a specific question, include the information you have (what you’re stuck on, the debugging checklist).
Questions about a grade - Post on Campuswire with the “Regrades” tag in a private post to “Instructors & TAs”.
Something super cool to share related to class or want to talk about a topic in further depth - come to office hours, post in General, or send in a DM to the instructors!
Campuswire Rules¶
Campuswire is an incredible resource for technical classes. It gives you a place to post questions and an opportunity to answer others’ questions. We do our very best as an instructional staff to make sure each and every question is answered in a timely manner. We also want to make sure this platform is being used to learn and not thwarting anyone’s education. To make all of this possible, there are a few rules for this course’s campuswire:
Before posting your question, look at questions that have already been posted to avoid duplicates.
If posting about an assignment, the post title should include the assignment number, question number, and 1-2 words about the question (e.g., PS01 Q1 Variable Naming).
Never post an answer to or code for an assignment on a public post. Pseudocode is encouraged for public posts. If you must include code for an assignment, make this post private (to “Instructors & TAs” only) on Campuswire.
Your post must include not only your question/where you’re stuck, but also what you’ve already done to try to solve it so far and what resources (class notes, online URLs, etc.) you used to try to answer the question up to this point. See how to ask debugging questions.
UCSD policies & resources¶
Academic Integrity¶
From UCSD Academic Integrity office
Integrity of scholarship is essential for an academic community. The University expects that both faculty and students will honor this principle and in so doing protect the validity of University intellectual work. For students, this means that all academic work will be done by the individual to whom it is assigned, without unauthorized aid of any kind.
Please read the full UCSD policy
You are encouraged to work together and help one another for labs. However, you are personally responsible for the work you submit. It is your responsibility to ensure you understand everything you’ve submitted.
You must work independently on the problem sets and the final. You may ask and answer debugging questions on campuswire, but doing work for another student or providing assistance outside of public questions on campuswire on the problem sets or final project will be treated as a violation of academic integrity and you will be referred for disciplinary action. Similarly, emailing with or otherwise communicating with other students or anyone else during a quiz or exam will be treated as a violation and also referred for disciplinary action. Cheating and plagiarism have been and will be strongly penalized. Please review academic integrity policies here.
You are responsible for ensuring that the correct file has been submitted and that the submission is uncorrupted. If, for whatever reason, Canvas or DataHub is down or something else prevents you from turning in an assignment on time, immediately email the assignment to the instructor; otherwise, the assignment will be graded as late.
Class Conduct¶
In all interactions in this class, you are expected to be respectful. This includes following the UC San Diego principles of community.
This class will be a welcoming, inclusive, and harassment-free experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion (or lack thereof), political beliefs/leanings, or technology choices.
At all times, you should be considerate and respectful. Always refrain from demeaning, discriminatory, or harassing behavior and speech. Last of all, take care of each other.
If you have a concern, please speak with the Professor, your TAs, or IAs. If you are uncomfortable doing so, that’s OK! The OPHD (Office for the Prevention of Sexual Harassment and Discrimination) and CARE (confidential advocacy and education office for sexual violence and gender-based violence) are wonderful resources on campus.
Disability Access¶
Students requesting accommodations due to a disability must provide a current Authorization for Accommodation (AFA) letter. These letters are issued by the Office for Students with Disabilities (OSD), which is located in University Center 202 behind Center Hall. Please contact the professor privately to arrange accommodations.
Contacting the OSD can help you further: 858.534.4382 (phone), osd@ucsd.edu (email), http://disabilities.ucsd.edu (website)
Important Resources for Students¶
Counseling and Psychology Services (CAPS). “CAPS provides FREE, confidential, psychological counseling and crisis services for registered UCSD students. CAPS also provides a variety of groups, workshops, and drop-in forums.”
CARE at the Sexual Assault Resource Center is the UC San Diego confidential advocacy and education office for sexual harassment, sexual violence and gender-based violence (dating violence, domestic violence, stalking).
Office for the Prevention of Harassment & Discrimination (OPHD). OPHD “works to resolve complaints of discrimination and harassment through formal investigation or alternative resolution.”