Programming with data (Fall 2024)

Essential information

NYU ITP, Fall 2024. Instructor: Allison Parrish. Send me e-mail; sign up for office hours.

Important links: Schedule, code and notes, homework form.

Course info

Description

Data is the means by which we turn experience into something that can be published, compared, and analyzed. Data can facilitate the production of new knowledge about the world—but it can also be used as a method of control and exploitation. As such, the ability to understand and work with data is indispensable both for those who want to uncover truth, and those who want to hold power to account. This intensive course serves as an introduction to essential computational tools and techniques for working with data. The course is designed for artists, designers, and researchers in the humanities who have no previous programming experience. Covered topics include: the Python programming language, Jupyter Notebook, data formats, regular expressions, Pandas, web scraping, relational database concepts, simple data visualization and data-driven text generation. Weekly technical tutorials and short readings culminate in a self-directed final project.

Objectives

The goal of the course is to help students achieve beginning to intermediate proficiency in a number of technical tools relevant to exploratory data analysis, including the Python programming language and the Pandas dataframe library. Students will become familiar with conventions surrounding the structure of datasets, and learn techniques for “cleaning” and adapting data for different kinds of use. Additionally, by the end of the course, students will be conversant in current discourses surrounding the ethics, philosophy and politics of data collection and data analysis. Proficiency in these topics will be assessed through a midterm project and a final project, alongside a series of technical worksheets.

Schedule

Class schedule with readings, assignments and due dates.

Communication and office hours

(Please see the top of the page for relevant links!)

If I need to get in touch with students outside of regular course hours, I’ll send an e-mail to students’ NYU e-mail addresses. Please make sure to regularly check your e-mail!

Please send me e-mail if you have questions or need help with an assignment or project! However, please note that I generally do not respond to e-mail over the weekend, and I might not be able to respond to your e-mail before class on Monday. So try to stay on top of your homework so you can e-mail me with questions earlier rather than later.

I set aside time every week to be in my office and receive students to answer questions (or just chat), whether in person or over video chat. If you can’t find a time that works for you on my sign-up sheet, please send me an e-mail and we can work something out separately.

Materials and equipment

I’ll proceed on the assumption that students have access to a computer that is capable of running a recent version of Python (mainline CPython) and Jupyter Notebook, on a mainstream desktop operating system (e.g. Windows, macOS, Linux). (This does not need to be an especially “recent” computer; I’ll be teaching the class on a decade-old MacBook Air.) You’ll probably want several gigabytes of RAM and storage for storing and processing data sets. Ideally, you’ll bring this computer to class with you so you can follow along in real time with my tutorials. Let me know if these requirements don’t line up with what you’re able to provide; we can almost certainly find a way to provide you with what you need, or some other kind of workaround.

Resources for learning Python

We’re going to be thorough with the basics, but we’re also going to move fast. Fortunately, there are many resources out there for learning Python. You might benefit from going through some of them. I recommend:

Expectations and details

This is a four-credit course that includes a total of 3000 minutes of supervised instruction time, over the course of fourteen weekly sessions. Students can expect to spend six to eight hours per week on course work outside of class.

Grading policy

Component                      Percentage
Attendance and participation   25%
Midterm project                20%
Exercises                      3 x 10% (30%)
Final project                  25%

Here’s the breakdown of how grades correspond with percentages.

Grade   Percentage
A       90 to 100
B       80 to 89
C       70 to 79
D       60 to 69
F       Below 60

For students taking the class as pass/fail (i.e., all ITP students), anything below a B (79% and below) will be graded as a fail. More information on ITP’s grading policy here.
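As a rough illustration of how the two tables above combine, here is a minimal Python sketch. The function names and the example scores are hypothetical, and the sketch assumes each component is scored from 0 to 100 and that each letter cutoff is a lower bound (so 89.5 would count as a B). The tables do not say how fractional percentages are handled.

```python
# Component weights, in percentage points, taken from the grading table.
WEIGHTS = {
    "attendance_participation": 25,
    "midterm_project": 20,
    "exercises": 30,  # three exercises at 10% each
    "final_project": 25,
}


def final_percentage(scores):
    """Weighted average of component scores, each given on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a percentage to a letter, treating each cutoff as a lower bound (an assumption)."""
    if percentage >= 90:
        return "A"
    if percentage >= 80:
        return "B"
    if percentage >= 70:
        return "C"
    if percentage >= 60:
        return "D"
    return "F"


def passes(percentage):
    """For pass/fail students, anything below a B is graded as a fail."""
    return percentage >= 80


# Invented example scores, for illustration only.
example_scores = {
    "attendance_participation": 100,
    "midterm_project": 80,
    "exercises": 100,
    "final_project": 70,
}

total = final_percentage(example_scores)
print(total, letter_grade(total), passes(total))
```

With these invented scores the weighted total is 88.5, which falls in the B range and therefore counts as a pass.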

Readings

The course has around 100 pages of assigned reading, spread across three different reading assignments. All readings are available online (please let me know ASAP if you’re having trouble accessing the readings). The purpose of the readings is to help put the technical content of the class in historical and cultural context. We’ll discuss the readings in class.

Projects and assignments

There are two projects in this class (the midterm project and the final project) and three exercises.

Turn in homework using this Google form.

Exercises

The “exercise” assignments are worksheets that take the form of Jupyter Notebooks with cells that have missing code. The purpose of these exercises is to give you an opportunity to demonstrate your proficiency with the technical material presented in class. You need to fill in the code so that each cell, when run, produces the expected output (which is indicated in the notebook).
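To give a sense of the format, here is a hypothetical worksheet cell. The task, the variable names, and the expected output are invented for illustration and are not taken from the actual exercises.

```python
# Hypothetical worksheet cell (not from the real exercises).
# Task: build a list containing the length of each word.
# Expected output: [4, 2, 5]
words = ["data", "is", "power"]

# In a worksheet, the line below would be left for you to write.
# This is one possible way to fill it in:
lengths = [len(word) for word in words]

print(lengths)
```

Running the filled-in cell prints `[4, 2, 5]`, which matches the expected output stated in the cell's comment.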

Exercises are graded purely on the basis of participation: if you turn in your filled-in worksheet, then you get full credit. We will go over the exercises in class to answer any lingering questions.

Please note that you’re likely to be able to arrive at correct answers to the exercise problems without actually understanding the underlying code (through the use of, e.g., automated code-writing tools, web searches, or copying off of your friend). I can’t stop you from doing this, but it’s a waste of your time. The purpose of the class is to teach you how the code works, so that you can one day apply your skills to problems novel enough that their solutions cannot be easily arrived at through language models and web searches (by which I mean: interesting and worthwhile problems). You’ll only be able to achieve this if you actually understand the code that you’re writing.

Projects

There are two projects, a midterm project and a final project. These projects are an opportunity for you to demonstrate your ability to synthesize the conceptual and technical material of the class and apply it toward an end that dovetails with your own interests and practice; beyond that, the brief is open. In addition to presenting these projects in class, you must thoroughly document each project in a public place on the Internet (e.g., your ITP blog).

At a minimum, the midterm and final projects should involve loading a dataset (of the student’s choosing) into Python and performing the steps of exploratory data analysis on that dataset, in order to reveal (but not necessarily answer) an interesting question about the phenomena that the data describe. In-class presentations for midterm projects will be five to ten minutes, while final project presentations will be fifteen to twenty minutes. The midterm project is intended as a short assignment (conceptualized and executed independently, as a weekly assignment would be), while the final project is intended to be designed, executed, and iterated on over several weeks.
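As a rough sketch of what “loading a dataset and performing the steps of exploratory data analysis” can look like, the example below uses only Python’s standard library and a small, invented CSV about trees. The data, column names, and variable names are all hypothetical. In practice you would load your own dataset from a file, and the same steps can be carried out with Pandas.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Invented sample data standing in for a dataset of your choosing.
RAW_CSV = """\
neighborhood,species,height_m
North,oak,12.5
North,maple,9.0
South,oak,15.5
South,oak,
East,pine,20.0
"""

# Step 1: load the data into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
print(len(rows), "rows; columns:", list(rows[0].keys()))

# Step 2: look for problems, such as missing values.
missing = [row for row in rows if row["height_m"] == ""]
print(len(missing), "row(s) missing a height")

# Step 3: "clean" the data by dropping incomplete rows and converting types.
clean = [
    dict(row, height_m=float(row["height_m"]))
    for row in rows
    if row["height_m"] != ""
]

# Step 4: summarize. How many trees of each species, and how tall on average?
species_counts = Counter(row["species"] for row in rows)

heights_by_species = {}
for row in clean:
    heights_by_species.setdefault(row["species"], []).append(row["height_m"])
mean_height = {
    species: mean(heights) for species, heights in heights_by_species.items()
}

print(species_counts.most_common())
print(mean_height)
```

Summaries like these are a starting point rather than an answer. Here they might prompt a question such as why one oak has no recorded height, or whether species and neighborhood are related.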

Evaluation rubric

Your midterm and final project will be evaluated according to the following criteria: compliance, gregariousness, and stubbornness.

Each project will receive a score of 0, 1 or 2 in each of these categories, according to the extent to which it demonstrates the properties described.

Each category will be weighted equally when assigning a final score to each assignment.

Attendance, lateness and in-class behavior policies

You are expected to attend all class sessions. If you’re unable to attend class, please let me know (by e-mail) before class begins. Also, be on time to class. If you’re more than fifteen minutes late, or if you leave early (without my clearance), it will count as an unexcused absence. Unexcused absences will negatively affect the participation portion of your grade.

On the use of large language models and automated code generation tools

Refer to Vaithilingam et al., whose study shows that LLM-based code generation tools do not “improve the task completion time or success rate,” but do lead to “difficulties in understanding, editing, and debugging” that “significantly hinder” programmers’ “task-solving effectiveness.”

On the use of electronic devices

Laptops will be an essential part of the course and may be used in class during workshops and for taking notes in lectures. Laptops must be closed during class discussions and student presentations. Phone use in class is strictly prohibited unless it is directly related to a presentation of your own work or you are asked to use your phone as part of the curriculum.

What is “participation”?

Examples of what counts as participation: asking questions, going to office hours, sending and reading emails, class group discussion, arriving on time, going to class, taking notes, listening to peers, submitting responses to a form (anonymous or not), following instructions, active listening, etc.

Statements

Your instructors are enjoined to include the following statements in our syllabi. Please review them closely.

Statement of academic integrity

Plagiarism is presenting someone else’s work as though it were your own. More specifically, plagiarism is to present as your own: a sequence of words quoted without quotation marks from another writer; a paraphrased passage from another writer’s work; or facts, ideas or images composed by someone else.

Collaboration is highly valued and often necessary to produce great work. Students build their own work on that of other people, and giving credit to the creator of the work you are incorporating into your own work is an act of integrity. Plagiarism, on the other hand, is a form of fraud. Proper acknowledgment and correct citation constitute the difference.

Tisch Student Handbook

Statement on accessibility

It’s crucial for our community to create and uphold learning environments that empower students of all abilities. We are committed to creating an environment that enables open dialogue about the various temporary and long-term needs of students and participants for their academic success. We encourage all students and participants to discuss with faculty and staff possible accommodations that would best support their learning. Students may also contact the Moses Center for Student Accessibility (212-998-4980) for resources and support.

Moses Center for Student Accessibility

Statement on counseling and wellness

Your health and safety are a priority at NYU. Emphasizing the importance of the wellness of each individual within our community, students are encouraged to utilize the resources and support services available to them 24 hours a day, 7 days a week via the NYU Wellness Exchange Hotline at 212-443-9999. Additional support is available over email at wellness.exchange@nyu.edu and within the NYU Wellness Exchange app.

NYU Counseling and Wellness Center

Statement on use of electronic devices

Laptops and other electronic devices are essential tools for learning and interaction in classrooms. However, they can create distractions that hinder students’ ability to actively participate and engage. Please be mindful of the ways in which these devices can affect the learning environment, and please refrain from non-class-oriented activities during class.

Statement on Title IX

Tisch School of the Arts is dedicated to providing its students with a learning environment that is rigorous, respectful, supportive and nurturing so that they can engage in the free exchange of ideas and commit themselves fully to the study of their discipline. To that end, Tisch is committed to enforcing University policies prohibiting all forms of sexual misconduct as well as discrimination on the basis of sex and gender. Detailed information regarding these policies and the resources that are available to students through the Title IX office can be found by using the following link: NYU Title IX Office

Statement of principle

Teachers and students work together to create a supportive learning environment. The educational experience in the classroom is one that is enhanced by integrating varying perspectives and learning modes brought by students.