Programming with data

NYU ITP, Fall 2023. Instructor: Allison Parrish. Send me e-mail.

Important links: Schedule, code and notes, homework form.

Description

Data is the means by which we turn experience into something that can be published, compared, and analyzed. Data can facilitate the production of new knowledge about the world—but it can also be used as a method of control and exploitation. As such, the ability to understand and work with data is indispensable both for those who want to uncover truth, and those who want to hold power to account. This intensive course serves as an introduction to essential computational tools and techniques for working with data. The course is designed for artists, designers, and researchers in the humanities who have no previous programming experience. Covered topics include: the Python programming language, Jupyter Notebook, data formats, regular expressions, Pandas, web scraping, relational database concepts, simple data visualization and data-driven text generation. Weekly technical tutorials and short readings culminate in a self-directed final project.

Course objectives

The goal of the course is to help students achieve beginning to intermediate proficiency in a number of technical tools relevant to exploratory data analysis, including the Python programming language and the Pandas dataframe library. Students will become familiar with conventions surrounding the structure of datasets, and learn techniques for “cleaning” and adapting data for different kinds of use. Additionally, by the end of the course, students will be conversant in current discourses surrounding the ethics, philosophy and politics of data collection and data analysis. Proficiency in these topics will be assessed through a midterm project and a final project, alongside a series of technical worksheets.

Schedule

Class schedule with readings, assignments and due dates.

This is a four-credit course that includes a total of 3000 minutes of supervised instruction time, over the course of fourteen weekly sessions. Students can expect to spend six to eight hours per week on course work outside of class.

Grading Policy

Component Percentage
Attendance and participation 25%
Midterm project 20%
Exercises 3 x 10% (30%)
Final project 25%

Here’s the breakdown of how grades correspond with percentages.

Grade Percentage
A 90 to 100
B 80 to 89
C 70 to 79
D 60 to 69
F Below 60

For students taking the class as pass/fail (i.e., all ITP students), anything below a B (79% and below) will be graded as a fail. More information on ITP’s grading policy here.
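The arithmetic behind the two tables above can be sketched in a few lines of Python. This is only an illustration: the function names and the example scores are hypothetical, and the handling of fractional totals is an assumption, since the policy above does not say how totals are rounded.

```python
# Component weights (in percent) from the grading policy table; they sum to 100.
WEIGHTS = {
    "attendance_participation": 25,
    "midterm_project": 20,
    "exercises": 30,  # three exercises at 10% each
    "final_project": 25,
}


def overall_percentage(scores):
    """Combine per-component scores (each 0 to 100) into a weighted total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a percentage to a letter using the grade table.

    Assumption: a fractional total is compared against the lower bound of
    each band without rounding, so 89.75 counts as a B.
    """
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percentage >= cutoff:
            return letter
    return "F"


def pass_fail(percentage):
    """Pass/fail students need a B or better (80 and up) to pass."""
    return "pass" if percentage >= 80 else "fail"


# Hypothetical scores, for illustration only.
example = {
    "attendance_participation": 90,
    "midterm_project": 80,
    "exercises": 100,
    "final_project": 85,
}
total = overall_percentage(example)
print(total, letter_grade(total), pass_fail(total))
```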

Resources for learning Python

We’re going to be thorough with the basics, but we’re also going to move fast. Fortunately, there are many resources out there for learning Python. You might benefit from going through some of them. I recommend:

Readings

The course has around 100 pages of assigned reading, spread across three different reading assignments. All readings are available online (please let me know ASAP if you’re having trouble accessing the readings). The purpose of the readings is to help put the technical content of the class in historical and cultural context. We’ll discuss the readings in class.

Projects and assignments

There are two projects in this class (the midterm project and the final project) and three exercises.

Turn in homework using this Google form.

Exercises

The “exercise” assignments are worksheets in the form of Jupyter Notebooks. Their purpose is to give you an opportunity to demonstrate your proficiency with the technical material presented in class. Each notebook contains cells with missing code. You need to fill in the code so that the cell, when run, produces the expected output (which is indicated in the notebook).
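To give a rough sense of the format, here is a made-up example of what a single worksheet cell might look like once filled in. The task, the variable names, and the data are all hypothetical; the actual exercises will differ.

```python
# Hypothetical worksheet cell; not taken from the actual exercises.
# Task: make this cell print the numbers from the list that are greater than 10.
# Expected output: [12, 47, 31]

numbers = [3, 12, 47, 8, 31, 10]

# A worksheet would leave the next line blank for you to fill in.
# One possible answer uses a list comprehension:
big_numbers = [n for n in numbers if n > 10]

print(big_numbers)
```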

Exercises are graded purely on the basis of participation: if you turn in your filled-in worksheet, then you get full credit. We will go over the exercises in class to answer any lingering questions.

Please note that you’re likely to be able to arrive at correct answers to the exercise problems without actually understanding the underlying code (through the use of, e.g., automated code-writing tools, web searches, or copying off of your friend). I can’t stop you from doing this, but it’s a waste of your time. The purpose of the class is to teach you how the code works, so that you can one day apply your skills to problems novel enough that their solutions cannot be easily arrived at through language models and web searches (by which I mean: interesting and worthwhile problems). You’ll only be able to achieve this if you actually understand the code that you’re writing.

Projects

There are two projects, a midterm project and a final project. These projects are an opportunity for you to demonstrate your ability to synthesize the conceptual and technical material of the class and apply it toward an end that dovetails with your own interests and practice. Beyond that, the brief is open. In addition to presenting these projects in class, you must thoroughly document each project in a public place on the Internet (e.g., your ITP blog).

At a minimum, the midterm and final projects should involve loading a dataset (of the student’s choosing) into Python and performing the steps of exploratory data analysis on that dataset, in order to reveal (but not necessarily answer) an interesting question about the phenomena that the data describe. In-class presentations for midterm projects will be five to ten minutes, while final project presentations will be fifteen to twenty minutes. The midterm project is intended as a short assignment (conceptualized and executed independently, like a weekly assignment), while the final project is intended to be designed, executed, and iterated on over several weeks.
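As a minimal sketch of what “loading a dataset and performing exploratory data analysis” can look like with Pandas, consider the example below. The data is invented and embedded in the code so that the example runs on its own; the column names are hypothetical, and a real project would load a file of your choosing instead (for instance with `pd.read_csv` and a path to your own CSV file).

```python
import io

import pandas as pd

# Invented stand-in data, embedded here so the example is self-contained.
raw_csv = io.StringIO(
    "city,year,trees_planted\n"
    "A,2021,120\n"
    "A,2022,95\n"
    "B,2021,80\n"
    "B,2022,\n"
    "C,2021,40\n"
    "C,2022,65\n"
)

df = pd.read_csv(raw_csv)

# First look: size, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Summary statistics for the numeric columns.
print(df.describe())

# Group and aggregate to see how the measurement varies by category.
# mean() skips missing values by default.
by_city = df.groupby("city")["trees_planted"].mean()
print(by_city.sort_values(ascending=False))
```

Steps like these rarely answer a question outright, but they tend to surface the next one: here, for example, why city B has no value recorded for 2022.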

Evaluation rubric

Your midterm and final project will be evaluated according to the following criteria: compliance, gregariousness, and stubbornness.

Each assignment will be assigned a score of 0, 1 or 2 in these categories, in accordance with the extent to which the assignment demonstrates the properties described.

Each category will be weighted equally when assigning a final score to each assignment.
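The equal weighting described above amounts to a simple average. The sketch below is one way to express it; reporting the result as a fraction of the maximum possible score is an assumption made for illustration, since nothing above says how rubric scores map onto percentages. The function name and example scores are hypothetical.

```python
CATEGORIES = ("compliance", "gregariousness", "stubbornness")


def rubric_score(scores):
    """Combine the three category scores (each 0, 1 or 2) with equal weight.

    Assumption: the result is expressed as a fraction of the maximum
    possible total (2 points per category).
    """
    for name in CATEGORIES:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2")
    return sum(scores[name] for name in CATEGORIES) / (2 * len(CATEGORIES))


# Hypothetical project scores, for illustration only.
print(rubric_score({"compliance": 2, "gregariousness": 1, "stubbornness": 2}))
```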

Attendance, lateness and in-class behavior policies

You are expected to attend all class sessions. If you’re unable to attend class, please let me know (by e-mail) before class begins. Also, be on time to class. If you’re more than fifteen minutes late, or if you leave early (without my clearance), it will count as an unexcused absence. Unexcused absences will negatively affect the participation portion of your grade.

On the use of large language models and automated code generation tools

Refer to Vaithilingam et al., whose study shows that LLM-based code generation tools do not “improve the task completion time or success rate,” but do lead to “difficulties in understanding, editing, and debugging” that “significantly hinder” programmers’ “task-solving effectiveness.”

On the use of electronic devices

Laptops will be an essential part of the course and may be used in class during workshops and for taking notes in lectures. Laptops must be closed during class discussions and student presentations. Phone use in class is strictly prohibited unless it is directly related to a presentation of your own work or you are asked to use your phone as part of the curriculum.

Statements

Your instructors are enjoined to include the following statements in our syllabi. Please review them closely.

Statement of academic integrity

Plagiarism is presenting someone else’s work as though it were your own. More specifically, plagiarism is to present as your own: a sequence of words quoted without quotation marks from another writer; a paraphrased passage from another writer’s work; or facts, ideas or images composed by someone else.

Statement of principle

The core of the educational experience at the Tisch School of the Arts is the creation of original academic and artistic work by students for the critical review of faculty members. It is therefore of the utmost importance that students at all times provide their instructors with an accurate sense of their current abilities and knowledge in order to receive appropriate constructive criticism and advice. Any attempt to evade that essential, transparent transaction between instructor and student through plagiarism or cheating is educationally self-defeating and a grave violation of Tisch School of the Arts community standards. For all the details on plagiarism, please refer to page 10 of the Tisch School of the Arts, Policies and Procedures Handbook.

Statement on accessibility

Please feel free to make suggestions to your instructor about ways in which this class could become more accessible to you. Academic accommodations are available for students with documented disabilities. Please contact the Moses Center for Student Accessibility for further information.

Statement on counseling and wellness

Your health and safety are a priority at NYU. If you experience any health or mental health issues during this course, we encourage you to utilize the support services of the 24/7 NYU Wellness Exchange at 212-443-9999. Also, all students who may require academic accommodation due to a qualified disability, physical or mental, should register with the Moses Center at 212-998-4980. Please let your instructor know if you need help connecting to these resources.

Statement on Title IX

Tisch School of the Arts is dedicated to providing its students with a learning environment that is rigorous, respectful, supportive and nurturing so that they can engage in the free exchange of ideas and commit themselves fully to the study of their discipline. To that end, Tisch is committed to enforcing University policies prohibiting all forms of sexual misconduct as well as discrimination on the basis of sex and gender. Detailed information regarding these policies and the resources that are available to students through the Title IX office can be found by using the following link: Title IX at NYU.