SDS 192: Introduction to Data Science

Author

Lindsay Poirier

Published

September 9, 2022

Syllabus

Data science involves applying a set of strategies to transform a recorded set of values into something from which we can glean knowledge and insight. This course will introduce you to concepts and methods from the field of data science, along with how to apply them in R. You will learn how to acquire, clean, wrangle, and visualize data. You will also learn best practices in data science workflows, such as code documentation and version control. Issues in data ethics will be addressed throughout the course.

Classes will be held on Mondays, Wednesdays, and Fridays from 9:25 AM to 10:40 AM in McConnell 404.

SDS 100: Reproducible Scientific Computing with Data is a co-requisite for this course and is designed to support you in coding in R. Please note that I walk into this course with the assumption that most students have never coded before. Coding for the first time can be intimidating, but I intend to do everything in my power to support you through the learning curve and to make things both fun and relevant in the process. I personally picked up most of my data science skills through a lot of trial and error, practice, and curiosity. My hope is that, in this course, you will learn through experimentation, along with independent and collaborative problem-solving. Honing these competencies will serve you as you move on to other courses in the SDS program and/or at Smith.

Course Instructor

Lindsay Poirier, she/her/hers.

While you’re welcome to refer to me as Professor Poirier, I would prefer it if you called me Lindsay. I am a cultural anthropologist who studies how public interest datasets get produced, how communities think about and interface with data, and how data infrastructure can be designed more equitably. My Ph.D. is in an interdisciplinary field called Science and Technology Studies, which examines the intricate ways science, technology, culture, and politics all co-constitute each other. I work on a number of collaborative research projects that leverage public data to deepen understanding of social and environmental inequities in the US, while also qualitatively studying the politics behind data gaps and inconsistencies. As an instructor, I prioritize active learning and often structure courses as flipped classrooms. You can expect in-class time to predominantly involve a mix of lectures and live problem-solving exercises.

Getting in Touch

I can best support students in this course when I can readily keep tabs on our course-related communication. Because of this, I ask that you not email me about course-related questions or issues. The best way to get in touch with me is via our course Slack. If you have course-related questions, I encourage you to ask them in the #sds-192-questions channel. When discretion is needed, feel free to DM me. Please reserve more formal concerns like grades or accommodation requests for an in-person (or live virtual) conversation.

During the week, I will try my best to answer all Slack messages within 24 hours of receiving them. Please note that to maintain my own work-life balance, I often don’t answer Slack messages late in the evenings or on the weekends. It’s important that you plan when you start your assignments accordingly.

Office hours are a great opportunity for us to chat about what you’re learning in the course, clarify expectations on assignments, and review work in progress. I also love when students drop in to office hours to request book recommendations, discuss career or research paths, or just to say hi! I encourage each student in the course to join office hours at least once this semester. If you’re unable to attend my office hours at the regularly scheduled time, there is a link on Moodle to book a meeting with me.

  • Monday, 3-4 PM, McConnell 212-213
  • Wednesday, 3-5 PM, McConnell 212-213

Course Texts

A number of excellent textbooks introducing data science concepts and methods have been written in the past few years, including a few from faculty in the Smith SDS department. To accompany the topics we will cover each week, I will be selecting my favorite chapters from these books and posting them to Perusall. However, all three books we will engage with in this course cover almost every topic we will address, so feel free to supplement your reading with corresponding chapters in the other books, especially if you find yourself drawn to the teaching and writing style in a certain book. All books are available for free online.

  • Baumer, Benjamin S., Daniel T. Kaplan, and Nicholas J. Horton. 2021. Modern Data Science with R. 2nd ed. CRC Press. https://mdsr-book.github.io/mdsr2e/.

  • Irizarry, Rafael A. 2022. Introduction to Data Science: Data Analysis and Prediction Algorithms with R. https://rafalab.github.io/dsbook/.

  • Ismay, Chester, and Albert Y. Kim. 2021. Modern Dive: Statistical Inference via Data Science. CRC Press. https://moderndive.com/.

Each week I will also list optional reading and resources in our course schedule that you may reference if you are struggling with a topic or if you wish to explore that topic further. I will update this list often throughout the semester.

Assessment

This course will be graded via a standards-based assessment system.

Spinelli Center

Smith’s Spinelli Center offers a number of resources to support SDS students. Spinelli Center Data Assistants will visit our classroom regularly to support you through lab work. The Center also offers drop-in tutoring hours Sunday through Thursday, 7-9 PM. Finally, you can drop in to Seelye 207D or schedule an appointment with the Data Research and Statistics Counselor (Kenneth Jeong). To schedule an appointment, email qlctutor@smith.edu.

Policies

This is a 4-credit course with 4.5 hours per week of in-classroom instruction. Smith expects students to devote 7.5 out-of-class hours per week to 4-credit classes. I have designed the course assignments and selected the course readings with this target in mind.

Attending class is not only important for your learning but also an act of community. Attendance is expected in this course. Many course assignments will be completed in class. That said, you do not need to inform me when you will be absent. If you are sick, please stay home. If you must miss a class entirely, you should contact a peer to discuss what was missed. Please note that the SDS Program has adopted a shared policy regarding in-person attendance this semester:

In keeping with Smith’s core identity and mission as an in-person, residential college, SDS affirms College policy (as per the Provost and Dean of the College) that students will attend class in person. SDS courses will not provide options for remote attendance. Students who have been determined to require a remote attendance accommodation by the Office of Disability Services will be the only exceptions to this policy. As with any other kind of ADA accommodations, please notify your instructor during the first week of classes to discuss how we can meet your accommodations.

There is an automatic 24-hour grace period on all lab and project assignments. There will be no penalty for submitting an assignment within this 24-hour period, and you do not need to inform me that you intend to take the extra time. You can also request up to a 72-hour extension on any project or lab assignment, as long as you make that request at least 48 hours before the original assignment due date. You can request an extension by filling out the Extension Request form on Moodle, and I will confirm your extension on Slack. Beyond this, late assignments will not be accepted.
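
As an illustration only, the timing rules above can be sketched in a few lines of Python. This is not an official policy tool: the function names and the example due date are hypothetical, and the sketch assumes that an approved extension is counted from the original due date and replaces the grace period rather than adding to it.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=24)       # automatic, no request needed
MAX_EXTENSION = timedelta(hours=72)      # longest extension that can be requested
REQUEST_LEAD_TIME = timedelta(hours=48)  # how far ahead of the due date a request must be made


def can_request_extension(due: datetime, requested_at: datetime) -> bool:
    """Return True if the request is made at least 48 hours before the due date."""
    return requested_at <= due - REQUEST_LEAD_TIME


def latest_submission(due: datetime, extension_hours: int = 0) -> datetime:
    """Return the last moment a lab or project would still be accepted.

    Assumption (not stated in the policy): an approved extension is counted
    from the original due date and replaces the 24-hour grace period.
    """
    extension = timedelta(hours=extension_hours)
    if extension > MAX_EXTENSION:
        raise ValueError("extensions are capped at 72 hours")
    return due + max(GRACE_PERIOD, extension)


# Hypothetical example: an assignment due Friday, September 23, 2022 at 5:00 PM.
due = datetime(2022, 9, 23, 17, 0)
print(latest_submission(due))                       # grace period only
print(latest_submission(due, extension_hours=72))   # with a full extension
print(can_request_extension(due, datetime(2022, 9, 21, 9, 0)))
```

If you are ever unsure how these rules apply to a particular assignment, the Extension Request form and our course Slack remain the places to check.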

Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations. Students and faculty at Smith are part of an academic community defined by its commitment to scholarship, which depends on scrupulous and attentive acknowledgement of all sources of information, and honest and respectful use of college resources.

Any cases of dishonesty or plagiarism will be reported to the Academic Honor Board. Examples of dishonesty or plagiarism include:

  • Submitting work completed by another student as your own.
  • Copying and pasting words from sources without quoting and citing the author.
  • Paraphrasing material from another source without citing the author.
  • Failing to cite your sources correctly.
  • Falsifying or misrepresenting information in submitted work.
  • Paying another student or service to complete assignments for you.

Deadlines for Quizzes and Course Advancement Assignments

The standards you will be practicing in this course all build on each other, and it’s important that I know how students are doing on each standard in order to direct my teaching going forward. Because of this, the deadlines for quizzes are a bit more firm than for other assignments. On the course schedule and in Moodle you will see a suggested deadline for quizzes and a final deadline. I won’t be able to accept quiz submissions after the final deadline, so you’ll want to be sure to stay on top of these dates. Similarly, course advancement assignments are assignments that need to be completed to keep our course moving. Because of this, they need to be completed by the due date. I will give you an opportunity to work on many of these assignments in class.

Community

As the instructor for this course, I am committed to making participation in this course a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants in this course include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.

As the instructor I have the right and responsibility to point out and stop behavior that is not aligned to this Code of Conduct. Participants who do not follow the Code of Conduct may be reprimanded for such behavior. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the instructor.

All students and the instructor are expected to adhere to this Code of Conduct in all settings for this course: seminars, office hours, and over Slack.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0.

I hope that we can foster a collaborative and caring environment in this classroom: one that celebrates successes, respects individual strengths and weaknesses, demonstrates compassion for each other’s struggles, and affirms diverse identities. Here are some ideas that I have for creating this environment in our course:

  • Check in with colleagues before starting collaborative work. “What three words describe how you’re feeling?” “Name one challenge and one success from this week.” “What are you doing for self-care right now?” Thank each other for sharing where they’re at.
  • Consider when to step up and when to step back in class discussions, creating space for others to contribute. Listening is just as important to community-building as speaking.
  • Acknowledge that there is much we don’t know about how our colleagues experience the world, but don’t ask colleagues to speak on behalf of a social group you perceive them to be a part of.
  • Cheer on colleagues as they give presentations or try something out for the first time.
  • Ask questions often in our #sds-192-questions channel. Help each other out by answering questions when you can.
  • Mistakes happen. I will certainly make mistakes in class. Admit mistakes, and then move on.

Using the proper pronouns for our students is foundational to a safe, respectful classroom environment that creates a culture of trust. For information on pronouns and usage, please see the Office of Equity and Inclusion’s Pronouns page.

Support

It is my goal for everyone to succeed in this course. If you have personal circumstances that may impact your experience of our classroom, I encourage you to contact the Office of Disability Services in College Hall 104 or at ods@smith.edu. The Office will generate a letter that indicates to me what kind of support you need and how I can make your classroom experience more accommodating. Once you have this letter, you are welcome to visit my office hours or email me to discuss ideas about how we can tailor the course accordingly. While you can request accommodations at any time, the sooner we start this conversation, the better. If you have concerns about the course that are not addressed through ODS, please contact me. At no point will I ask you to divulge details about your personal circumstances to me.

College life is stressful, and life outside of college can be overwhelming. It is my position that attending to your physical and mental health and well-being should be a top priority. I will remind you of this often throughout the semester. I encourage you to schedule a time to talk with me if you are struggling with this course. If you, or anyone you know, is experiencing distress, there are numerous campus resources that can provide support via the Schacht Center. I can point you to these resources at any time throughout the semester.

A trigger is a topic or image that can precipitate an intense emotional response. When common triggering topics are to be covered in this course, I will do my best to provide a trigger warning in advance of the discussion. However, I can’t always anticipate triggers. With this in mind I’ve set up an anonymous form, available on Moodle, where you can indicate topics for which you would like me to provide a warning.

Infrastructure

Grades, forms, handouts, and quizzes will be available on the course Moodle.

All course readings and recorded lectures will be available on Perusall. You can access Perusall via our course Moodle page.

Our course Slack is organized into the following channels:

  • #general: Course announcements (only I can post)
  • #sds-192-discussions: Share news articles and relevant opportunities
  • #sds-192-questions: Ask and answer questions about our course
  • You can also create private Slack channels with your project group members.

I will be using GitHub Classroom to distribute several course assignments, and you will submit assignments by pushing changes to template documents to a private GitHub repository. I will provide guidance on how to do this early in the semester.

RStudio/RStudio Server

This class will use the R statistical software package. In the first week of the course, I will help you install and configure R and RStudio. If you are using a laptop, you will install both on that computer. If you are using a Chromebook or tablet, an account will be created for you on the Smith College RStudio Server so that you can access a cloud-based version of RStudio. You should let me know in the first week of the course if you are using a Chromebook or tablet.