Syllabus
Course Overview
This is a 3-credit course offered as an elective.
Instructor
- Vivek Srikrishnan
- viveks@cornell.edu
- 318 Riley-Robb Hall
TA
- Gabriela Ackermann Logan
- ga345@cornell.edu
- 319 Riley-Robb Hall
Meetings
- MW
- 11:40-12:55pm
- 160 Riley-Robb Hall
Course Description
Simulations from numerical and statistical models are a key tool used to quantify and represent our understanding of many different environmental systems, including but not limited to the climate, sea levels, air pollution, and the electric power system. Data analysis of environmental observations and model simulations is an integral part of developing and evaluating models used to 1) test our assumptions about environmental system dynamics; 2) develop new insights into environmental processes; and 3) project future environmental conditions and outcomes. This course will provide an overview of the use of generative probability models to understand data and examine hypotheses about data-generating processes. The goal is to provide students with a framework and an initial toolkit of methods for data analysis and simulation. Students will actively analyze and use real data from a variety of environmental systems. In particular, over the course of the semester, we will:
- conduct exploratory analyses of environmental datasets;
- discuss best practices for and complexities of data visualization;
- calibrate statistical and process-based numerical models using environmental data;
- use simulations from calibrated models to identify key sources of uncertainty and model error;
- assess model fit and adequacy through predictive ability.
Learning Outcomes
After completing this class, students will be able to:
- create, interpret, and critique data visualizations;
- calibrate environmental models to observations, possibly including censored and missing data;
- simulate alternative datasets from models using statistical methods such as the bootstrap and Monte Carlo;
- assess model adequacy and performance using predictive simulations;
- apply and contextualize model selection criteria;
- evaluate evidence for and against hypotheses about environmental systems using model simulations;
- emulate computationally complex models with simpler representations.
Prerequisites & Preparation
The following courses/material would be ideal preparation:
- One course in programming (e.g. CS 1110, 1112 or ENGRD/CEE 3200)
- One course in probability or statistics (ENGRD 2700, CEE 3040, or equivalent)
In the absence of one or more of these prerequisites, you can seek the permission of the instructor.
If your programming or statistics skills are a little rusty, don’t worry! We will review concepts and build skills as needed.
Typical Topics
- Introduction to exploratory data analysis;
- Review of probability and statistics;
- Bayesian decision theory;
- Principles of data visualization;
- Model residuals and discrepancies;
- Censored, truncated, and missing data;
- Statistical methods for calibration;
- Predictive model assessment;
- Emulation with surrogate models.
Course Meetings
This course meets MW from 11:40–12:55 in 160 Riley-Robb Hall. In addition to the course meetings (a total of 42 lectures, 50 minutes each), the final project will be due during the university finals period. Students can expect to devote, on average, 6 hours of effort during the exam period.
Course Philosophy and Expectations
The goal of our course is to help you gain competency and knowledge in the area of data analysis. This involves a dual responsibility on the part of the instructor and the student. As the instructor, my responsibility is to provide you with a structure and opportunity to learn. To this end, I will:
- provide organized and focused lectures, in-class activities, and assignments;
- encourage students to regularly evaluate and provide feedback on the course;
- manage the classroom atmosphere to promote learning;
- schedule sufficient out-of-class contact opportunities, such as office hours;
- allow adequate time for assignment completion;
- make lecture materials, class policies, activities, and assignments accessible to students.
I encourage you to discuss any concerns with me during office hours or through a course communications channel! Please let me know if you do not feel that I am holding up my end of the bargain.
Students can optimize their performance in the course by:
- attending all lectures;
- doing any required preparatory work before class;
- actively participating in online and in-class discussions;
- beginning assignments and other work early;
- and attending office hours as needed.
Textbooks and Course Materials
There is no required text for this class, and all course materials will be made available on the course website or through the Cornell library. However, the following books might be useful as a supplement to/expansion on the topics covered in class:
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). http://www.stat.columbia.edu/~gelman/book/BDA3.pdf
- McElreath, R. (2020). Statistical Rethinking: A Bayesian Course with Examples in R and Stan (2nd ed.). https://xcelab.net/rm/
- D’Agostini, G. (2003). Bayesian Reasoning in Data Analysis: A Critical Introduction.
- Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and Other Stories. https://avehtari.github.io/ROS-Examples/index.html
Community
Diversity and Inclusion
Our goal in this class is to foster an inclusive learning environment and make everyone feel comfortable in the classroom, regardless of social identity, background, and specific learning needs. As engineers, our work touches on many critical aspects of society, and questions of inclusion and social justice cannot be separated from considerations of how data are generated, collected, and analyzed.
In all communications and interactions with each other, members of this class community (students and instructors) are expected to be respectful and inclusive. In this spirit, we ask all participants to:
- share their experiences, values, and beliefs;
- be open to and respectful of the views of others; and
- value each other’s opinions and communicate in a respectful manner.
Please let me know if you feel any aspect(s) of class could be made more inclusive. Please also share any preferred name(s) and/or your pronouns with me if you wish: I use he/him/his, and you can refer to me either as Vivek or Prof. Srikrishnan.
We all make mistakes in our communications with one another, both when speaking and listening. Be mindful of how spoken or written language might be misunderstood, and be aware that, for a variety of reasons, how others perceive your words and actions may not be exactly how you intended them. At the same time, it is also essential that we be respectful and interpret each other’s comments and actions in good faith.
Student Accommodations
Let me know if you have any access barriers in this course, whether they relate to course materials, assignments, or communications. If any special accommodations would help you navigate any barriers and improve your chances of success, please exercise your right to those accommodations and reach out to me as early as possible with your Student Disability Services (SDS) accommodation letter. This will ensure that we have enough time to make appropriate arrangements.
If you need more immediate accommodations, but do not yet have a letter, please let me know and then follow up with SDS.
Course Communications
Most course communications will occur via Ed Discussion. Public Ed posts are generally preferred to private posts or emails, as other students can benefit from the discussions. If you would like to discuss something privately, please do reach out through email or a private Ed post (which will only be viewable by you and the course staff).
Announcements will be made on the course website and in Ed.
- If you wait until the day an assignment is due (or even late the previous night) to ask a question on Ed, there is a strong chance that I will not see your post prior to the deadline.
- But if you see unanswered questions and you have some insight, please answer! This class will work best when we all work together as a community.
Mental Health Resources
We all have to take care of our mental health, just as we would our physical health. As a student, you may experience a range of issues which can negatively impact your mental health. Please do not ignore any of these stressors, or feel like you have to navigate these challenges alone! You are part of a community of students, faculty, and staff, who have a responsibility to look out for one another’s well-being. If you are struggling with managing your mental health, or if you believe a classmate may be struggling, please reach out to the course instructor or the TA, or, for broader support, take advantage of Cornell’s mental health resources.
I am not a trained counselor, but I am here to support you in whatever capacity I can. You should never feel that you need to push yourself past your limits to complete any assignment for this class or any other. If we need to make modifications to the course or assignment schedule, you can certainly reach out to me, and all relevant discussions will be kept strictly confidential.
Course Policies
Attendance
Attendance is not required, but in general, students who attend class regularly will do better and get more out of the class than students who do not. Your class participation grade will reflect both the quantity and quality of your participation, only some of which can occur asynchronously. I will put as many course materials, such as lecture notes and announcements, as possible online, but viewing materials online is not the same as active participation and engagement. Life happens, of course, and this may lead you to miss class. Let me know if you need any appropriate arrangements ahead of time.
Please stay home if you’re feeling sick! This is beneficial both for your own recovery and for the health and safety of your classmates. We will make any necessary arrangements for you to stay on top of the class material, including if whatever is going on would negatively impact your grade, for example by causing you to be unable to submit an assignment on time.
Office Hours
Office hours will be held in 318 Riley-Robb after class (1-2pm) on MW. If these times do not work for you, or you need some additional time outside of office hours, please reach out to Prof. Srikrishnan about scheduling a meeting. Depending on schedules, these requests may not be accepted on short notice (e.g. homework is due on Friday and you reach out late on Thursday), but with several days’ notice we should be able to find a time that will work.
Office hours are intended to help all students who attend. This time is limited, and is best spent on issues that are relevant to as many students as possible. While we will do our best to answer individual questions, students asking us to verify or debug homework solutions will have the lowest priority (but please do ask about how to verify or debug your own solutions!). However, we are happy to discuss conceptual approaches to solving homework problems, which may help to reveal bugs.
Space at office hours can be limited (we may shift to the conference room in 316 Riley-Robb if offices are full and it is available). If the room is crowded and you can find an alternative source of assistance, or if your question is low priority (e.g. debugging), please be kind and make room for others.
Mask Policies
Please stay home and rest if you have symptoms of COVID-19 or any other respiratory illness. No masking will be required, but please be respectful of others who may wear masks or take other precautions to avoid illness. This policy may change if there is another outbreak of COVID-19 (or other illness), but will be kept consistent with broader Cornell mask policies.
Academic Integrity
TL;DR: Don’t cheat, copy, or plagiarize!
This class is designed to encourage collaboration, and students are encouraged to discuss their work with other students. However, I expect students to abide by the Cornell University Code of Academic Integrity in all aspects of this class. All work submitted must represent the students’ own work and understanding, whether individually or as a group (depending on the particulars of the assignment). This includes analyses, code, software runs, and reports. Engineering as a profession relies upon the honesty and integrity of its practitioners (see e.g. the American Society of Civil Engineers’ Code of Ethics).
External Resources
The collaborative environment in this class should not be viewed as an invitation for plagiarism. Plagiarism occurs when a writer intentionally misrepresents another’s words or ideas (including code!) as their own without acknowledging the source. All external resources which are consulted while working on an assignment should be referenced, including other students and faculty with whom the assignment is discussed. You will never be penalized for consulting an external source for help and referencing it, but plagiarism will result in a zero for that assignment as well as the potential for your case to be passed on for additional disciplinary action.
Generative AI/ML Policy
As noted, all work submitted for a grade in this course must reflect your own understanding. The goal of this course is to build critical data-analytic and statistical skills, including evaluations of the output of statistical models. While generative AI/ML models, such as ChatGPT, NotebookLM, or similar, can be used to summarize text and enhance coding efficiency, appropriate usage of these tools requires similar levels of scrutiny, which cannot be obtained without a general understanding of the underlying material. Relying on these tools as a primary means of engaging with the course content and material without independent effort is therefore discouraged. Further, blind reliance on generative AI can result in the submission of so-called “hallucinations” due to the predictive nature of the AI output.
However, the appropriate and thoughtful use of these tools is a legitimate professional skill and reflects an understanding of the content in this course. As a result, the use and consultation of AI/ML tools, such as ChatGPT or similar, is not prohibited, but must be clearly referenced. In addition to citing the use of the tool, you must:
- reference the URL of the service you are using, including the specific date you accessed it;
- provide the exact prompt(s) used to interact with the tool; and
- describe how you (directly or indirectly) integrated the response into your submission.
Failure to fully reference the interaction, as described above, will be treated as plagiarism and referred to the University accordingly.
Late Work Policy
In general, late work can be submitted up to 24 hours after the due date at a 50% penalty. However, sometimes things come up in life. Please reach out ahead of time if you have extenuating circumstances (including University-approved absences or illnesses) which would make it difficult for you to submit your work on time. Note that e.g. job interviews or a busy schedule outside of this course are not valid reasons for extensions. If an extension is granted, any late penalties will be waived up to the extension date. In extreme circumstances, assignments can be forgiven, and your grade will be computed as though those did not occur, giving your other assignments more weight.
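As a rough illustration of the late-work arithmetic, the sketch below applies the 24-hour window and 50% penalty to a score. It rests on several assumptions that the policy above does not spell out: the penalty scales the earned score, work submitted after the window receives no credit, and an approved extension simply moves the deadline from which the window is counted. The function name, parameters, and dates are hypothetical.

```python
from datetime import datetime, timedelta

LATE_WINDOW = timedelta(hours=24)  # late work accepted up to 24 hours after the deadline
LATE_PENALTY = 0.5                 # 50% penalty (ASSUMPTION: applied to the earned score)


def apply_late_policy(raw_score, due, submitted, extension_until=None):
    """Return the score after the late policy (hypothetical helper, not official)."""
    # ASSUMPTION: an approved extension replaces the original deadline.
    deadline = max(due, extension_until) if extension_until else due
    if submitted <= deadline:
        return raw_score                       # on time: no penalty
    if submitted <= deadline + LATE_WINDOW:
        return raw_score * (1 - LATE_PENALTY)  # within 24 hours: half credit
    return 0.0                                 # ASSUMPTION: no credit after the window


# Hypothetical due date at 9:00pm; the submission arrives two hours late.
due = datetime(2025, 3, 7, 21, 0)
print(apply_late_policy(80, due, due + timedelta(hours=2)))
```

Forgiven assignments are not modeled here; they would be dropped from the grade calculation rather than scored.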
Regrade Requests
Regrade requests can be submitted up to one week after the graded work is released on Gradescope.
All regrade requests must include a brief justification for the request or they will not be considered. Good justifications include (but are not limited to):
- My answer agrees with the posted solution, but I still lost points.
- I lost 4 points for something, but the rubric says it should only be worth 2 points.
- You took points off for something, but it’s right here.
- My answer is correct, even though it does not match the posted solution; here is an explanation.
- There is no explanation for my grade.
- I got a perfect score, but my solution has a mistake (you will receive extra credit for this! see below!)
- There is a major error in the posted solution; here is an explanation (full credit for everyone, but Prof. Srikrishnan will decide what constitutes a “major error”! see below!).
All regrades will be assessed based only on the submitted work. You cannot get a higher grade by explaining what you meant (either in person or online) or by adding information or reasoning to what is submitted after the fact. The goal of the regrade is to draw attention to a potential grading problem, not to supplement the submission.
Once Prof. Srikrishnan issues a final response to a regrade request, further requests for that submission will be ignored.
While you should submit regrade requests for legitimate errors, using them for fishing expeditions can also result in lost points if Prof. Srikrishnan decides that your initial grade was too lenient or if additional errors are identified.
- If you submit a regrade request correctly reporting that a problem was graded too leniently — that is, that your score was higher than it should be based on the rubric — your score will be increased by the difference. For example, if your original score on a problem was 8/10 and you successfully argue that your score should have been 3/10, your new score will be 13/10.
- If a significant error is discovered in a posted homework solution or in the exam solutions, everyone in the class will receive full credit for the (sub)problem. Prof. Srikrishnan will decide what is “significant”.
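The leniency rule above amounts to adding the gap between the original and corrected scores back onto the original score. A minimal sketch, using a hypothetical function name and reproducing the 8/10 example:

```python
def leniency_regrade_score(original, corrected):
    """Score after successfully reporting that a problem was graded too leniently.

    `original` is the score first awarded; `corrected` is the (lower) score the
    rubric actually supports. Both are in points for the same problem.
    """
    if corrected > original:
        raise ValueError("a leniency report needs corrected <= original")
    # The score is increased by the difference between the two.
    return original + (original - corrected)


# Example from the policy: 8/10 originally, should have been 3/10.
print(leniency_regrade_score(8, 3))  # 13
```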
Assessments
Technologies
We will use Canvas as a gradebook, and to distribute PDFs of readings (which will also be made available through the website, via the Cornell library). Ed Discussion will be used for course communications. Assignments will be submitted and graded in Gradescope.
Students can use any programming language they like to solve problems, though we will make notebooks and package environments available for Julia (which may help structure your assignments if you use a different language) via GitHub. If students use a language other than Julia, we may be limited in the programming assistance we can provide (though we’re happy to try to help!).
We recommend students create a GitHub account and use GitHub to version control and share their code throughout the semester.
Grading
Final grades will be computed based on the following assessment weights:
| Assessment | Weight |
|---|---|
| Participation | 5% |
| Readings | 10% |
| Quizzes | 10% |
| Literature Critique | 15% |
| Homework Assignments | 30% |
| Term Project | 30% |
The following grading scale will be used to convert the numerical weighted average to letter grades:
| Grade | Range |
|---|---|
| A | 93–100 |
| A- | 90–93 |
| B+ | 87–90 |
| B | 83–87 |
| B- | 80–83 |
| C+ | 77–80 |
| C | 73–77 |
| C- | 70–73 |
| D+ | 67–70 |
| D | 63–67 |
| D- | 60–63 |
| F | < 60 |
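To make the conversion concrete, here is a minimal sketch of how the weights and scale above could be combined. The weights and cutoffs are copied from the two tables; the function names and example scores are hypothetical. Because adjacent ranges in the scale share an endpoint, the sketch assumes a score exactly on a boundary (e.g. 90) earns the higher grade, and it applies no rounding. Neither choice is stated in the syllabus.

```python
# Assessment weights from the table above, as fractions of the final grade.
WEIGHTS = {
    "Participation": 0.05,
    "Readings": 0.10,
    "Quizzes": 0.10,
    "Literature Critique": 0.15,
    "Homework Assignments": 0.30,
    "Term Project": 0.30,
}

# Lower bound of each letter grade, highest first.
# ASSUMPTION: a score exactly on a shared boundary earns the higher grade.
LETTER_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def weighted_average(scores):
    """Combine per-category percentages (0-100) into one weighted average."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average):
    """Map a numerical weighted average to a letter grade."""
    for cutoff, letter in LETTER_CUTOFFS:
        if average >= cutoff:
            return letter
    return "F"


# Hypothetical category percentages for one student.
example_scores = {
    "Participation": 100,
    "Readings": 90,
    "Quizzes": 85,
    "Literature Critique": 88,
    "Homework Assignments": 92,
    "Term Project": 90,
}
average = weighted_average(example_scores)
print(round(average, 1), letter_grade(average))
```

With these hypothetical scores the weighted average is about 90.3, which falls in the A- range.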
Participation
Students will be assessed based on their participation across class activities, including lectures and discussions (in-person and online). Routine lack of attendance or engagement will result in the loss of participation points.
Readings
Readings will be assigned for discussion throughout the semester. We will discuss the readings in class the week after they are assigned. One student will be asked to briefly summarize the reading before the broader discussion. Collaborative annotation assignments will be set up in Canvas to facilitate reading ahead of the discussion. Students in BEE 5850 should also submit a (maximum) one-page summary of the reading, highlighting its key point(s) and their assessment of the effectiveness of the argument, to Gradescope before the Monday after the reading is assigned. Students will be evaluated on both the extent to which their annotations and summaries reflect engagement with the reading and their participation in the class discussion.
Quizzes
Quizzes will be assigned in Gradescope most weeks based on recently discussed content. These will be relatively short and are intended to consolidate material recently discussed in class. The quizzes will be released after Wednesday’s class and will be due prior to the following Monday’s class. Quizzes can be retaken as many times as desired prior to the submission deadline. They may consist of multiple choice questions or questions involving small mathematical, computational, or open-ended problems.
Literature Critique
Students will select a peer-reviewed journal article related to an application of data analysis and will write a short discussion paper (2-3 pages) analyzing the hypotheses and statistical choices. Students should feel free to select their own paper or can work with Prof. Srikrishnan to identify one of interest, but in all cases should discuss with Prof. Srikrishnan to ensure that the article is appropriate. The discussion paper will be due towards the end of the semester.
Homework Assignments
Approximately 6 homework assignments will be assigned throughout the semester (roughly one per course module). Homework problems will generally involve more substantial mathematical or computational work than the quiz problems. You will typically have 2 weeks to work on each assignment, though this depends on the module length. Students are encouraged to collaborate and learn from each other on homework assignments, but each student must submit their own solutions reflecting their understanding of the material. Consulting and referencing external resources and your peers is encouraged (engineering is a collaborative discipline!), but plagiarism is a violation of academic integrity.
Some notes on assignment and grading logistics:
- Homeworks are due by 9:00pm Eastern Time on the designated due date. Your assignment notebook (which includes your writeup and code) should be submitted to Gradescope as a PDF with the answers to each question tagged (a failure to do this will result in deductions).
- A standard rubric is available.
- Students in BEE 5850 will be asked to complete additional homework problems which go more deeply into the underlying concepts or apply more advanced techniques.
- Regrade requests for specific problems must be made within a week of the grading of that assignment.
Term Project
Throughout the semester, students will apply the concepts and methods from class to a data set of their choosing. If a student does not have a data set in mind, we will find one which aligns with their interests.
The term project can be completed individually or in groups of 2. The deliverables are:
- A proposal describing the research question and hypotheses, the data set, and the numerical or statistical models the student would like to use to test the hypotheses;
- A final presentation and report. The report should be no more than 5 pages (11 point font, 1 inch margins), not including references and figures. The presentation should be no more than 10 minutes and will be delivered in class; presentations may be spread across multiple class periods if the number of projects requires it. More details and rubrics will be provided later in the semester.