Git and GitHub in CS559

In CS559 this semester, we will use GitHub both to distribute framework code (the starter code for assignments) and for students to hand in their assignments.

If you’ve never used Git, some of this might not make sense yet, since we are using Git terminology. See Learning Git and GitHub below for help in getting started.

Learning enough about Git is a requirement for the class. Technically, knowing the basics of Git is a prerequisite (it is covered in CS400). However, we will help you learn.

Overview / Summary

  1. Every student must register for a GitHub account at https://github.com. You can use one that you already have.
  2. You should learn the basics of using Git. Know the concepts of a repository, cloning, committing, pulling and pushing. For class you won’t be required to use any of the fancier features (although you might want to).
  3. We will require all students to tell us their GitHub account ID.
  4. The course will use GitHub Classroom, which will create private repositories for each student.
  5. Students are required to clone their private repository on GitHub to start each assignment. They must commit their work (including adding any new files they create). We recommend that students commit their work often, so there is a record of progress. To hand in an assignment, students must push their commits back to their private repository on GitHub.

How things will work

The basic idea is:

  1. As part of an assignment description on Canvas, we will provide a GitHub Classroom link. Each student follows the link, and a private GitHub repository containing the assignment “starter” code is created for them. The student associates this repository with their WiscID (there is no official connection - it’s just so we can keep track). Each student will get their own private repository.
  2. Each student clones their repository to their computer as the starting point for their assignment.
  3. Each student works on their assignment, using their repository as they want. They commit different versions, create clones, push and pull, etc.
  4. When the student completes their assignment, they push to the classroom repository. The version they wish to have graded should be on the “master” branch.
  5. The course staff (TA/Professor) will clone the student repository in order to grade it. This will happen shortly after the due date. If you push new versions after this cloning is done, we will not see them.

GitHub classroom will make private repositories for each student for each assignment. These are repositories that only the student and course staff can access.

If you use advanced Git features, you need to make sure the grader can find the right version to grade. The grader will grade the default (origin/master) branch.

How GitHub Classroom Works

Understanding how GitHub classroom works will make everything else easier.

GitHub classroom provides a thin layer over regular GitHub to automate the creation and retrieval of a large number of similar repositories.

For each assignment, GitHub Classroom will make a special URL link. When you go to this link, it will create a private repository for you. The repository will have the name “AssignmentName-GitHubID”. This repository is a clone of the starter repository. It is a private repository owned by the course staff (so we can view it), and the student has write access to it.
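As a small illustration of the naming scheme, the sketch below builds the repository name and both clone URLs from an assignment name and a GitHub ID. The organization name `cs559-sp20` and the assignment name `wb01` are assumptions borrowed from the example URLs in the GitHub security section; the helper names are hypothetical.

```python
# Hypothetical helpers illustrating the "AssignmentName-GitHubID" naming scheme.
# ORG is an assumption taken from the example URLs elsewhere on this page.
ORG = "cs559-sp20"


def classroom_repo_name(assignment: str, github_id: str) -> str:
    """Return the repository name GitHub Classroom would create."""
    return f"{assignment}-{github_id}"


def clone_urls(assignment: str, github_id: str, org: str = ORG) -> dict:
    """Return the HTTPS and SSH clone URLs for a student's repository."""
    name = classroom_repo_name(assignment, github_id)
    return {
        "https": f"https://github.com/{org}/{name}.git",
        "ssh": f"git@github.com:{org}/{name}.git",
    }


urls = clone_urls("wb01", "gleicher")
print(urls["https"])  # https://github.com/cs559-sp20/wb01-gleicher.git
print(urls["ssh"])    # git@github.com:cs559-sp20/wb01-gleicher.git
```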

GitHub Classroom creates repositories for any GitHub user ID that requests one. It does not necessarily associate the repository with a student ID or a person. We give it a list of student identifiers, but all it does with this list is ask students to pick their entry when their repos are created, and provide us with a roster. Ultimately, we need some mechanism for connecting GitHub IDs with students.

The student uses this created repository to create their assignment. Generally, this will involve cloning the repository to the computer you work on (e.g., your laptop), editing the files and adding new ones, committing the changes (to your local repository), and pushing these changes back to the repository Classroom created.

Class Policies

  1. All students must have a GitHub account.
  2. Each student needs to use their own GitHub account.
  3. You should use the same GitHub account for the entire class. If you need to change your GitHub ID for some reason, contact the course staff.
  4. Students must provide information so we know how to connect them with their GitHub account.

Some Warnings

The links we will provide for assignments will create new repositories for any GitHub user that wants to create a repository. Please do not share the link with anyone outside of class.

There is no checking to make sure that the correct repositories are connected to the correct student IDs. When you set up your repository, be careful to select the correct student account.

It is your responsibility to make sure your repository is correctly associated with your student ID (so we know who you are). If there is any problem, please talk to the course staff. Basically, GitHub uses your GitHub ID for everything, and we need to keep a list that associates GitHub IDs to students.

You need to make sure you commit all your code into the repository - including adding new files that you make.
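One way to catch forgotten files before handing in is to look at the output of `git status --porcelain`, where untracked files are marked `??` and files with uncommitted changes carry other two-character status codes. The sketch below is a minimal parser for that format, assuming Git's default (version 1) porcelain output; `parse_porcelain` and `pending_changes` are hypothetical helper names.

```python
import subprocess


def parse_porcelain(output: str) -> dict:
    """Split `git status --porcelain` output into untracked and changed paths.

    Each line is "XY PATH": a two-character status code, a space, then the path.
    Quoted paths and rename arrows ("old -> new") are not unpacked here.
    """
    untracked, changed = [], []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        status, path = line[:2], line[3:]
        if status == "??":
            untracked.append(path)  # new file Git is not tracking yet: needs `git add`
        else:
            changed.append(path)  # tracked file with changes not yet committed (staged or not)
    return {"untracked": untracked, "changed": changed}


def pending_changes(repo_dir: str = ".") -> dict:
    """Run `git status --porcelain` in repo_dir (assumes git is on the PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)
```

If both lists come back empty, every file Git can see (everything not listed in `.gitignore`) has been committed; you still need to push.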

You need to make sure that you push all of your assignment into your repository. We recommend committing and pushing regularly since the repository can serve as a backup in case something bad happens to your local copy.

It is your responsibility to check that your repository is correct for hand-in. You can do this by making a fresh clone of it - you’ll see exactly what the grader will see.
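The fresh-clone check can be scripted. The sketch below assumes `git` is on your PATH and that you can already access the repository (for example, over SSH); the helper names are hypothetical. It clones into a temporary directory, lists the files a grader would receive (skipping Git's internal `.git` directory), and then deletes the temporary copy.

```python
import os
import subprocess
import tempfile


def list_files(root: str) -> list:
    """Return sorted paths (relative to root, using "/") of all files, skipping .git."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # don't descend into Git's internal data
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def files_in_fresh_clone(url: str) -> list:
    """Clone url into a throwaway directory and list what a grader would get."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "check")
        subprocess.run(["git", "clone", "--quiet", url, target], check=True)
        return list_files(target)
```

Print the list returned by `files_in_fresh_clone` for your repository's URL and compare it against the files you expect to hand in.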

The repositories may be removed at the end of the semester. If you want to keep a copy of your work, make sure you keep a clone of it somewhere other than the repository we give you.

We recommend that you set up SSH authentication to access GitHub from the computer you will work on. See Git setup and SSH configuration.

Learning Git and GitHub

The basic ideas of using Git should be a review, since you should have learned about it in CS400 (or elsewhere). However, you may want to look through this tutorial for ideas on how to use Git in CS559, and for some practical pointers.

If you’re an experienced Git user, nothing here should be a surprise.

If you haven’t used Git before, it does take a little getting used to. Git has a lot of cool features - but for class, we will only need the basics.

The brief tutorial below can help you get started, but you might want to check out other resources. The Git - The Simple Guide has been recommended as an easy way to get started. The book “Pro Git” is available online - the whole thing is overkill, but the first chapters will give you a good idea of the most important features of Git.

A Brief Tutorial

Git vs. GitHub - Git is a source code control system. GitHub is a company that provides a server where people can have accounts and store stuff (repositories). GitHub (the server) is often used to share code publicly. However, our class account has the ability to make private repositories (which is why we make the repos for you).

Basic Idea: When you work on a project (even without source control), everything is in a working directory. The working directory (WD) might have subdirectories. Some of the files in the working directory will be tracked - that means that we care about them and will want to keep copies.

The basic idea of source control (including Git) is that we’ll keep making multiple backups of our working directory (or at least the tracked files). If something goes wrong, we can always restore one of those backups. Source control keeps all backups (we call them versions).

The key concept for Git is a repository (or “repo” for short). It’s basically the collection of all the backups of your project directory (or at least the files that you told it to back up). In addition to the current version, you can keep lots of past versions. If you’re wondering if it’s wasteful to keep lots of versions, don’t worry - Git is very efficient.
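To make the idea of tracked files and saved versions concrete, here is a toy model in Python. It is only an illustration of the concept, not how Git stores data: in real Git, add also stages the file's current contents for the next commit, while this model collapses that step and simply snapshots every tracked file. The class and method names are hypothetical.

```python
class ToyRepo:
    """A toy model of source control: a working directory, tracked files, saved versions."""

    def __init__(self):
        self.working_dir = {}  # filename -> current contents
        self.tracked = set()   # the files we care about
        self.versions = []     # saved (message, snapshot) pairs

    def add(self, filename):
        """Start tracking a file, so future backups include it."""
        self.tracked.add(filename)

    def commit(self, message):
        """Save a snapshot of every tracked file; return its version number."""
        snapshot = {f: self.working_dir[f] for f in self.tracked if f in self.working_dir}
        self.versions.append((message, snapshot))
        return len(self.versions) - 1

    def checkout(self, version):
        """Restore the files saved in that version; untracked files are left alone."""
        _, snapshot = self.versions[version]
        self.working_dir.update(snapshot)


repo = ToyRepo()
repo.working_dir["main.js"] = "v1"
repo.working_dir["notes.txt"] = "scratch"  # never added, so never backed up
repo.add("main.js")
first = repo.commit("first draft")
repo.working_dir["main.js"] = "v2 (broken)"
repo.checkout(first)
print(repo.working_dir["main.js"])  # v1
```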

What makes Git different from most other source control systems (Mercurial is an exception; it works the same way) is that we copy around the whole repository. When you give someone your project (the repo), they copy the whole repo, which means they get all the past versions.

So, with Git, there can be multiple copies of the repository. You might have one on your laptop, another copy of the repo on a desktop computer, and yet another copy stored on GitHub. This is good because if your laptop breaks, there are backups. It also allows a team to work together: everyone has their own copy of the repo.

The tricky part is that if you work on one copy of the repository, it will have different versions than the other copies. So you will periodically need to synchronize so that different repos have the same contents (versions of the project). The two Git commands for this are pull, which gets the changes from another repository, and push, which sends the information in your repository to the remote one.
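The synchronization idea can be sketched with a toy model, assuming each copy's history is just an ordered list of version labels. It shows why a push is refused when the remote has versions you don't have yet. Real Git tracks a graph of commits and can merge histories that have diverged; this sketch only handles the simple case where one side is ahead. All names here are hypothetical.

```python
class RepoCopy:
    """One copy of a repository, modeled as an ordered list of version labels."""

    def __init__(self, name, history=None):
        self.name = name
        self.history = list(history or [])  # copy, so clones don't share a list

    def commit(self, label):
        self.history.append(label)


def _starts_with(longer, prefix):
    return longer[: len(prefix)] == prefix


def push(local, remote):
    """Send local versions to remote; refuse if remote has versions we lack."""
    if not _starts_with(local.history, remote.history):
        raise RuntimeError(f"push to {remote.name} rejected: pull first")
    remote.history = list(local.history)


def pull(local, remote):
    """Bring remote versions into local when local has nothing new of its own."""
    if _starts_with(local.history, remote.history):
        return  # already up to date: local has everything remote has
    if not _starts_with(remote.history, local.history):
        raise RuntimeError("histories diverged: real Git would merge here")
    local.history = list(remote.history)


github = RepoCopy("github", ["starter"])
laptop = RepoCopy("laptop", github.history)    # like cloning to a laptop
desktop = RepoCopy("desktop", github.history)  # and to a desktop
laptop.commit("add triangle")
push(laptop, github)
pull(desktop, github)
print(desktop.history)  # ['starter', 'add triangle']
```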

The basic Git operations are:

  • clone - which copies a repository (e.g., from GitHub to your laptop)
  • add - which tells Git that you want to track a file (i.e., that when you make a backup, it should include this file)
  • commit - which is the “backup” command - it stores a version of all of the files. Before you commit, you need to “stage” your changes (with add), which tells commit which files you want to include in the backup
  • checkout - which is the restore command - it switches back to a previous version (e.g., that you committed)
  • push - which puts information from your repository into a remote (to synchronize them)
  • pull - which gets information from a remote repository into yours

There are some subtleties and fancier features, but this is enough to get started. So, for class the basic workflow is:

  • We provide you with a GitHub classroom link. When you follow this link, GitHub classroom will make a GitHub repo for you.
  • You clone this repo to your laptop
  • You do some work
  • You commit the changes you made to the project
  • You do some more work
  • You add files if you created new ones in the project
  • You commit the changes you made to the project (including the new files)
  • You push the changes back to your remote repo on GitHub
  • You do some more work
  • You commit your work
  • You push your changes to the GitHub repo
  • Repeat until done
  • The grader / TA / instructor clones your repo from GitHub so they have a copy to look at/grade.
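The student side of this workflow can be sketched as a short Python script that builds the corresponding Git commands. It assumes `git` is installed and on your PATH; the URL, directory, file name, commit message, and helper names are illustrative assumptions, and by default it does a dry run that only prints the commands.

```python
import shlex
import subprocess

# Example values (assumptions): substitute your own repository URL and files.
REPO_URL = "git@github.com:cs559-sp20/wb01-gleicher.git"
WORK_DIR = "wb01-gleicher"


def clone_command(url, work_dir):
    """Copy the GitHub repo to your computer (done once per assignment)."""
    return ["git", "clone", url, work_dir]


def save_and_push_commands(work_dir, message, new_files=()):
    """Commands for one 'do some work, commit, push' round."""
    commands = []
    if new_files:
        # New files must be added before the commit, or they are left out.
        commands.append(["git", "-C", work_dir, "add", *new_files])
    # -a also stages changes to files Git already tracks.
    commands.append(["git", "-C", work_dir, "commit", "-a", "-m", message])
    # Send the new commits back to the repository on GitHub.
    commands.append(["git", "-C", work_dir, "push"])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failure (for example, git commit
            # exits with an error when there is nothing new to commit).
            subprocess.run(cmd, check=True)


run([clone_command(REPO_URL, WORK_DIR)]
    + save_and_push_commands(WORK_DIR, "Draw first triangle", ["triangle.js"]))
```

Keeping the add step separate from the commit mirrors the list above: forgetting to add a new file is the most common way work goes missing at hand-in time.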

Note that this is a simplified workflow, assuming no one else is using your repo and you are only working on one computer. If you need to coordinate with multiple people or work on multiple computers (each with their own repo), you need to use more Git commands (such as pull).

GitHub security

When you log into your account on GitHub using your web browser, you use a password. Web browsers have protocols to send passwords in secure ways. If you really care about security, you can even use two-factor authentication.

The problem is when some other program (not your web browser), like the Git command line client, needs to access your repositories on GitHub.

Approach one is to have the program pretend it’s a web browser and send your password (the way a web browser does). This has the downsides that you need to type your password all the time and it doesn’t work with two-factor authentication. There are also other reasons why security folks dislike it.

Note: if you are referring to a repository using a web URL, like https://github.com/cs559-sp20/wb01-gleicher.git, you are using the “https protocol” (the password version of authentication).

The preferred approach is using an SSH public/private key pair. The public key is something that you give to GitHub. The private key is something that you keep on your computer. When your computer needs to talk to GitHub (using a program like the Git command line), it proves that it holds the private key matching the public key you gave GitHub earlier; the private key itself never leaves your computer.

Note: links to repositories using SSH look like: git@github.com:cs559-sp20/wb01-gleicher.git (they don’t look like web URLs).
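If you already cloned using an HTTPS URL, you can switch that clone to SSH with `git remote set-url origin <ssh-url>`. The sketch below only converts the URL string from one form to the other; the function name is hypothetical, and it assumes a plain github.com URL of the form shown above.

```python
import re

# Matches https://github.com/OWNER/REPO, with or without a trailing ".git".
_GITHUB_HTTPS = re.compile(r"^https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$")


def https_to_ssh(url: str) -> str:
    """Rewrite a GitHub HTTPS repository URL in the git@github.com: SSH form."""
    match = _GITHUB_HTTPS.match(url)
    if match is None:
        raise ValueError(f"not a GitHub HTTPS repository URL: {url}")
    owner, repo = match.groups()
    return f"git@github.com:{owner}/{repo}.git"


ssh_url = https_to_ssh("https://github.com/cs559-sp20/wb01-gleicher.git")
print(ssh_url)  # git@github.com:cs559-sp20/wb01-gleicher.git
# Run inside an existing clone, this command would switch it over to SSH:
print(f"git remote set-url origin {ssh_url}")
```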

Getting SSH to work with Git takes a little effort, but it is worth it. It makes everything a lot simpler. You only need to do it once, preferably at the beginning of the semester.

Some Git Hints

GUI tools (we recommend SourceTree or GitHub Desktop) can be convenient. See the tools page for ideas. You could also use the Git tools inside of Visual Studio Code (see Visual Studio Code (VSCode) for CS559). However, you should have (and probably be able to use) the command line tools.

It is important to set up SSH authentication (see Git setup and SSH configuration).

You should commit often. Saving your work periodically lets you go back to see what you did, and gives a trail of your progress.

When you commit, leave a meaningful comment - that way you’ll know which commit is which, so you will know which one to go back to if you need to.

Don’t forget to add new files when you create them in the project.

If you push to GitHub periodically, you’ll have a backup in case some disaster happens to your laptop.

Don’t use fancier Git features unless you know what you’re doing. They are cool, but they are mainly useful when you are working on a team.

Why are we doing this?

Using Git / GitHub / GitHub classroom is a useful addition to class for a number of reasons:

  • It encourages students to use source control for their projects, which is important. The projects are big enough that you don’t want to lose your work to an accidental mistake.
  • It provides students with experience working with source control (and Git specifically), which is a useful skill for the real world.
  • It provides a uniform mechanism for handing in complex projects with complex directory structures.
  • It provides a mechanism to share intermediate versions with course staff so that we can help students.
  • It provides a mechanism for students to share their development history in the event there are questions about the provenance of their assignments.
  • It provides a mechanism so that students can check that they have turned in the correct files. They can clone the repository (just as the course staff will) to see what the course staff will see.
  • It provides a mechanism for us to distribute the starter code to everyone.
  • It provides a mechanism for us to share updates to the starter code if we need to.
  • By forcing people to start with the starter code, we can enforce naming conventions. This will make it easier for students to follow the project rules.
  • By forcing students to start with the starter code, we discourage people from starting by copying someone else’s assignment.
  • It provides a mechanism for time stamps - if something goes wrong and a file isn’t handed in correctly, we can look at its history in the repository.