CS 0 (Summer 2024) Wikipedia

This project is focused on interacting with String library functions and the Wikipedia API and writing more complicated algorithms and an interactive program.

Goals and Outcomes

In this project, you will build a WikipediaPage class which models and parses a page on Wikipedia. Then, you will use your class to write the WikipediaGame class which interactively implements the Philosophy Game. In principle, your class can be used to extract any information you are interested in from Wikipedia. This project will involve complicated String parsing, interacting with an external API (our Wikipedia API) and writing an interactive program that accepts user input.

Setup

A repository for this project was automatically created for you when you registered for the course on Canvas. Clone that repository.

Implementing basic WikipediaPage functionality

Populating the title and text fields

For the WikipediaPage class, implement the constructor.

public WikipediaPage(String title)

Populates the title field (based on the given input) and the text field. Note: you can just set the text field to null as of now, and you will finish this task in a later section.

Writing a canonical getTitle() method

Our WikipediaPage class should be able to handle various capitalizations of the same page. For example, “pizza party”, “pizza_party”, and “PiZzA pArtY” all refer to the article “Pizza Party”. Our getTitle method should return the canonicalized version of the article name. This means all underscores should be replaced with spaces, and only the first letter of each word should be capitalized. To do this, we will need to familiarize ourselves with Java’s String API.

Replacing Underscores

Implement the replace part of the getTitle method now.

Converting to Title Case

Changing the text to title case is a bit more involved, because Java doesn’t have a built-in method to do it. Since “title case” is a useful idea, we should implement it as a helper method and call it from our getTitle method. The fancy name for this is “procedural decomposition”, and it makes our programs easier to write and read later. This helper method should be static, because it won’t refer to a particular instance of the WikipediaPage class. Additionally, this is not functionality that we will be providing to our user; so, it should have the private modifier to indicate that only we can call it. Implement this helper function now.

private static String convertToTitleCase(String s)

This method should take a string and return its “Title Case” version, which means first letter of every word is capitalized.

Now, call your convertToTitleCase method in your getTitle method, thereby finishing implementing the getTitle method.

Our Wikipedia API

Before you continue building out your WikipediaPage class, you need to know how to interact with our Wikipedia API. We have developed a small Wikipedia API which handles common Wikipedia processing. Our Wikipedia API has the following functionality:

public static String getPageText(String title)

Returns the Wikitext markup of the page represented by title. If no such page exists, returns null.


public static String parseMarkup(String markup)

Returns the value associated with the first Wikitext link or a Wikitext function in the provided markup.


public static boolean isSpecialLink(String link)

Returns true if and only if link is a special link that should be ignored as it does not lead to a real Wikipedia article. This method assumes link is a single link not including the begin and end characters.


Getting the text on a Wikipedia Page

Helper methods for whether a page is valid or a “redirect” page

As with convertToTitleCase above, we want you to implement the following helper functions before you implement getText(). Both of these functions are short and straightforward.

public static boolean isValid(String text)

This method should take the text from a Wikipedia Page (obtained using getPageText) and determine whether it is a valid page (page exists) or not.


public static boolean isRedirect(String text)

This method should take the text from a Wikipedia Page and determine whether it is a “redirect” page or not. Wikipedia Pages are encoded in a markup language called Wikitext. For this most part, this is human-readable English with a few exceptions for keywords and special information like links. One convention in Wikitext is if a page is a “redirect page” for another article, its page text will begin with “#REDIRECT” or “#redirect”. Note: invalid pages are not considered redirect pages.

Getting the text on a page

Now that you’ve written those helper functions, you are ready to get the contents of a Wikipedia Page.

Finishing the constructor

In this task, you will continue work on the constructor, our constructor from before only initializes the title field, and sets the text field to null.
We need to make it so that it populates the text field as well by calling the appropriate method in our Wikipedia API. Make sure you use the original title to get the text rather than the canonicalized version–Wikipedia is very specific about capitalization of articles. After you have got the text for the original link, you need to make sure to follow any redirects (you can use your helper functions for this). The first link on a redirect page will be to the page you should redirect to. You can feed the whole page to Wikipedia.parseMarkup to get this redirect link since it will only contain the one link.

Text Accessor

Now that the constructor populates the text field, we should write an accessor for it. This way a client can use the text of the page in whatever way they please in the case that we don’t provide enough functionality in our class.

Run the Basic WikipediaPage functionality tests on your code. They should all be passing at this point.

The Wikipedia Philosophy Game

Now, it is time to start implementing functionality to work towards the final outcome of this project, an interactive program that implements the Philosophy game. The Philosophy Game refers to the phenomenon where following the first link in the main text of an English Wikipedia article repeatedly leads to the Philosophy article. Historically, this has been true for up to 97% of all Wikipedia articles, although recent changes in some articles have caused this number to vary between 50% and 97%.

Before you write the interactive portion of this program (in the WikipediaGame class), there is one final method in WikipediaPage that will be crucial to this task. The getNext() method returns the next WikipediaPage after following the first link in the main text of a Wikipedia Page. In order to implement this method, there are a few things you need to keep in mind:

We know that this is a lot to keep track of, and thus we have given you tests for getNext() specifically, as well as pseudocode below to help you structure your getNext() method:

Run the Page getNext functionality tests on your code. They should all be passing at this point.

The WikipediaGame class

Finally, you are ready to write the WikipediaGame class. There are 2 methods to write in this class that together implement the Philosophy Game.

private static List<String> runGame(String start)

This method should take the title (in URL format, such as “Messier_63”) of a Wikipedia Page and then follow the first link on it to get the next page, and do so repeatedly until 1) it reaches the Wikipedia page for “Philosophy” 2) reaches a Wikipedia page that has already visited, meaning it is stuck in a cycle that does not include “Philosophy” 3) reaches a page which has no next links 4) reaches an invalid page 5) has visited 100 pages. It returns the list of titles of all the Wikipedia pages visited, regardless of whether the algorithm succeeds or fails to find “Philosophy”.


public static void main(String[] args)

This method accepts textual input from the user in the terminal (consider using Scanner for this) corresponding to the starting Wikipedia page and then calls runGame on it, and prints out either “Unsuccessful search.” or “Found path!” followed by the path of pages the algorithm followed. An example of what an unsuccessful search should look like:

Starting Page: Hello_World
Unsuccessful search.
Hello World -> "hello, World!" Program -> Computer Program -> Sequence -> Mathematics -> Number Theory -> Pure Mathematics -> Mathematics

An example of a successful search would be:

Starting Page: Hello
Found path!
Hello -> Salutation -> Greeting -> Communication -> Information -> Abstraction -> Rule Of Inference -> Philosophy Of Logic -> Philosophy

Note: the algorithm terminated in the first case because it visited an already visited page again and in the second case because it reached Philosophy. Every part of this output was printed by the program except for “Hello_World” or “Hello” which was typed by the user. Make sure to use nextLine() from the Scanner class to get the user input.

Run the WikipediaGame Functionality tests on your code. They should all be passing at this point. Feel free to play around with your program and try different Wikipedia pages. Note that the program isn’t perfect and fails to identify the correct first link in very rare circumstances, because Wikitext is complicated and accounting for all edge cases correctly while parsing it to find the next page would be out of the scope of this project.

Committing and Pushing to the Repository

A “commit” is version control language for a chunk of changes that mean something when grouped together. Importantly, commits will only stay locally on your machine until you push them to the gitlab server.

To commit to the repository, click the green checkmark on the top-right of IntelliJ’s interface. A window will pop up and ask you to type in a message. Choose some meaningful description of your feature, then click the little down arrow next to the “Commit” button on the bottom right of the dialog.
Choose “Commit and Push” in the little menu that pops up. If IntelliJ asks you if you’re sure you want to commit and push, say yes.

Turning in the Project

We have provided a test suite for this project, which you can run by clicking the play button in the top right corner of IntelliJ. PLEASE make sure all of these tests are passing before turning in the project!

To turn in the project, fill out this form to notify the graders that there is a submission!