Goals and Outcomes
In this project, you will build a WikipediaPage
class which models and parses a page on Wikipedia. Then, you will use your class to write the WikipediaGame
class which interactively implements the Philosophy Game. In principle, your class can be used to extract any information you are interested in from Wikipedia. This project will involve complicated String parsing, interacting with an external API (our Wikipedia
API) and writing an interactive program that accepts user input.
Setup
A repository for this project was automatically created for you when you registered for the course on Canvas. Clone that repository.
Implementing basic WikipediaPage
functionality
Populating the title
and text
fields
For the WikipediaPage
class, implement the constructor.
public WikipediaPage(String title)
Populates the title
field (based on the given input) and the text
field. Note: you can just set the text
field to null as of now, and you will finish this task in a later section.
Writing a canonical getTitle() method
Our WikipediaPage
class should be able to handle various capitalizations of the same page. For example, “pizza party
”, “pizza_party
”, and “PiZzA pArtY
” all refer to the article “Pizza Party
”. Our getTitle
method should return the canonicalized version of the article name. This means all underscores should be replaced with spaces, and only the first letter of each word should be capitalized. To do this, we will need to familiarize ourselves with Java’s String API.
Replacing Underscores
Implement the replace part of the getTitle
method now.
Converting to Title Case
Changing the text to title case is a bit more involved, because Java doesn’t have a built-in method to do it. Since “title case” is a useful idea, we should implement it as a helper method and call it from our getTitle
method. The fancy name for this is “procedural decomposition”, and it makes our programs easier to write and read later. This helper method should be static
, because it won’t refer to a particular instance of the WikipediaPage
class. Additionally, this is not functionality that we will be providing to our user; so, it should have the private
modifier to indicate that only we can call it. Implement this helper function now.
private static String convertToTitleCase(String s)
This method should take a string and return its “Title Case” version, which means first letter of every word is capitalized.
Now, call your convertToTitleCase
method in your getTitle
method, thereby finishing implementing the getTitle
method.
Our Wikipedia
API
Before you continue building out your WikipediaPage
class, you need to know how to interact with our Wikipedia
API. We have developed a small Wikipedia API which handles common Wikipedia processing. Our Wikipedia API has the following functionality:
public static String getPageText(String title)
Returns the Wikitext markup of the page represented by title
. If no such page exists, returns null
.
public static String parseMarkup(String markup)
Returns the value associated with the first Wikitext link or a Wikitext function in the provided markup
.
public static boolean isSpecialLink(String link)
Returns true if and only if link
is a special link that should be ignored as it does not lead to a real Wikipedia article. This method assumes link
is a single link not including the begin and end characters.
Getting the text on a Wikipedia Page
Helper methods for whether a page is valid or a “redirect” page
As with convertToTitleCase above, we want you to implement the following helper functions before you implement getText()
. Both of these functions are short and straightforward.
public static boolean isValid(String text)
This method should take the text from a Wikipedia Page (obtained using getPageText
) and determine whether it is a valid page (page exists) or not.
public static boolean isRedirect(String text)
This method should take the text from a Wikipedia Page and determine whether it is a “redirect” page or not. Wikipedia Pages are encoded in a markup language called Wikitext. For this most part, this is human-readable English with a few exceptions for keywords and special information like links. One convention in Wikitext is if a page is a “redirect page” for another article, its page text will begin with “#REDIRECT
” or “#redirect
”. Note: invalid pages are not considered redirect pages.
Getting the text on a page
Now that you’ve written those helper functions, you are ready to get the contents of a Wikipedia Page.
Finishing the constructor
In this task, you will continue work on the constructor, our constructor from before only initializes the title
field, and sets the text
field to null.
We need to make it so that it populates the text
field as well by calling the appropriate method in our Wikipedia
API. Make sure you use the original title
to get the text rather than the canonicalized version–Wikipedia is very specific about capitalization of articles. After you have got the text for the original link, you need to make sure to follow any redirects (you can use your helper functions for this). The first link on a redirect page will be to the page you should redirect to. You can feed the whole page to Wikipedia.parseMarkup
to get this redirect link since it will only contain the one link.
Text Accessor
Now that the constructor populates the text
field, we should write an accessor for it. This way a client can use the text of the page in whatever way
they please in the case that we don’t provide enough functionality in our class.
Run the Basic WikipediaPage
functionality tests on your code. They should all be passing at this point.
The Wikipedia Philosophy Game
Now, it is time to start implementing functionality to work towards the final outcome of this project, an interactive program that implements the Philosophy game. The Philosophy Game refers to the phenomenon where following the first link in the main text of an English Wikipedia article repeatedly leads to the Philosophy article. Historically, this has been true for up to 97% of all Wikipedia articles, although recent changes in some articles have caused this number to vary between 50% and 97%.
Getting the first link on a Wikipedia Page
Before you write the interactive portion of this program (in the WikipediaGame
class), there is one final method in WikipediaPage
that will be crucial to this task. The getNext()
method returns the next WikipediaPage
after following the first link in the main text of a Wikipedia Page. In order to implement this method, there are a few things you need to keep in mind:
- If a
WikipediaPage
isnull
or has no links, this method returnsnull
. - You may be thinking that you can use
parseMarkup
to get the first link on a Wikipedia Page. However, this function returns the first link in the markup of the Wikipedia Page, which is often not the same as the first link in the main text of a Wikipedia Page. Thus, your task here will be to parse Wikitext yourself, and find the first link in the main text of the current page. - Wikitext uses “
{{
” and “}}
” to denote the start and end of certain structures (such as an information box) on a page. The main text of a page is located outside any such structures. - Wikitext uses “
<!--
” and “-->
” to denote the start and end of comments in Wikitext. Any links located inside comments also do not count as part of the main text. - Wikitext uses “
[[
” and “]]
” to denote the start and end of links. However, there may be nested links in Wikitext. In this case, you need to ensure you are getting the outermost link in the markup. - Special links: You can use your
isSpecialLink
method in theWikipedia
API to check if a link you have found is a special link. You should skip any special links you may encounter. - Links with different text than article titles. Wikipedia often uses different text than what the name of the actual wikipedia article is. For example,
[[Philosophy|Philosophical]]
in the markup would appear as the word “Philosophical” in the wikipedia article, but link to the Wikipedia Page for Philosophy. In this case, you will need to get only the first part of the link, and pass that into the constructor of theWikipediaPage
class as the new title for the next page.
We know that this is a lot to keep track of, and thus we have given you tests for getNext()
specifically, as well as pseudocode below to help you structure your getNext()
method:
- If the text is invalid, return
null
. - Initialize counters for each of the three types of delimiters:
{{
and}}
,<!--
and-->
,[[
and]]
- For each character of the text:
- If the character represents an opening delimiter, increment the respective counter.
- If the character represents a closing delimiter, decrement the respective counter.
- If the character represents an open
[[
and we’re not already inside an open[[
, then…- Call the piece of text from this open
[[
to the close]]
the “link”. - If “link” is a special
Wikipedia
link, ignore it. - If “link” has a
|
in it, consider the part of “link” up to the|
to be the actual “link”. - Return the “link”‘ed page as the next page to go to.
- Call the piece of text from this open
- Make sure to move past all characters that were used in this iteration.
- If we didn’t find a link, return
null
.
Run the Page
getNext
functionality tests on your code. They should all be passing at this point.
If you’re failing some tests and you’re not sure why, you can print the pure markup of the Wikipedia Page - to examine both the correct first link and the first link your code is finding - as well as print your counters to figure out why your code is not accounting for the first link correctly.
The WikipediaGame class
Finally, you are ready to write the WikipediaGame class. There are 2 methods to write in this class that together implement the Philosophy Game.
private static List<String> runGame(String start)
This method should take the title (in URL format, such as “Messier_63”) of a Wikipedia Page and then follow the first link on it to get the next page, and do so repeatedly until 1) it reaches the Wikipedia page for “Philosophy” 2) reaches a Wikipedia page that has already visited, meaning it is stuck in a cycle that does not include “Philosophy” 3) reaches a page which has no next links 4) reaches an invalid page 5) has visited 100 pages. It returns the list of titles of all the Wikipedia pages visited, regardless of whether the algorithm succeeds or fails to find “Philosophy”.
public static void main(String[] args)
This method accepts textual input from the user in the terminal (consider using Scanner
for this) corresponding to the starting Wikipedia page and then calls runGame
on it, and prints out either “Unsuccessful search.” or “Found path!” followed by the path of pages the algorithm followed.
An example of what an unsuccessful search should look like:
Starting Page: Hello_World
Unsuccessful search.
Hello World -> "hello, World!" Program -> Computer Program -> Sequence -> Mathematics -> Number Theory -> Pure Mathematics -> Mathematics
An example of a successful search would be:
Starting Page: Hello
Found path!
Hello -> Salutation -> Greeting -> Communication -> Information -> Abstraction -> Rule Of Inference -> Philosophy Of Logic -> Philosophy
Note: the algorithm terminated in the first case because it visited an already visited page again and in the second case because it reached Philosophy. Every part of this output was printed by the program except for “Hello_World” or “Hello” which was typed by the user. Make sure to use nextLine()
from the Scanner
class to get the user input.
Run the WikipediaGame Functionality tests on your code. They should all be passing at this point. Feel free to play around with your program and try different Wikipedia pages. Note that the program isn’t perfect and fails to identify the correct first link in very rare circumstances, because Wikitext is complicated and accounting for all edge cases correctly while parsing it to find the next page would be out of the scope of this project.
Committing and Pushing to the Repository
A “commit” is version control language for a chunk of changes that mean something when grouped together. Importantly, commits will only stay locally on your machine
until you push them to the gitlab
server.
To commit to the repository, click the green checkmark on the top-right of IntelliJ’s interface. A window will pop up and
ask you to type in a message. Choose some meaningful description of your feature, then click the little down arrow next to the “Commit” button on the bottom right of the dialog.
Choose “Commit and Push” in the little menu that pops up. If IntelliJ asks you if you’re sure you want to commit and push, say yes.
Turning in the Project
We have provided a test suite for this project, which you can run by clicking the play button in the top right corner of IntelliJ. PLEASE make sure all of these tests are passing before turning in the project!
To turn in the project, fill out this form to notify the graders that there is a submission!