The Social Structures of Harry Potter

Introduction

The focus of this project was to take a closer look at the Harry Potter universe using social networks, sentiment analysis and word clouds. The Harry Potter series is one of the most popular book and movie series of recent years, and our goal was to use these tools to gain new insights into well-known and beloved material.

Several networks have been created to analyze the social structure of the Harry Potter universe. A network that maps out the relationships between most known characters who have lived throughout the different eras of the universe was created based on the Harry Potter Fandom wiki. Besides information about the relationships between the characters, the wiki also provided some basic information such as the house, blood status and loyalties of each character. In addition to the wiki network, each movie was also turned into a network based on the characters' appearances in the same scenes, with the intention of capturing the development of the social structure over time.

In addition to the social structures, we also extracted the essence of each book to illustrate its major plotlines. Finally, the books and movies were analyzed for their sentiment to show the mood of the series as it progresses towards the climax of the final battle between good and evil.

Wiki Network

This section describes the network that was created from the fan wiki to map out the relationships between all characters who have lived throughout the different eras of the Harry Potter universe.

The network was constructed by going through the pages of the wiki and, for each page that belonged to a character, collecting the links to all other characters. The result was a directed graph in which each node is a character, with edges to all other characters that were referenced on that character's page.

The wiki dump contains 15728 pages, but not all of them are dedicated to characters. As there are no obvious identifiers for character pages, a list of (hopefully) all known characters was collected in order to identify the pages that belong to characters. After cleaning the list to ensure that family pages were not included, and resolving some minor name mismatches for a few characters, the resulting graph contained 994 nodes and 7598 edges.
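The construction can be sketched in Python as follows. This is a minimal sketch, assuming the dump has already been parsed into a dictionary that maps each page title to the titles it links to; the helper names (`normalize`, `build_wiki_graph`) and the toy pages are hypothetical and not taken from the project's code.

```python
def normalize(title: str) -> str:
    """Lower-case and collapse whitespace so that minor name mismatches line up."""
    return " ".join(title.lower().split())


def build_wiki_graph(pages: dict[str, list[str]], characters: set[str]) -> dict[str, set[str]]:
    """Directed adjacency map: character -> characters linked from that character's page.

    `pages` maps a wiki page title to the titles it links to (assumed pre-parsed).
    Only pages whose title is in `characters` become nodes, and only links that
    point to other known characters become edges.
    """
    known = {normalize(c) for c in characters}
    graph: dict[str, set[str]] = {}
    for title, links in pages.items():
        source = normalize(title)
        if source not in known:
            continue  # skip non-character pages (places, objects, family pages, ...)
        targets = {normalize(link) for link in links}
        # keep links to other known characters and drop self-links
        graph[source] = {t for t in targets if t in known and t != source}
    return graph


pages = {
    "Harry Potter": ["Ronald Weasley", "Hermione Granger", "Hogwarts"],
    "Ronald Weasley": ["Harry Potter", "Weasley family"],
    "Hermione Granger": ["Harry Potter", "Ronald Weasley"],
    "Hogwarts": ["Harry Potter"],
}
characters = {"Harry Potter", "Ronald Weasley", "Hermione Granger"}
graph = build_wiki_graph(pages, characters)
n_nodes = len(graph)
n_edges = sum(len(targets) for targets in graph.values())
print(n_nodes, n_edges)  # 3 5
```

Lower-casing and collapsing whitespace is only one simple way to absorb small name mismatches; the cleaning described above may also have needed manual fixes.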

Information about the characters, such as house, blood status and loyalties, was extracted from the wiki and used to color the nodes of the network, while the degree of each node was used to scale its size to provide additional information.

When coloring the nodes by house membership, the first thing that catches the eye is the red and green center of the network, meaning that the most important nodes belong to either Gryffindor or Slytherin. While the houses form some small clusters, they are not strictly separated and show a fair amount of mixing. Gryffindors tend to be closer to the center of the network, while the other houses tend to sit more towards the outer layers. Slytherins are also somewhat separated from Ravenclaws and Hufflepuffs, which are placed on the opposite side of the outer layers. Finally, only a small fraction of the known characters actively took part in the battle, as the majority of the nodes are not explicitly part of any of these factions.

For the next graph, the coloring of the nodes corresponds to good and evil. A character is defined as good if they are part of either the Order of the Phoenix or Dumbledore's Army, while a character is considered evil if they are loyal to the Death Eaters or Lord Voldemort. While there is a large, pure cluster of good characters on the west side of the network, the two groups are not very well separated, meaning that there is a fair amount of referencing across the two factions.
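The good/evil rule can be written as a small labeling function. This is a hypothetical sketch: the loyalty strings are assumed to be normalized versions of the wiki's loyalty entries, and since the text does not say how characters loyal to both sides were treated, the sketch gives them a separate label.

```python
GOOD_GROUPS = {"order of the phoenix", "dumbledore's army"}
EVIL_GROUPS = {"death eaters", "lord voldemort"}


def alignment(loyalties: list[str]) -> str:
    """Classify a character as 'good', 'evil', 'both' or 'unknown' from its loyalty entries."""
    groups = {loyalty.strip().lower() for loyalty in loyalties}
    good = bool(groups & GOOD_GROUPS)
    evil = bool(groups & EVIL_GROUPS)
    if good and evil:
        return "both"  # assumption: the source does not say how such characters were colored
    if good:
        return "good"
    if evil:
        return "evil"
    return "unknown"


print(alignment(["Order of the Phoenix", "Hogwarts School of Witchcraft and Wizardry"]))  # good
print(alignment(["Ministry of Magic"]))  # unknown
```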

One thing that becomes apparent from these two graphs is that house and faction (good or evil) membership is unknown or irrelevant for a fair portion of the network, which suggests that there is a lot more to the universe than the battle against Voldemort.

Degree Distribution

The graphs below illustrate the degree distribution of the characters. They show that most characters have a very low degree, with only a few highly connected characters. The shape of the log-transformed degree distribution suggests a power law, which is typical of scale-free networks such as many social networks. There also appears to be a linear correlation between the numbers of incoming and outgoing connections.
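A quick way to check for a power law is to fit a straight line to the log-log degree distribution, since p(k) ∝ k^(−γ) becomes log p(k) = −γ log k + c. The sketch below does this with ordinary least squares and also computes the Pearson correlation between in- and out-degrees. It is an illustrative assumption, not the method used for the plots, and a least-squares fit on log-log data is only a rough indicator; maximum-likelihood power-law fits are more reliable.

```python
import math
from collections import Counter


def degree_distribution(degrees: list[int]) -> dict[int, int]:
    """Map each degree value k to the number of nodes with degree k."""
    return dict(sorted(Counter(degrees).items()))


def loglog_slope(distribution: dict[int, int]) -> float:
    """Least-squares slope of log(count) against log(degree), skipping degree 0.

    For a power law p(k) ~ k^(-gamma) the points lie on a line with slope -gamma.
    """
    points = [(math.log(k), math.log(count)) for k, count in distribution.items() if k > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation, e.g. between each node's in-degree and out-degree."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# An exact power law with gamma = 2: the count quarters whenever the degree doubles.
toy = {1: 1600, 2: 400, 4: 100, 8: 25}
print(round(loglog_slope(toy), 6))  # -2.0
print(round(pearson([1, 2, 3], [2, 4, 6]), 6))  # 1.0
```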

Degree Distributions

Degree Distributions with Log Scale

Most Connected Nodes

To analyse the network further, we look at the degrees and centralities of the nodes. This provides some information about the most central characters in the Harry Potter universe and may reveal some unexpected characters. The following table shows the most central characters according to various centrality measures.

| Rank | Degree | In-degree centrality | Out-degree centrality | Eigenvector centrality | Betweenness centrality |
|---|---|---|---|---|---|
| 1 | harry potter (471) | harry potter (0.3142) | harry potter (0.1601) | harry potter (0.2538) | harry potter (0.0870) |
| 2 | tom riddle (341) | tom riddle (0.2246) | ronald weasley (0.1289) | tom riddle (0.2364) | albus dumbledore (0.0613) |
| 3 | albus dumbledore (309) | albus dumbledore (0.1833) | albus dumbledore (0.1279) | ronald weasley (0.2174) | tom riddle (0.0597) |
| 4 | ronald weasley (308) | ronald weasley (0.1813) | tom riddle (0.1188) | hermione granger (0.2157) | gellert grindelwald (0.0323) |
| 5 | hermione granger (295) | hermione granger (0.1793) | hermione granger (0.1178) | albus dumbledore (0.2061) | ronald weasley (0.0322) |
| 6 | ginevra weasley (204) | ginevra weasley (0.1168) | ginevra weasley (0.0886) | severus snape (0.1689) | hermione granger (0.0278) |
| 7 | severus snape (183) | sirius black (0.1128) | severus snape (0.0765) | draco malfoy (0.1639) | ginevra weasley (0.0241) |
| 8 | sirius black (175) | severus snape (0.1078) | rubeus hagrid (0.0745) | sirius black (0.1639) | rita skeeter (0.0203) |
| 9 | draco malfoy (168) | draco malfoy (0.1037) | dolores umbridge (0.0715) | ginevra weasley (0.1601) | seraphina picquery (0.0150) |
| 10 | rubeus hagrid (168) | dolores umbridge (0.0947) | george weasley (0.0715) | neville longbottom (0.1533) | arthur weasley (0.0130) |
| 11 | dolores umbridge (165) | rubeus hagrid (0.0947) | fred weasley (0.0665) | dolores umbridge (0.1431) | lucius malfoy (0.0119) |
| 12 | arthur weasley (156) | arthur weasley (0.0937) | neville longbottom (0.0665) | rubeus hagrid (0.1410) | phineas nigellus black (0.0117) |
| 13 | neville longbottom (153) | neville longbottom (0.0876) | draco malfoy (0.0655) | arthur weasley (0.1328) | gilderoy lockhart (0.0110) |
| 14 | minerva mcgonagall (139) | minerva mcgonagall (0.0765) | sirius black (0.0634) | molly weasley (0.1300) | horace slughorn (0.0109) |
| 15 | luna lovegood (132) | horace slughorn (0.0735) | minerva mcgonagall (0.0634) | minerva mcgonagall (0.1293) | dolores umbridge (0.0108) |
| 16 | molly weasley (132) | bellatrix lestrange (0.0725) | arthur weasley (0.0634) | bellatrix lestrange (0.1280) | rubeus hagrid (0.0107) |
| 17 | bellatrix lestrange (124) | luna lovegood (0.0705) | molly weasley (0.0634) | remus lupin (0.1265) | draco malfoy (0.0106) |
| 18 | fred weasley (119) | molly weasley (0.0695) | luna lovegood (0.0624) | luna lovegood (0.1252) | gwenog jones (0.0104) |
| 19 | lucius malfoy (118) | lucius malfoy (0.0645) | dean thomas (0.0564) | percy weasley (0.1110) | sirius black (0.0100) |
| 20 | george weasley (118) | remus lupin (0.0624) | lucius malfoy (0.0544) | lucius malfoy (0.1104) | lucas picquery (0.0099) |

Degree

The degree analysis is concerned with the total number of edges connected to a node (incoming plus outgoing), thus indicating how connected a node is.

As expected, Harry Potter is by far the most connected character in this analysis: being the main protagonist, he is the focal point of the story. The series is written in the third person but largely from Harry's point of view, and even with the supplementary canonical information released through Pottermore, the Fantastic Beasts series and J.K. Rowling's personal Twitter account, the information is mainly given through Harry's perspective. Somewhat surprisingly, the next two most connected nodes are Voldemort (birth name: Tom Riddle) and Albus Dumbledore, although this can be explained by their ties to everything happening in the Wizarding world during the timespan of the book series. Ron Weasley and Hermione Granger follow. A surprising trend, which will be seen throughout this analysis, is that Ron is more connected than Hermione. This in itself makes sense, as he is a child of the Wizarding world with a highly connected family, but a common conception of the story is that Hermione is vital to the success of Harry's adventures. While this analysis does not prove or disprove that conception, it does show that her connections in the story may be limited by her lack of family ties within the Wizarding world, as she is Muggle-born.

In/out Degree Centrality

In- and out-degree centrality measure a node's incoming edges (in-degree centrality) and outgoing edges (out-degree centrality), normalized by the number of other nodes in the network.
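Both quantities can be computed from a directed edge list. The table's values are consistent with dividing by the number of other nodes, n − 1 = 993: for Harry, 0.3142 × 993 ≈ 312 incoming and 0.1601 × 993 ≈ 159 outgoing links, which add up to his degree of 471. The function below is a minimal sketch under that assumption; its name and the toy edges are hypothetical.

```python
def degree_centralities(nodes: set[str], edges: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Total degree plus in- and out-degree centrality for a directed graph.

    Each centrality is the raw in/out-degree divided by n - 1, the largest
    number of distinct neighbours a node can have in a simple graph.
    """
    in_deg = dict.fromkeys(nodes, 0)
    out_deg = dict.fromkeys(nodes, 0)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    scale = 1 / (len(nodes) - 1)
    return {
        node: {
            "degree": in_deg[node] + out_deg[node],
            "in_centrality": in_deg[node] * scale,
            "out_centrality": out_deg[node] * scale,
        }
        for node in nodes
    }


edges = [("harry", "ron"), ("harry", "hermione"), ("ron", "harry")]
stats = degree_centralities({"harry", "ron", "hermione"}, edges)
print(stats["harry"])  # {'degree': 3, 'in_centrality': 0.5, 'out_centrality': 1.0}
```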

While the in-degree centrality list mirrors the overall degree list, the out-degree centrality supports the argument that Ron is more connected because of his wizarding family. He has a large family (even by wizarding standards) with members in many of the story's communities, such as the Ministry of Magic, the Order of the Phoenix and the Triwizard Tournament. Because of his wizarding lineage and accomplishments, Dumbledore also ranks high on this list, while Voldemort, who grew up in a Muggle orphanage before learning that he was a wizard and attending Hogwarts, drops to fourth place.

Eigenvector centrality

How much influence a node has on the network is estimated through eigenvector centrality: a node scores highly when it is connected to other high-scoring nodes.
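Eigenvector centrality is commonly computed by power iteration. The sketch below shows one way to obtain such scores and is not the project's code. It follows the in-link convention often used for directed graphs (a node gains score from the nodes that link to it), adds each node's own score in every step to avoid oscillation, and normalizes the vector to unit length.

```python
import math


def eigenvector_centrality(graph: dict[str, set[str]], max_iter: int = 1000, tol: float = 1e-10) -> dict[str, float]:
    """Power iteration on A^T + I for a graph given as node -> set of successors."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    x = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(max_iter):
        new = dict(x)  # the "+ I" term: every node keeps its previous score
        for u, targets in graph.items():
            for v in targets:
                new[v] += x[u]  # u passes its current score to each node it links to
        norm = math.sqrt(sum(value * value for value in new.values())) or 1.0
        new = {v: value / norm for v, value in new.items()}
        if sum(abs(new[v] - x[v]) for v in nodes) < len(nodes) * tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# Undirected star stored with edges in both directions: hub 1/sqrt(2), each leaf 1/sqrt(6).
star = {"harry": {"ron", "hermione", "hagrid"}, "ron": {"harry"}, "hermione": {"harry"}, "hagrid": {"harry"}}
scores = eigenvector_centrality(star)
print(round(scores["harry"], 4), round(scores["ron"], 4))  # 0.7071 0.4082
```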

We might have expected Dumbledore to top this list, but once again we find Harry there. This is likely because many of Dumbledore's influential endeavours are not specified beyond the acknowledgement that they are vast. Studying the list, it is clear that the most influential characters are those who have greatly influenced Harry and his journey, rather than the overall story.

Betweenness centrality

Betweenness centrality identifies the nodes that most information passes through, i.e. the nodes that lie on the largest number of shortest paths between other nodes.
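Betweenness is usually computed with Brandes' algorithm: one breadth-first search per source node counts shortest paths, and the dependencies are then accumulated in reverse order. The following is a compact sketch for an unweighted directed graph. Normalizing by (n − 1)(n − 2), the number of ordered pairs of other nodes, is an assumption about how the values in the table were scaled.

```python
from collections import deque


def betweenness_centrality(graph: dict[str, set[str]], normalized: bool = True) -> dict[str, float]:
    """Brandes' algorithm for an unweighted, directed graph given as node -> successors."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # 1) BFS from s: distances, number of shortest paths (sigma) and predecessors
        order, preds = [], {v: [] for v in nodes}
        sigma, dist = dict.fromkeys(nodes, 0), dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # 2) accumulate pair dependencies in order of decreasing distance from s
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    if normalized and n > 2:
        # assumption: scale by the number of ordered pairs of other nodes, (n - 1)(n - 2)
        bc = {v: value / ((n - 1) * (n - 2)) for v, value in bc.items()}
    return bc


# Dumbledore is the only bridge between Harry and Grindelwald in this toy chain.
chain = {"harry": {"dumbledore"}, "dumbledore": {"grindelwald"}}
print(betweenness_centrality(chain)["dumbledore"])  # 0.5
```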

The surprising character in this list is Gellert Grindelwald, who is a minor character in the original Harry Potter series but central to Dumbledore's story. As his and Dumbledore's youthful adventures delved into the story and existence of the Deathly Hallows, this is most likely the reason for his high betweenness in the network. Horace Slughorn, who is known for networking with promising students at Hogwarts whom he expects to grow into influential positions in the wizarding world, has both a high in-degree and a high betweenness centrality, but is not present for the remaining measures. This suggests that Slughorn does indeed have a good eye for promising students who fulfil their potential. Another unexpected character on the list is Gwenog Jones, who is a famous Quidditch player in the wizarding world but only a very minor character in the books and movies. This suggests that quite a few characters actively follow the Quidditch scene, the wizarding world's most popular sport.

Communities

Figures: Community 0 to Community 19

The network contains a dozen smaller cliques of very minor characters that are connected to the center through only one link. They look like branches with a few buds at the end, and most of them ended up in their own small communities. As for the communities in the center of the network, families appear to be the best predictor of community membership, at least for the nodes with the highest degree in each community. These communities also contain some lesser-known characters whose relation to the major families is not that obvious. As a result of the strong influence of family membership, the three main characters are not in the same community, which comes as a surprise.
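The observation that families predict community membership could be quantified, for instance, by the share of a community's most connected members who carry its most common surname. This is a hypothetical check, not part of the original analysis. Taking the last word of a name as the surname is a simplifying assumption that fails for maiden names and single-word names.

```python
from collections import Counter
from typing import Optional


def dominant_family(members: list[str], degree: Optional[dict[str, int]] = None, top: int = 5) -> tuple[str, float]:
    """Most common surname among a community's members and the share of members carrying it.

    If a degree map is given, only the `top` highest-degree members are considered,
    mirroring the observation about the most connected nodes of each community.
    """
    if degree is not None:
        members = sorted(members, key=lambda name: -degree.get(name, 0))[:top]
    # naive assumption: the surname is the last word of the name
    surnames = Counter(name.split()[-1] for name in members)
    family, count = surnames.most_common(1)[0]
    return family, count / len(members)


community = ["arthur weasley", "molly weasley", "ronald weasley", "harry potter"]
print(dominant_family(community))  # ('weasley', 0.75)
```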

If you want to take a closer look, here's a link to the detailed list of all wiki communities.

Movie Networks

The goal of the wiki network was to get an impression of the composition and dimensions of the whole Harry Potter universe. We will now take a closer look at the movie scripts, as they cover the universe's most central and well-documented storylines, and see to what degree the structures encountered in the movies align with the findings from the fan wiki.

The networks were constructed by going through the movies scene by scene, tokenizing each scene into words and looking up the tokens in a dictionary containing all first, middle and last names that could be used to identify unique characters. The dictionary was constructed from the previously mentioned list of characters and is also included in the data sets. In some cases, characters could only be identified by part of their name, such as Mr. or Mrs. Weasley. Those characters were identified separately through full string matches within the scenes, which required some deeper knowledge of the scripts. Edges were added between all characters mentioned within the same scene. In contrast to the wiki network, the resulting networks are undirected.
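The scene-based construction can be sketched as follows. The sketch assumes that the script has already been split into scenes, that the dictionary maps unambiguous single name tokens to characters, and that references like "Mrs. Weasley" are handled by a separate map of full strings. The function names and toy data are hypothetical. Counting how many scenes a pair shares is an extra not described in the text, which only says that edges were added.

```python
import re
from itertools import combinations


def scene_characters(scene: str, name_index: dict[str, str], full_names: dict[str, str]) -> set[str]:
    """Characters mentioned in one scene, via full-string matches and single name tokens."""
    text = scene.lower()
    # multi-word references such as "mrs. weasley" are matched as whole strings
    found = {character for phrase, character in full_names.items() if phrase in text}
    # every other word is looked up in the first/middle/last-name dictionary
    for token in re.findall(r"[a-z]+", text):
        if token in name_index:
            found.add(name_index[token])
    return found


def build_movie_graph(scenes: list[str], name_index: dict[str, str], full_names: dict[str, str]) -> dict[frozenset, int]:
    """Undirected co-occurrence graph: edge -> number of scenes the two characters share."""
    edges: dict[frozenset, int] = {}
    for scene in scenes:
        for a, b in combinations(sorted(scene_characters(scene, name_index, full_names)), 2):
            pair = frozenset((a, b))
            edges[pair] = edges.get(pair, 0) + 1
    return edges


name_index = {"harry": "harry potter", "potter": "harry potter", "ron": "ronald weasley", "hermione": "hermione granger"}
full_names = {"mrs. weasley": "molly weasley"}  # "weasley" alone is ambiguous, so it is not in name_index
scenes = [
    "HARRY and RON run for the barrier. MRS. WEASLEY waves.",
    "HERMIONE: Ron, you've got dirt on your nose.",
    "Harry and Ron share sweets on the train.",
]
graph = build_movie_graph(scenes, name_index, full_names)
print(graph[frozenset({"harry potter", "ronald weasley"})])  # 2
```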

As in the wiki network, the size of the nodes is scaled according to each character's degree. Some minor characters that appear in only one, but fairly large, scene with many characters have an inflated degree as a consequence. To account for this, the size of the nodes is also weighted by the number of scenes a character appears in. The nodes are mainly colored according to house membership, based on the same character information that was used in the wiki network. If no house membership is available but the blood status is known, the characters are marked either as Muggles or as magical beings of some sort.

Before we take a closer look at the network of each movie, let's look at the temporal aspect and see how the networks develop over time.

Network Dimensions

The first obvious difference between the movie and wiki networks is their size. While the wiki network has just under 1000 nodes, the movie networks are far smaller, with roughly one-twentieth to one-tenth of the number of nodes. The number of characters in the movies dips towards the middle of the series and then increases towards the end. The numbers of nodes and edges seem to correlate linearly, except for the last movie, where the number of edges is noticeably higher compared to the other movies.

This shows that even though there is a large variety of characters in the Harry Potter universe, only a small fraction were given actual dialog or were mentioned in the scenes, either through the dialog or in the stage directions.

Movie Network Size over Time

The high number of nodes in Deathly Hallows Part 1 is likely due to the need to tie up all the loose ends of the story. The movie with the fewest nodes and edges is Goblet of Fire, which is likely due to its storyline being centered on the Triwizard Tournament. The tournament keeps the story focused on the school and the people present, which changes as the later installments introduce outside groups such as the Order of the Phoenix.

Degrees Over Time

One advantage of looking at the movies one at a time and turning them into networks is that it gives insight into the characters' importance throughout the years. The movies follow Harry and his companions from his first year at Hogwarts to the final battle against Lord Voldemort, so we can see when his closest companions and foes enter and leave the stage throughout the series.

The graphs below include each character that was among the 20 most connected nodes in at least one movie, and show in how many movies the character was among the top 20 and how high their degree was. The goal is to illustrate which characters are most important for each movie, but also for the whole series. The three main characters, Harry Potter, Ron Weasley and Hermione Granger, are the most relevant characters according to this measure, which comes as no surprise. Many well-known characters appear at the top of the list, but the measure also captures many minor characters that played a key role in only one movie. Examples include the participants of the Triwizard Tournament, such as Viktor Krum and Cedric Diggory; students who were petrified by the Basilisk, such as Colin Creevey; Defence Against the Dark Arts professors, such as Quirrell, Lockhart, Remus Lupin and Mad-Eye Moody; and members of Dumbledore's Army and the Order of the Phoenix who become more relevant towards the end of the series, such as Luna Lovegood and Tonks. Likewise, enemies such as Bellatrix Lestrange and Fenrir Greyback appear more towards the end, after Voldemort has been resurrected and the fight against him moves towards its climax.
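The selection behind these graphs can be sketched as follows, assuming a dictionary of per-movie degrees. The function name and the ordering (most top-k appearances first, ties broken by total degree) are assumptions made for illustration.

```python
def top_k_appearances(degrees_by_movie: dict[str, dict[str, int]], k: int = 20) -> dict[str, list[tuple[str, int]]]:
    """Characters ranked in the top k by degree in at least one movie, with (movie, degree) pairs."""
    appearances: dict[str, list[tuple[str, int]]] = {}
    for movie, degrees in degrees_by_movie.items():
        # rank by degree, breaking ties alphabetically so the result is deterministic
        ranked = sorted(degrees, key=lambda name: (-degrees[name], name))
        for name in ranked[:k]:
            appearances.setdefault(name, []).append((movie, degrees[name]))
    # most top-k appearances first, then highest total degree
    return dict(sorted(appearances.items(), key=lambda item: (-len(item[1]), -sum(d for _, d in item[1]))))


degrees_by_movie = {
    "Philosopher's Stone": {"harry": 48, "ron": 40, "hermione": 39, "quirrell": 20},
    "Chamber of Secrets": {"harry": 45, "hermione": 41, "lockhart": 30, "ron": 10},
}
print(top_k_appearances(degrees_by_movie, k=2))
```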

The following sections take a closer look at the networks and analyze the naturally occurring communities that were detected through the Louvain community detection algorithm.
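Louvain greedily maximizes modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the sum of its members' degrees and m is the total number of edges. The sketch below implements only the first, local-moving phase of Louvain for an unweighted graph without self-loops: each node repeatedly joins the neighbouring community with the largest modularity gain. The full algorithm then merges communities into super-nodes and repeats. This illustrates the idea and is not the implementation used in the project.

```python
from collections import defaultdict


def modularity(adj: dict[str, set[str]], community: dict[str, int]) -> float:
    """Q = sum over communities of L_c/m - (d_c/2m)^2 for an undirected, unweighted graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    inside = defaultdict(float)  # L_c: edges with both ends in community c
    total = defaultdict(float)  # d_c: sum of degrees in community c
    for u, nbrs in adj.items():
        total[community[u]] += len(nbrs)
        for v in nbrs:
            if community[u] == community[v]:
                inside[community[u]] += 0.5  # every undirected edge is visited twice
    return sum(inside[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


def louvain_local_moving(adj: dict[str, set[str]]) -> dict[str, int]:
    """First Louvain phase: move single nodes between communities while modularity improves."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    community = {node: i for i, node in enumerate(sorted(adj))}
    sigma_tot = {community[node]: len(adj[node]) for node in adj}  # degree sum per community
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            k_i = len(adj[node])
            current = community[node]
            sigma_tot[current] -= k_i  # temporarily take the node out of its community
            links = defaultdict(int)  # edges from the node into each neighbouring community
            for nbr in adj[node]:
                links[community[nbr]] += 1
            # modularity gain (times m) of inserting the node into community c
            gains = {current: links[current] - sigma_tot[current] * k_i / (2 * m)}
            for c, k_in in links.items():
                gains[c] = k_in - sigma_tot[c] * k_i / (2 * m)
            best = max(gains, key=gains.get)  # ties keep the current community
            sigma_tot[best] += k_i
            if best != current:
                community[node] = best
                improved = True
    return community


# Two triangles joined by a single edge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
parts = louvain_local_moving(adj)
print(round(modularity(adj, parts), 4))  # 0.3571
```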


Philosopher's Stone

Movie Network

The network for Philosopher’s Stone illustrates the expected high involvement of characters such as the professors, the students and the Dursleys. The Gryffindors are highly represented, as would be expected, and are hardly matched by the Slytherins Snape and Draco Malfoy.

The three major characters, Harry, Ron and Hermione, are the largest nodes, closely followed by the characters who are introduced early in the movie: Dumbledore, McGonagall, Hagrid and so on. The network is quite small compared to the later movies, which reflects the need to introduce the whole universe in the first installment, where too many characters may confuse the viewer. It is clear from the beginning that certain characters have been cut from the movie to achieve this. Peeves the poltergeist is an example of such a cut character: magical in the story, but perhaps superfluous in the grand scheme of things. It is, however, noteworthy that many of these “simple” cuts and changes greatly affected the later movies, as certain plotlines needed to be modified to make sense to the audience.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Degree Distribution

A trend that roughly applies to the degree distributions of all movies is that most nodes have a degree in the range 8-15, and the degrees look roughly normally distributed around that range. Only a few characters have degrees in the range 40-50. These nodes are the main characters, which is also reflected in the graphs in the section Degrees Over Time. Unlike the wiki network, the movie networks do not seem to follow any clear power law.

The network whose distribution most closely resembles a power law is the one for Deathly Hallows Part I, which is also the largest network, containing the most characters. This suggests that the absence of a power law in the other movies could possibly be explained by their small network size alone.

Communities

Three communities are identified in the first movie. Community 0 involves Harry’s family: his parents, aunt, uncle and cousin. Communities 1 and 2 are hard to tell apart, as neither is clearly distinct. Both contain characters that are integral to the mission of the story as well as side characters that are not. At first glance, Community 2 seems to contain nodes with a stronger connection to Harry and the mission of the movie, with the exception of a few tiny nodes, while only a few nodes from Community 1 are connected to the mission. The distribution of known groups in the communities can help suggest connections between the characters.

One thing that all movie networks have in common is that the three main characters are grouped into the same community, which was not the case for the wiki network. This is likely because the three characters appear together in many scenes. And while the communities may to some degree overlap with naturally occurring cliques, they depend much more on interactions at a specific time than on interactions over a lifetime.
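A distribution of known groups like this can be derived by cross-tabulating community labels against house labels, and the same mapping tells whether the trio shares a community. This is a minimal sketch; the function names and toy labels are hypothetical, and characters without a known house are counted as "Other", as in the legends.

```python
from collections import Counter


def group_distribution(community: dict[str, int], house: dict[str, str]) -> dict[int, Counter]:
    """Count known house labels per community; unknown houses are counted as 'Other'."""
    table: dict[int, Counter] = {}
    for character, label in community.items():
        table.setdefault(label, Counter())[house.get(character, "Other")] += 1
    return table


def same_community(community: dict[str, int], names: list[str]) -> bool:
    """True if all given characters were assigned to one community."""
    return len({community[name] for name in names}) == 1


community = {"harry": 0, "ron": 0, "hermione": 0, "draco": 1, "dobby": 1}
house = {"harry": "Gryffindor", "ron": "Gryffindor", "hermione": "Gryffindor", "draco": "Slytherin"}
print(group_distribution(community, house))
print(same_community(community, ["harry", "ron", "hermione"]))  # True
```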

Figures: Community 0, Community 1, Community 2

Distribution of known groups in the communities

Chamber of Secrets

Movie Network

Compared to the first movie, the trio is more prominent here. The significant difference in node sizes indicates that the movie focuses more on these characters, while the number of nodes also increases with the introduction of new characters. McGonagall and Lockhart are the two prominent professors in this movie, with Lockhart being a new character. A character who appears in the network but is never properly introduced in the movie series is the librarian, Madam Pince. Another new character is Colin Creevey, who is quite relevant to the storyline as one of the petrified students; he is among the larger nodes alongside Ginny, Dumbledore and Voldemort. Another standout node is Justin Finch-Fletchley, a Muggle-born Hufflepuff who rejects Harry’s attempted friendship when he mistakes him for the Heir of Slytherin.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Communities

A total of six communities are identified in the second movie, indicating a more ambitious storyline and cast. Community 0 contains the characters that are absolutely integral to the story alongside a few minor ones, continuing the trend from the first movie where unlikely characters are included in communities. Community 1 appears to revolve around the story of the Chamber of Secrets and Tom Riddle’s diary. Community 2 involves the nodes connected to the events at the beginning of the movie at Harry's uncle's house. Community 3 loosely covers the characters affected by or connected to the Basilisk's petrification victims. Community 4 contains only Quidditch players from the Gryffindor and Slytherin teams, with the exception of one node. Community 5 consists primarily of Gryffindor students.

Figures: Community 0, Community 1, Community 2, Community 3, Community 4, Community 5

Distribution of known groups in the communities

Prisoner of Azkaban

Movie Network

In the third movie, the trio's nodes are around the same size as Sirius Black's, closely followed by Dumbledore, Wormtail (Peter Pettigrew), Hagrid and Neville. As the story revolves around Black's escape and the manhunt for him, such a large node is expected. The Slytherins other than Snape are not the main rivals in this movie, and this is mirrored in the network. As can be seen, Voldemort is not directly a prominent villain in the story, although for the better part of the movie the threat he poses is mirrored in Black. Lupin's node is an unexpected result: the Defence Against the Dark Arts teachers tend to have a large impact on the story, as seen with Lockhart, and his presence is strongly felt in the movie.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Communities

Community 0 contains characters connected to the main storyline, such as the Marauders and Buckbeak. Community 1 contains the Weasley family and Voldemort, although the connection between them is not readily apparent. Community 2 involves the events at Harry’s aunt and uncle’s house, where he blows up his uncle’s sister. Community 3 is Hagrid and the trial for Buckbeak's life. Community 4 focuses on the nodes related to everyday happenings at Hogwarts and consists of Gryffindor students and teachers.

Figures: Community 0, Community 1, Community 2, Community 3, Community 4

Distribution of known groups in the communities

Goblet of Fire

Movie Network

In Goblet of Fire, we see the first significant difference in node size between Hermione and Ron on the one hand and Dumbledore, who surpasses them, on the other. Mad-Eye Moody follows closely alongside Cedric Diggory. Although we see the return of the Dark Lord, Voldemort's node is still small compared to other key characters. The other three Triwizard champions are both new and frequently mentioned, as illustrated in the network; however, considering the fascination with Fleur, her node is not as large as, for example, Krum’s. Draco Malfoy and Snape are usually Harry’s primary antagonists; in this movie, however, this is not the case, as the focus is elsewhere.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Communities

Community 0 is related to Voldemort’s first reign of power, the people who supported him and those affected by his tyranny. Community 1 contains, besides the trio, the main characters connected to the storyline of the Triwizard Tournament and the new Defence Against the Dark Arts teacher. Community 2 is not very specific, but overall contains the nodes relevant to the main story. In Community 3 there is no apparent relation between the nodes, other than that they all fight on Harry’s side of the war and are not that important to the movie's story; this differs from the book and therefore hints at the differences between the two.

Figures: Community 0, Community 1, Community 2, Community 3

Distribution of known groups in the communities

Order of the Phoenix

Movie Network

As in Goblet of Fire, Dumbledore's node in Order of the Phoenix rivals those of Ron and Hermione, likely because he is mentioned more often. In this movie we are introduced to Professor Umbridge, a delegate from the Ministry of Magic who has an enormous impact on Hogwarts and Harry. Voldemort is also more prominent, as Harry tries to convince the world of his return.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Communities

In Community 0, all the nodes work for the Ministry of Magic, but they appear throughout the movie in different relations to Harry. Community 1 is Dumbledore's Army (DA), which Harry starts as a study group so the students can teach themselves what Umbridge will not. Bellatrix and Cedric are in this community because Neville and Cho, respectively, mention them during the DA meetings. Community 2 involves the characters connected to Nagini's attack on Mr. Weasley at the Ministry of Magic. Community 3 concerns the characters involved in the storyline at Harry's aunt and uncle's house. Community 4 concerns the rivalry between the twins and Filch, which is legendary in the books.

Figures: Community 0, Community 1, Community 2, Community 3, Community 4

Distribution of known groups in the communities

Half-Blood Prince

Movie Network

In this movie, Ginny emerges as a large node because of her growing relationship with Harry. We once again see Harry's two antagonists, Draco Malfoy and Snape, grow in node size, alongside Voldemort and Slughorn. Slughorn is introduced as the new Potions teacher and revives his Slug Club at Hogwarts. The movie is largely dominated by Gryffindors and a small group of Slytherins.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Communities

In Community 0, the characters are members of the Slug Club, founded by Professor Slughorn to stay in favor with students with a promising future, as well as Slytherins. Community 1 groups Death Eaters, Order members and Diagon Alley shop owners, all of whom take a side in the war between good and evil that unfolds in the wizarding world outside Hogwarts’ walls. Community 2 consists of the nodes connected to Harry's storyline. Community 3 includes nodes connected to the school, the main nodes being Dumbledore, Slughorn and Voldemort, and those related to them.

Figures: Community 0, Community 1, Community 2, Community 3

Distribution of known groups in the communities

Deathly Hallows

Movie Network - Deathly Hallows Part I

The golden trio dominates again, which is to be expected, as a large portion of the movie documents their solitary travels in a magical tent while hunting Horcruxes. Dumbledore and Voldemort compete for attention, as the latter is an acute threat and the former's reputation is dragged through the mud. The other large nodes are mostly members of the Order of the Phoenix. With the largest cast, it is clear that the movie is setting up the final battle while attempting to tie up loose ends.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Communities - Deathly Hallows Part I

Community 0 represents the members of the Order of the Phoenix. In Community 1, the nodes are all involved in the break-in at the Ministry of Magic. Community 2 illustrates the overlap between Dumbledore's story and the quest for the Horcruxes, the Deathly Hallows and the trip to Luna's house. Community 3 follows Voldemort’s storyline throughout the movie, from the murder of Professor Burbage to the kidnapping of Ollivander. In Community 4 we find mainly minor characters, such as the Dursleys and low-tier followers of Voldemort. Community 5 contains students at Hogwarts.

Figures: Community 0, Community 1, Community 2, Community 3, Community 4, Community 5

Distribution of known groups in the communities

Movie Network - Deathly Hallows Part II

Ginny, Neville and Seamus step up at Hogwarts in the absence of the trio, alongside Luna and Dean, which shows the importance and power of the DA (Dumbledore's Army). As expected, Voldemort plays a huge part in this movie, alongside his trusted Death Eaters. We can also glimpse the names of the main characters' children.

Legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other

Degree Distributions

Degree Distributions with Log Scale

Communities - Deathly Hallows Part II

Community 0 consists of the students, professors and Order members involved in the Battle of Hogwarts. Community 1 illustrates the quest for the Horcruxes and Harry’s death. Community 2 is interesting, as the common denominator between its nodes is the characters' goal to protect or raise Harry. Community 3 contains the children of Harry and Ginny, Ron and Hermione, and Tonks and Lupin.

Figures: Community 0, Community 1, Community 2, Community 3

Distribution of known groups in the communities

Word Clouds

The word clouds show the most frequent words of each book that are also the most distinctive for that book compared to the whole series. This is achieved using TF-IDF, which reduces the weight of words based on the number of other books they appear in.
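The weighting can be sketched as follows, treating each book as one document. The exact TF-IDF variant used is not stated, so the common form tf = count / words in the book and idf = log(number of books / books containing the word) is an assumption. A real run would also remove stop words, which this toy example skips.

```python
import math
import re
from collections import Counter


def tfidf_by_book(books: dict[str, str]) -> dict[str, dict[str, float]]:
    """TF-IDF score of every word in every book, treating each book as one document.

    tf  = occurrences of the word in the book / total words in the book
    idf = log(number of books / number of books containing the word)
    A word that occurs in every book gets idf = 0 and disappears from the cloud.
    """
    counts = {title: Counter(re.findall(r"[a-z]+", text.lower())) for title, text in books.items()}
    doc_freq = Counter(word for words in counts.values() for word in words)
    n_books = len(books)
    scores = {}
    for title, words in counts.items():
        total = sum(words.values())
        scores[title] = {w: (n / total) * math.log(n_books / doc_freq[w]) for w, n in words.items()}
    return scores


books = {
    "Philosopher's Stone": "Fluffy guards the stone. Harry meets Fluffy.",
    "Chamber of Secrets": "Harry meets Dobby.",
}
scores = tfidf_by_book(books)
top = max(scores["Philosopher's Stone"], key=scores["Philosopher's Stone"].get)
print(top)  # fluffy
```

The per-book dictionaries can then be passed to a word-cloud renderer; for example, the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts a word-to-weight mapping.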

For example, in the first word cloud, for Philosopher’s Stone, words like “Fluffy” and “Flamel” appear in large letters, which means that these words are both frequent and distinctive for that particular book. This provides a much better overview of each book's content than a word cloud generated from the plain book text, where the important characters would take up most of the space for every book, which would not be very interesting.

Interpreting the clouds requires some knowledge of the books, but TF-IDF works remarkably well to summarize their content. It is best at emphasizing the characters that are crucial in just one or two books, and those names do a good job of reminding readers of the plotlines in which they played an important role. Apart from character names, it also finds quite a few objects that are important in a few books, like Horcrux, necklace and map. Most importantly, the result is actually interesting to explore, because it is not the same words over and over again in different sizes.

Sentiment Analysis

Overall

By comparing the sentiments of the movies and books, we get a rough look at a one-dimensional mood scale ranging from happy to sad. However, the values vary so little that not much can be concluded, although the overall graphs can be justified by the events of the books. The dataset used to calculate the sentiment is not representative of the Harry Potter universe because of its unique world. Some loaded terms, such as Dark Lord, Death Eaters and Muggle, carry huge meaning in the wizarding world, which also highlights how much neighbouring words matter for understanding a single word. In all of the sentiment analyses the variation is so small that any interpretation is borderline speculative; however, we still intend to interpret the results, as we have found them to fit the material very well.
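Lexicon-based sentiment of this kind typically averages per-word scores over all words found in the lexicon. The lexicon below is a made-up stand-in: the text does not name the dataset, and the values are invented for illustration. It still shows the limitation described above, since "Dark Lord" is scored word by word as the average of "dark" and "lord", so the menace of the phrase is lost.

```python
import re

# Hypothetical word scores on a 1 (sad) to 9 (happy) scale; not taken from any real lexicon.
LEXICON = {"happy": 8.0, "friend": 7.5, "dark": 3.0, "lord": 6.0, "death": 1.5}


def average_sentiment(text: str, lexicon: dict[str, float]) -> float:
    """Mean score of the words in text that appear in the lexicon; 0.0 if none do."""
    scores = [lexicon[word] for word in re.findall(r"[a-z]+", text.lower()) if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


print(average_sentiment("The Dark Lord has returned.", LEXICON))  # 4.5
print(average_sentiment("Nothing here is scored.", LEXICON))  # 0.0
```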

Book

This sentiment is calculated from the dialog of each book. Through testing, this was found to be the best indicator, as a book contains a large amount of descriptive text.
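Restricting the text to dialog can be approximated by keeping only what appears between double quotation marks. This is a simplified sketch that assumes curly or straight double quotes. It ignores conventions such as multi-paragraph speeches whose opening paragraphs have no closing quote.

```python
import re

# Text between an opening and a closing double quote (straight or curly).
QUOTED = re.compile('["\u201c]([^"\u201d]*)["\u201d]')


def extract_dialog(text: str) -> str:
    """Join all quoted passages of text, dropping narration and description."""
    return " ".join(QUOTED.findall(text))


passage = "\u201cHello,\u201d said Harry. Hagrid smiled. \u201cGoodbye.\u201d"
print(extract_dialog(passage))  # Hello, Goodbye.
```

A chapter without any quoted speech yields an empty string.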

The graph nicely illustrates how we experience the books. The first book is the introduction to the wizarding universe from the perspective of an 11-year-old: it is magical and fantastic, and the danger is not really comprehensible. The stakes are higher in the second and third books, where the danger becomes life-threatening. In Goblet of Fire we are once again introduced to something magical and exciting. The fifth and sixth books both have discouraging and encouraging plotlines running through their narratives. And, as expected, in the seventh and last book Voldemort sends the wizarding world into a downward spiral from which Harry must save it, which makes the “mood” less “happy” than in the other books.

Sentiments by Book

Chapters

The sentiment analysis of the book chapters is also based on the dialog. Here the overall development in sentiment is very linear. However, key events such as the resurrection of Voldemort in the fourth book and Dumbledore’s death in the sixth book are clearly identifiable in the graph; oddly enough, this does not apply to the death of Sirius in Order of the Phoenix. The second chapter of Goblet of Fire consists entirely of inner monologue and description, i.e. it contains no dialog, and thus has a sentiment of zero, which appears in the graph as a gap near the start of the book.

Sentiments by Chapter

Movie-Scenes

The sentiment is calculated for full scenes, including stage directions, as it was very hard to separate out the dialog. The average sentiment of the movies is 0.5 below that of the books, which would be interesting to investigate further. The drop in Prisoner of Azkaban is due to a faulty scene. In Order of the Phoenix the sentiment is much lower towards the end of the movie, possibly indicating that the movie conveys Harry's grief over Sirius’ death better than the book does. The deaths of Cedric and Dumbledore, however, are not very clearly identifiable.

Sentiments by Movie-Scenes