Building a website about conspiracy theories using XML, XQuery, and Python was a challenging but rewarding project. XML and XQuery gave us a structured and flexible way to organize the website's content, which made it easier to manage and update, while Python let us automate repetitive processing tasks and streamline the website's functionality. However, one of the most significant challenges was making sure the content was balanced and presented objectively, especially given the sensitive nature of conspiracy theories. Overall, the project required a combination of technical skills, critical thinking, and sensitivity to make the final product informative and thought-provoking without promoting any specific conspiracy theory.
In addition to XML, XQuery, and Python, incorporating spaCy and Cytoscape into the project added another layer of complexity and functionality. spaCy, a natural language processing library for Python, allowed for more advanced analysis of the website's content by making it possible to find and extract named entities (such as people and organizations) from unstructured text. Cytoscape, a network analysis and visualization tool, provided a way to visualize the relationships between conspiracy theories and the people, organizations, or events they involve. Integrating these tools not only extended the website's functionality but also improved the user experience by presenting the content in a more interactive and engaging way. However, it also required a deeper understanding of how each tool works, which added to the project's complexity. Overall, using spaCy and Cytoscape showed what newer tools can add to website development and demonstrated the importance of keeping up with emerging technologies.
Building a website with HTML and CSS comes with its own set of challenges. One of the biggest is making the website responsive, so that it looks good on devices with different screen sizes. This means writing code that adapts to different screen sizes and resolutions, which is time-consuming and requires attention to detail. In return, the website let us display all of our work in one place for anyone to see and experience: HTML gave the website its structure, while CSS let us express ourselves through its styling.
One of the main challenges behind the scenes was the text files themselves. We had to collect every usable document from textfiles.com and wrap every piece of text that referred to an entity. To do this, we pushed all of the files through a pipeline of techniques and scripts.
Once we had scraped the usable conspiracy texts from textfiles.com, we had to replace every character that is invalid in, or has a special meaning in, the XML and HTML formats (for example, a bare & or <). With the special characters replaced, the next step was wrapping paragraphs and any loose strings of text in <p> elements so the texts would be legible on the website. After each file had been wrapped and checked for errors, we ran it through a Python script that used natural language processing with spaCy, together with regular expressions (regex), to wrap any entities it found. Unfortunately, the script's output was not perfect: some tags overlapped, and many names were not wrapped at all. Each file therefore had to be checked again for invalid wrapping. Once all the files were valid and free of errors, we used XQuery and XSLT to turn them into pages that can be viewed in a browser.
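The cleaning, wrapping, and re-checking steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the script the project used: the function names, the `<document>` root element, and the `<entity type="...">` tag are hypothetical, and the hand-written `entities` dictionary stands in for the names spaCy would have found. Matching every name in a single regex pass, longest name first, is one way to avoid the overlapping tags described above; the final parse with `xml.etree.ElementTree` plays the role of the "check each file again" step.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# C0 control characters (other than tab, LF and CR) are illegal in XML 1.0.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_text(raw: str) -> str:
    """Remove characters XML 1.0 forbids, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", raw))


def split_paragraphs(text: str) -> list[str]:
    """Treat blank-line-separated blocks of text as paragraphs."""
    blocks = (block.strip() for block in re.split(r"\n\s*\n", text))
    return [block for block in blocks if block]


def wrap_entities(text: str, entities: dict[str, str]) -> str:
    """Tag every known entity name in one regex pass (hypothetical tag name).

    Names are tried longest first, so "John Kennedy" wins over "Kennedy",
    and re.sub never starts a new match inside one it has already replaced,
    so tags cannot overlap. Assumes names contain no &, < or >.
    """
    if not entities:
        return text
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(name) for name in names) + r")(?!\w)"
    )
    return pattern.sub(
        lambda m: f'<entity type="{entities[m.group(1)]}">{m.group(1)}</entity>',
        text,
    )


def to_xml(raw: str, entities: dict[str, str]) -> str:
    """Clean the text, tag entities, and wrap each paragraph in <p>."""
    paragraphs = split_paragraphs(clean_text(raw))
    body = "\n".join(f"<p>{wrap_entities(p, entities)}</p>" for p in paragraphs)
    return f"<document>\n{body}\n</document>"


def is_well_formed(xml_text: str) -> bool:
    """Re-check the output: a parse error means bad escaping or nesting."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

For example, `to_xml("John Kennedy met Kennedy & co.", {"John Kennedy": "person", "Kennedy": "person"})` tags both names separately without nesting one inside the other and escapes the ampersand as `&amp;`. Names that the dictionary (or spaCy) misses simply stay untagged, which matches the "many names were not wrapped" problem the project ran into.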
After all of that, the text files were viewable on the website, with each referenced entity wrapped accordingly. But there was still more to be done to visualize the text files and their entities. Using XQuery, we generated a .tsv file containing a table that counts how many times each entity is referenced and in which file. We then loaded the .tsv file into Cytoscape and generated a vast network of all the interconnected nodes.
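The project built this table with XQuery; the sketch below shows the same counting idea in Python, purely for illustration. It assumes each processed file is well-formed XML in which entities are wrapped in `<entity>` elements (a hypothetical tag name), and the column names `file`, `entity`, and `count` are likewise assumptions. Each row is one file-to-entity edge with its reference count, a shape that Cytoscape can import as a network by choosing one column as the source node and another as the target.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable


def count_entities(docs: Iterable[tuple[str, str]]) -> Counter:
    """Count (file, entity) pairs across (filename, xml_text) documents."""
    counts: Counter = Counter()
    for filename, xml_text in docs:
        root = ET.fromstring(xml_text)
        for element in root.iter("entity"):
            # Collapse internal whitespace so line-wrapped names still match.
            name = " ".join("".join(element.itertext()).split())
            if name:
                counts[(filename, name)] += 1
    return counts


def to_tsv(counts: Counter) -> str:
    """Render the counts as a tab-separated edge table with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["file", "entity", "count"])
    for (filename, name), n in sorted(counts.items()):
        writer.writerow([filename, name, n])
    return buffer.getvalue()
```

The resulting string can be saved with something like `Path("entities.tsv").write_text(to_tsv(counts), encoding="utf-8")` (the filename is hypothetical). Keeping the count as its own column lets Cytoscape use it as an edge attribute, for instance to scale edge width by how often a file mentions an entity.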
Very special thanks to TextFiles.com for their awesome directory of text files! They were the foundation of this project, and it would have fallen apart without them.