
Detecting the Undetectable: An Experiment on AI Text Rewriting

The Significance of AI Detectors

In an era of unprecedented technological advancement, Artificial Intelligence (AI) has permeated almost every facet of our lives, including content creation and academic research. With the advent of sophisticated AI writing algorithms, the risks of plagiarism, data duplication, and intellectual property theft have significantly increased. This has driven the development and popularization of AI-powered plagiarism detectors: tools designed to identify copied, suspiciously similar, or machine-generated content. Platforms like Copy Leaks, ZeroGpt, and Sherlock AI Detector use complex algorithms to scan vast swathes of online content, providing an invaluable service in safeguarding the integrity of academic, journalistic, and creative works.

Experiment Objective and Methodology

The integrity and originality of scholarly work form the cornerstone of higher education and scientific research. With the advent of increasingly sophisticated artificial intelligence (AI) algorithms capable of generating human-like textual content, that cornerstone is imperiled. Consequently, the capacity to discern between original human-authored texts and AI-rewritten ones has become a pressing yet relatively unexplored technological and ethical imperative. This experiment aims to rigorously scrutinize the performance of extant AI text detection algorithms in identifying rewritten scholarly content. By doing so, we aim to assess the current vulnerabilities in the system and provide empirically informed recommendations to bolster the mechanisms safeguarding academic probity. The implications of this research extend beyond academia, resonating across sectors where the authenticity of textual content is pivotal.

As the technology behind AI writing algorithms continues to evolve, so does the need to ensure that these AI detectors can keep up with the increasingly nuanced and complex texts generated by AI systems. Our experiment aims to put these detectors to the test. Can they distinguish between human-written content and AI-rewritten text? And if so, how effective are they? 

To address these questions, we selected an excerpt from John Grisham's novel "Camino Island," available on the official John Grisham website. We initially ran this human-written text through three different AI detectors (Copy Leaks, ZeroGpt, and Sherlock) to establish a baseline for detection. Subsequently, we used an AI system to rewrite the excerpt and ran the modified version through the same detectors.

We aim to assess how these detectors fare when faced with AI-rewritten content, which could be sophisticated enough to evade their algorithms. 

This article will present the results of this experiment, providing a comparative analysis that will explore the limitations and capabilities of current AI detection systems when pitted against evolving AI writing technologies. 

A copy of the genuine human-written text, an excerpt from "Camino Island" by John Grisham:

We then asked ChatGPT to rewrite this piece of text. Here is what we received:

We worked with these two texts and ran both pieces through three AI detectors: ZeroGpt, Copy Leaks, and Sherlock AI Detector. Our first step was to run text number two, the rewritten version of the original human-written text.
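For readers who want to replicate this step, the sketch below outlines the workflow in Python: the same text is submitted to each detector, and the reported scores are collected side by side. The endpoint URLs, request shape, and response field are hypothetical placeholders, not the services' real APIs; treat this as a template rather than working client code.

```python
# Minimal sketch of the experimental pipeline. All URLs and the response
# field "ai_probability" are hypothetical placeholders -- each real service
# (Copy Leaks, ZeroGpt, Sherlock) has its own API and authentication scheme.
import requests

DETECTORS = {
    "copy_leaks": "https://example.com/api/copyleaks/scan",   # placeholder
    "zerogpt": "https://example.com/api/zerogpt/detect",      # placeholder
    "sherlock": "https://example.com/api/sherlock/analyze",   # placeholder
}

def score_text(text: str) -> dict:
    """Submit the same text to every detector and collect the raw scores."""
    results = {}
    for name, url in DETECTORS.items():
        response = requests.post(url, json={"text": text}, timeout=30)
        response.raise_for_status()
        # Assumed response shape: {"ai_probability": <0..100>}
        results[name] = response.json()["ai_probability"]
    return results

# Run both versions of the excerpt through the same battery of detectors.
with open("camino_island_original.txt") as f:
    original_scores = score_text(f.read())
with open("camino_island_rewritten.txt") as f:
    rewritten_scores = score_text(f.read())
print(original_scores, rewritten_scores)
```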

Upon running the AI-rewritten text through the selected detectors, we observed noteworthy discrepancies in the detection rates among Copy Leaks, ZeroGpt, and Sherlock AI. Specifically, Copy Leaks assessed a 64.3% likelihood that the content was human-authored, while ZeroGpt astonishingly returned a 0% detection rate for AI-generated text, categorizing it as human-written. Conversely, Sherlock AI exhibited a more cautious approach, identifying 1.17% of the text as potentially AI-generated.

Interpretation of Results

Variability in Detection Algorithms

The variances in the output metrics suggest inherent disparities in the algorithmic frameworks these detectors utilize for content scrutiny. It is conceivable that the linguistic features or stylometric indicators each service monitors diverge in scope and sensitivity.
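To make the notion of stylometric indicators concrete, here is a minimal sketch of the kind of surface features a detector might compute. These three features are illustrative assumptions on our part; production detectors are believed to combine many more signals, including model-based ones such as token-level perplexity.

```python
# Illustrative stylometric features of the kind detectors may weigh differently.
# A simplification: real detectors also use model-based signals (e.g. token
# perplexity under a language model), which are not shown here.
import re
import statistics

def stylometric_features(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        # Average sentence length in words.
        "avg_sentence_len": statistics.mean(sentence_lengths) if sentence_lengths else 0.0,
        # "Burstiness": variation in sentence length; human prose tends to vary more.
        "sentence_len_stdev": statistics.stdev(sentence_lengths) if len(sentence_lengths) > 1 else 0.0,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }
```

Two detectors computing even this small feature set could disagree sharply if they weight the features differently, which is one plausible source of the discrepancies observed above.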

Algorithmic Shortcomings and False Negatives

The result from ZeroGpt, indicating a 0% detection of AI text, is particularly concerning as it represents a false negative. This outcome underscores the latent shortcomings in the algorithm's capability to detect nuanced, AI-manipulated text, thus posing a potential vulnerability in academic and professional settings where content originality is paramount. 

Partial Efficacy

Sherlock AI's detection rate of 1.17% suggests low sensitivity to AI-rendered modifications: for a human-written text rewritten by ChatGPT, we would expect a markedly higher AI score. Although the verdict is inaccurate, the nonzero detection rate could indicate partial sensitivity, warranting further investigation.

Outcomes and Implications

Inconsistency in Detection

The marked inconsistency among the results implies that relying on a single AI detector may not be sufficient to validate text originality.

Need for Algorithmic Robustness

The significant detection rate gaps call for algorithmic robustness enhancements to adapt to increasingly sophisticated AI-generated text.

Critical Review of Current Systems

Given the emerging capabilities of AI in generating human-like text, there is an urgent necessity for a critical review and possible overhaul of existing detection algorithms. 

In the second stage of our experiment, we ran the original text through the three detectors again and recorded the outcomes.

Copy Leaks: Improved Accuracy

The Copy Leaks algorithm was more accurate in identifying the original human-written text, as evidenced by the 95.5% confidence score, compared to the 64.3% rating for the AI-rewritten text. This could suggest an algorithm better attuned to detecting authentic human authorship, but it also highlights its limitations in differentiating sophisticated AI-rewritten content.

ZeroGpt: A False Positive Dilemma

The most perplexing result comes from ZeroGpt, which reported a 49.4% probability of AI-generated content in a genuinely human-authored text. This represents a significant false positive and raises questions about the algorithm's reliability and underlying methodology. Contrast this with its 0% AI detection rate for the AI-rewritten text, and it's clear that ZeroGpt's efficacy is highly questionable.

Sherlock: Low-Sensitivity Detection

Sherlock's high confidence of 99.25% in human authorship for the original text (0.75% AI detected), together with the similarly low 1.17% AI score for the rewritten text, suggests low sensitivity to text rewritten by AI and leaves considerable room for improvement.

Outcomes and Implications 

The disparate results point to an imperative need for standardizing the methodologies employed by AI detectors to ensure consistent and reliable outcomes.

ZeroGpt's results include both a false positive and a false negative: a critical vulnerability in any system meant to ensure content integrity. This raises alarm bells for the scholarly and publishing communities.

While Sherlock and Copy Leaks appear more robust in their detection capabilities, their varying efficacy across original and rewritten text reflects a trade-off between robustness and sensitivity.
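To make the comparison concrete, the snippet below encodes the scores reported above and applies a 50% decision threshold. Two caveats: the threshold is our assumption (none of the services publishes its internal cut-off), and converting Copy Leaks' human-likelihood score to an AI score as its complement is our simplification.

```python
# Scores from both runs, expressed as "percent of text judged AI-generated".
# Copy Leaks reports human-likelihood, converted here as (100 - human score).
scores = {
    # detector: (AI score on original human text, AI score on AI rewrite)
    "Copy Leaks": (100 - 95.5, 100 - 64.3),
    "ZeroGpt": (49.4, 0.0),
    "Sherlock": (0.75, 1.17),
}

THRESHOLD = 50.0  # assumed decision threshold; the services do not publish theirs

for name, (ai_on_human, ai_on_rewrite) in scores.items():
    # False positive: the human original is classified as AI at this threshold.
    # Note that ZeroGpt's 49.4% sits just under 50%, so its verdict on the
    # human text flips with even a slightly lower threshold.
    false_positive = ai_on_human >= THRESHOLD
    # False negative: the AI rewrite passes as human at this threshold.
    false_negative = ai_on_rewrite < THRESHOLD
    print(f"{name}: false positive={false_positive}, false negative={false_negative}")
```

At this threshold, every detector commits a false negative on the rewritten text, which is the pattern the analysis above describes.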

Experiment Summary

Our research sought to evaluate the efficacy of three leading AI detection platforms (Copy Leaks, ZeroGpt, and Sherlock) in distinguishing between human-written and AI-rewritten text. We used an excerpt from John Grisham's novel "Camino Island" as our test subject, running the original text and then an AI-rewritten version through these detectors. The results revealed significant disparities in the detectors' abilities to accurately identify the nature of the text, ranging from false positives to false negatives.

None of the detectors demonstrated consistent efficacy across the original and rewritten text, which raises questions about the reliability of existing systems.

The differing results across platforms suggest inherent weaknesses in current algorithms, particularly in their ability to detect AI-rewritten text, which is becoming increasingly sophisticated.

While some platforms appeared more robust, none perfectly balanced robustness and sensitivity, indicating room for improvement. 

Utility of Experiment Data 

The data generated from this experiment is invaluable for: 

  • Algorithmic Auditing - the results provide an empirical basis for critiquing and auditing current detection methods.

  • Research & Development - the results can guide future work in enhancing the capabilities of AI detectors, possibly through ensemble methods or adaptive learning algorithms.

  • Policy Formulation - the insights could be crucial for academic and publishing communities in shaping policies around plagiarism and content integrity.

Future Directions and Recommendations

  • Development of Hybrid Detection Systems: Given the inconsistency among single algorithms, a hybrid approach combining multiple algorithms could offer a more reliable detection mechanism (a minimal sketch of such an ensemble follows this list).

  • Incorporation of Machine Learning: Adaptive learning algorithms could continually update and fine-tune detection methods.

  • AI Writing Abilities: On the flip side, the results also signal the need for further refinement in AI writing technologies, with the aim of creating algorithms that generate high-quality content that nonetheless remains distinguishable from human-written text, so that ethical standards can be maintained.
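As a minimal sketch of the hybrid approach suggested above, the snippet below averages the AI-probability scores of several detectors into a single ensemble verdict, using the same complement conversion for Copy Leaks as before. The weights are hypothetical; in practice they would be fitted on a labelled corpus of human and AI text.

```python
# Sketch of a hybrid (ensemble) detector: combine per-detector AI-probability
# scores into one verdict. Weights are hypothetical placeholders.

def ensemble_ai_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-detector AI probabilities on a 0-100 scale."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Example with the scores this experiment recorded for the AI-rewritten text:
scores = {"Copy Leaks": 35.7, "ZeroGpt": 0.0, "Sherlock": 1.17}
weights = {"Copy Leaks": 1.0, "ZeroGpt": 0.5, "Sherlock": 1.0}  # hypothetical
print(f"Ensemble AI score: {ensemble_ai_score(scores, weights):.1f}%")
```

Even this naive averaging would dampen the effect of a single detector's false negative, though fitted weights or a learned meta-classifier would be needed for a dependable system.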

Concluding Thoughts

As AI's capability to generate human-like text evolves, so must our mechanisms for detecting such content. Our study highlights the vulnerabilities in current systems and should catalyze the necessary technological advancements in both AI text generation and detection.

Our study elucidates the mounting complexities we face in ensuring content authenticity in an era where the boundary between human and AI-generated content is rapidly blurring. As AI technologies become more pervasive, their potential to disrupt traditional notions of authorship, intellectual property, and content integrity increases exponentially.

While our experiment focused on a specific facet of AI's intersection with human-generated content, its implications are far-reaching, touching upon ethical, societal, and technological domains. For instance, the ease with which AI can mimic human writing styles brings into question the ethical considerations around machine-authored content, its applicability, and its implications for academic and journalistic integrity.

Moreover, our findings serve as a starting point for a broader conversation that must include multiple stakeholders—technologists, ethicists, policymakers, and educators. The complexity of the challenge suggests that no single approach will suffice. Collaborative, multi-disciplinary efforts are needed to understand AI's evolving capabilities and develop robust, reliable mechanisms for their ethical and effective utilization.

Finally, while our study highlights vulnerabilities and inconsistencies, it also underscores the incredible progress made in AI text generation and detection technologies. These tools hold enormous promise for various applications, from automating routine tasks to potentially revolutionizing how we interact with information. However, their advancement must be matched by equal strides in detection capabilities, creating a balanced ecosystem where innovation is coupled with integrity.

In summary, the urgency for action is apparent. As the boundaries of what AI can do continue to expand, our strategies for managing and mitigating its impact must evolve in tandem. Our experiment serves as both a cautionary tale and an optimistic roadmap toward a future where the coexistence of human and AI-generated content is both productive and ethical.

Final Touch and Food for Thought

Finally, we decided to check the AI-rewritten version of the original John Grisham piece with the Grammarly plagiarism checker.

Running the AI-rewritten text through Grammarly's plagiarism checker produced a surprising and concerning result: the text was flagged as 100% original. This outcome adds a new layer of complexity to detecting AI-generated content, specifically because Grammarly is widely adopted for plagiarism checks in academic and professional settings.

The fact that Grammarly marked the AI-rewritten text as completely original could have severe implications for content integrity. In an academic setting, such a result could give a false sense of security, erroneously validating plagiarized or inauthentic content.

Grammarly's inability to detect the AI-rewritten text as derivative or non-original suggests a limitation in its algorithm, perhaps because it is not calibrated to detect nuances between human-written and sophisticated AI-rewritten content.

The result also speaks to the evolving sophistication of AI-generated text. AI algorithms are increasingly capable of producing text that not only mimics human writing but does so in a way that evades traditional plagiarism detection methods. 

Need for Advanced Detection Systems

The result from Grammarly, a trusted tool for millions, accentuates the need for more advanced plagiarism detection systems that can discern AI-generated modifications and similarities to the original text.

Critical Review of Current Tools

Institutions relying on Grammarly or similar tools for plagiarism detection must reassess their efficacy, especially in an era of increasingly sophisticated text-generating AI.

Interdisciplinary Approach

Addressing this challenge will require a multi-pronged, multidisciplinary approach involving machine learning, natural language processing, and ethical computing.

Transparency and Auditing

There is a heightened need for transparency in how these tools function algorithmically. Third-party audits could provide additional scrutiny and confidence in these platforms.

Ethical Considerations

The onus is also on the developers of text-generating AI to build safeguards that allow their creations to be easily identifiable, thereby ensuring responsible and ethical usage. 

The Grammarly result is a stark reminder that AI text detection is still in flux, grappling with challenges that have far-reaching implications for academia, journalism, and beyond. With AI-generated content becoming more complex and indistinguishable, the tools we rely on for ensuring content integrity must evolve in parallel. Failure to do so could compromise the foundations of intellectual and professional trust that our educational and organizational systems are built upon. 

In the spirit of academic rigour and transparency, we are pleased to announce that all texts, datasets, and materials related to this experiment have been made publicly available. This open-source approach allows the broader scientific community to scrutinize, replicate, and build upon our findings. By sharing these resources, we aim to foster a collaborative atmosphere that encourages further investigations into the evolving landscape of AI-generated text and its detection. We welcome scholars, technologists, and other interested parties to re-run our experiment, contributing to a more comprehensive and nuanced understanding of this critically important field.

 
