The contemporary workplace has embraced generative AI as a key driver of next-generation productivity, with corporate meetings increasingly leveraging tools that automate note-taking, synthesize discussions, and even generate actionable insights. This evolution introduces a critical decision for organizations: should they adopt open-source or closed-source AI-powered tools? The choice carries significant implications for data privacy, system reliability, and regulatory compliance, particularly when managing sensitive corporate information. This case study explores both options, emphasizing why open-source solutions offer more transparency in the handling of private organizational data.
Open-source generative AI refers to AI models and tools whose underlying code is publicly available for other developers to use, customize, and build upon. This openness fosters a collaborative environment, enabling rapid innovation and adaptation for specific use cases. For example, TensorFlow, developed by Google, is a widely used open-source platform that lets developers create and deploy custom AI solutions, ranging from natural language processing to computer vision.
In contrast, closed-source generative AI refers to proprietary systems whose source code is not accessible to the public and remains under the control of the organization that develops it. This lack of transparency creates barriers to innovation and limits users' ability to fully understand or modify the underlying model. A prime example is OpenAI's ChatGPT (GPT-4), which is accessible only through a subscription-based API, with no access to the model itself, restricting user autonomy and scrutiny.
While open-source models excel in flexibility, transparency, and community-driven innovation, closed-source models impose restrictions that limit innovation and create dependency on the controlling organization, making them less appealing to those who value openness and collaboration.
Open-source AI offers significant advantages in data privacy primarily because it provides (1) transparency, (2) control, and (3) accountability in safeguarding user data [1].
Open-source AI offers transparency because anyone can inspect the code and understand how their data is being handled. Users can therefore verify that their data is treated responsibly and that there are no hidden data collection practices. Independent audits can also be conducted to identify privacy concerns that might be overlooked in closed-source systems, particularly in countries where data privacy regulations are still developing.
Moreover, with open-source systems, users have more control over their data. Since the code is accessible, users or organizations can modify the system to align with their specific privacy requirements, such as ensuring data isn't stored longer than necessary or preventing certain data from being collected in the first place.
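As a concrete illustration, the snippet below is a hypothetical sketch of such controls in a self-hosted pipeline: a redaction step that keeps email addresses from ever being collected, and a retention job that purges logs after a fixed window. The function names, the regular expression, and the 30-day window are illustrative choices, not part of any particular open-source project.

```python
# Hypothetical sketch of privacy controls a self-hosted, open-source pipeline permits:
# redact obvious identifiers before a prompt ever reaches the model, and purge stored
# logs once a retention window expires.
import os
import re
import time

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
RETENTION_SECONDS = 30 * 24 * 3600  # keep inference logs for at most 30 days

def redact(prompt: str) -> str:
    """Strip email addresses so they are never collected in the first place."""
    return EMAIL.sub("[REDACTED_EMAIL]", prompt)

def purge_old_logs(log_dir: str) -> None:
    """Delete any log file that has outlived the retention window."""
    cutoff = time.time() - RETENTION_SECONDS
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)

print(redact("Please follow up with jane.doe@example.com about the contract."))
```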
Lastly, in open-source models there is typically less corporate incentive to exploit user data for profit, as the community often dictates the direction of development. Open-source communities can also quickly identify and address security vulnerabilities or privacy risks, enabling a fast response when issues arise.
In contrast, the defining feature of closed-source AI is its proprietary code, which is accessible only to the organization that develops it. The risks of using a closed-source model include (1) the inability to independently inspect how data is handled, (2) misuse of data, and (3) leakage of data into public knowledge [2].
The centralized control that the developing company holds over a closed-source system keeps its data-handling practices behind the scenes, forcing clients to rely on the developers' claims about those practices. The controversy surrounding Facebook and Cambridge Analytica illustrates this. While Facebook is not a fully closed-source platform, its internal algorithms and data-handling practices are proprietary and not transparent to users. Users were relying on Facebook’s claims about its data practices, unaware of how their data was being shared with third parties, which ultimately led to the misuse of personal data and sparked global outrage and regulatory scrutiny [3].
Such misuse arises mainly because, in closed-source models, the company behind the system may have financial or marketing incentives to gather and monetize user data, creating potential conflicts of interest regarding data privacy.
Using popular closed-source AI models like ChatGPT, Gemini, and Claude can also lead to data leaking into public knowledge, because user input may be used to train the model. Without independent verification, sensitive user inputs or proprietary information processed by these models could inadvertently be exposed through breaches, misuse, or vulnerabilities in the system, turning proprietary knowledge into model knowledge.
By open-sourcing Llama 3.1, Meta is taking a bold and praiseworthy step toward making AI technology accessible to everyone. As described in Meta’s blog, Llama 3.1 not only offers cutting-edge functionality but also integrates safety features, such as Code Shield, to help secure code generation, striking a balance between transparency and security.
Anton Troynikov, co-founder and head of technology at the AI startup Chroma, said that Llama could allow the company to give its users more control over how their data is used. “Now you don’t have to send any data outside of your system, you can run it 100 per cent internally on your machines,” said Troynikov, “your data no longer has to go anywhere to get these fantastic capabilities.” [4]
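The snippet below is a minimal sketch of the fully local workflow Troynikov describes, assuming a recent version of the Hugging Face transformers library (with accelerate for device placement) and local access to the gated meta-llama/Llama-3.1-8B-Instruct weights; nothing in the prompt or the response leaves the machine running the process.

```python
# Minimal sketch: run an open-weights Llama 3.1 model entirely on local hardware.
# Assumes the model weights have already been downloaded (the repository is gated
# and requires approval on Hugging Face).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",  # place the model on whatever local GPU/CPU is available
)

messages = [
    {"role": "user",
     "content": "Summarize the key action items from today's meeting notes."},
]

# The prompt and the generated answer stay on this machine; no external API is called.
result = generator(messages, max_new_tokens=200)
print(result[0]["generated_text"][-1]["content"])
```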
This effort from Meta is widely seen as a positive step toward responsible AI development, addressing concerns about data privacy and potential misuse [5]. It also strengthens the company’s reputation as an advocate for innovation and inclusivity.
At Quorum AI, we build our AI on an open-source model so that we retain control over data compliance and can be operationally transparent about how we handle our customers’ data. Additionally, our system is designed to uphold rigorous security measures, data encryption, and auditing capabilities that closed-source solutions may not offer.
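The snippet below is a simplified illustration of two such controls (it is not our production code): encrypting stored transcripts at rest and appending an audit record that keeps only a hash of the content. It assumes the third-party cryptography package and leaves real key management to a secrets manager.

```python
# Simplified illustration: encrypt transcripts at rest and keep an append-only audit
# trail that records who processed what, without storing the content itself.
import hashlib
import json
import time
from cryptography.fernet import Fernet  # third-party `cryptography` package

fernet = Fernet(Fernet.generate_key())  # in practice, load the key from a secrets manager

def store_transcript(transcript: str) -> bytes:
    """Encrypt a meeting transcript before it is written to disk or a database."""
    return fernet.encrypt(transcript.encode("utf-8"))

def audit_log(user_id: str, transcript: str) -> dict:
    """Append an audit entry containing only a content hash, never the content."""
    entry = {
        "user": user_id,
        "content_sha256": hashlib.sha256(transcript.encode("utf-8")).hexdigest(),
        "timestamp": time.time(),
    }
    with open("audit.log", "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

ciphertext = store_transcript("Q3 budget review: headcount frozen until January.")
audit_log("analyst-42", "Q3 budget review: headcount frozen until January.")
```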
In conclusion, open-source models foster greater data privacy by providing users with the transparency, control, and security needed to protect their information, while closed-source systems often lack the level of oversight necessary to ensure privacy.
[1] B.-J. Koops, J.-H. Hoepman, and R. Leenes, “Open-source intelligence and privacy by design,” Computer Law & Security Review, vol. 29, no. 6, pp. 676–688, Dec. 2013, doi: https://doi.org/10.1016/j.clsr.2013.09.005.
[2] A. Dang, “The Open Advantage: Winning the Adversarial Battle with Open-Source Models,” Social Science Research Network, Jan. 2023, doi: https://doi.org/10.2139/ssrn.4651571.
[3] A. Newcomb, “A timeline of Facebook’s privacy issues — and its responses,” NBC News, Mar. 24, 2018. https://www.nbcnews.com/tech/social-media/timeline-facebook-s-privacy-issues-its-responses-n859651
[4] D. Wolf Torres, “This week, Meta announced the release of Llama 3.1, its latest AI model, and made a strong case for the benefits of open-source AI,” LinkedIn, Jul. 28, 2024. https://www.linkedin.com/pulse/open-source-good-all-metas-bold-move-llama-31-diana-wolf-torres-utjjc/ (accessed Jan. 15, 2025).
[5] S. Ghaffary, “Why Meta is giving away its extremely powerful AI model,” Vox, Jul. 28, 2023. https://www.vox.com/technology/2023/7/28/23809028/ai-artificial-intelligence-open-closed-meta-mark-zuckerberg-sam-altman-open-ai