Hosted large-language-model inference endpoints on SecNumCloud-qualified GPUs. Positioned as the French sovereign alternative to hosted OpenAI / Anthropic APIs for regulated workloads: health data (HDS-qualified), public-sector, and defence-adjacent.
Sub-services (2)
Inference Endpoints
Hosted endpoints for open-source LLM variants
Embedding Models
Vector-embedding endpoints for RAG / similarity search
Compliance & Certifications
This service is attested for the following frameworks. Always verify with the provider before relying on a specific compliance posture.
Where this runs
Sovereign regions (2)
- Cloud Temple Paris · Paris · SecNumCloud
- Cloud Temple Marseille · Marseille · SecNumCloud
Equivalent services on other platforms
Enterprise ML and AI platform covering PAI-Studio visual workflow builder, PAI-DSW Jupyter notebooks, PAI-EAS elastic inference serving, PAI-Blade inference optimisation, and integration with Alibaba's Qwen foundation models
Alibaba's flagship open-source foundation model family covering Qwen (text), Qwen-VL (vision-language), Qwen-Audio, and Qwen-Coder — accessible via the DashScope API with chat, completion, embeddings, and function-calling endpoints
Dedicated AI infrastructure stack combining GPU-on-Demand compute, Object Storage, and managed model hosting for end-to-end AI workloads on Italian-sovereign infrastructure
Next-generation SageMaker (rebranded SageMaker AI) unifying data, analytics, and AI in one workspace: Studio notebooks, HyperPod for foundation-model training at scale, Lakehouse with QuickSight + S3 Tables integration, Autopilot AutoML, managed training jobs, hosted inference endpoints, and Feature Store. The unified SageMaker AI workspace was introduced at re:Invent 2024, and 2025 Summit additions extended it with lakehouse auto-onboarding.
Build generative AI applications with foundation models from Anthropic (Claude Opus 4.7 from April 2026), Cohere, Meta, Mistral, Stability AI, TwelveLabs (video understanding), and Amazon's own Nova family — accessed via a single API with fine-tuning, knowledge bases, agents, and a model marketplace for discovery and easy onboarding
AWS-built foundation model family covering text (Micro, Lite, Pro, Premier), image generation (Canvas), and video generation (Reel) — accessed through the Bedrock runtime with tight pricing and low-latency streaming, launched at re:Invent 2024
Generative AI assistant family spanning software development (Q Developer, formerly CodeWhisperer), enterprise knowledge retrieval (Q Business), low-code app generation (Q Apps), and contact-centre augmentation (Q in Connect) with grounded answers against your own data
Production runtime for AI agents — managed memory, identity, gateway, observability, and tool integration so teams can ship agentic workflows on top of any framework (Strands Agents, LangGraph, CrewAI, vendor-direct) without rebuilding the operational substrate
Enterprise access to OpenAI models including GPT-4, GPT-3.5, and DALL-E with Azure security, private networking, regional deployments, and pay-as-you-go or provisioned throughput
End-to-end platform for building and deploying ML models with automated ML, designer (drag-and-drop), managed compute clusters, MLflow tracking, and responsible AI dashboards
Serverless GPU-backed AI inference at the edge running a catalogue of open-source text, image, speech, and embedding models (Llama, Mistral, Stable Diffusion, Whisper, BGE) with usage-based pricing metered in Neurons and direct binding from Workers code
Managed retrieval-augmented-generation service — index your content (R2 buckets, websites, Workers KV) and query it with natural language from a Workers binding, REST API, or MCP server. Originally launched as AutoRAG and renamed AI Search in 2026.
End-to-end AI platform (formerly MLflow + MosaicML) for training, fine-tuning, deploying, and monitoring foundation models and custom ML models on the Lakehouse
Hosted MLflow with managed tracking server, model registry, and deployment integration — open-source ML lifecycle tooling tightly coupled to Databricks Unity Catalog and Mosaic AI
Dedicated AI/ML infrastructure combining GPU compute, Object Storage for datasets, and managed model hosting on Swiss-resident infrastructure — positioned for regulated EU customers needing AI workloads outside US jurisdiction
AI inference runtime that deploys models to Gcore's edge POPs and routes requests to the nearest GPU-backed endpoint, with support for open-source LLMs and custom model containers
Unified platform to build, deploy, and scale ML models with AutoML, custom training on TPUs and GPUs, model registry, pipelines, feature store, and generative AI studio
Direct API access to Google's most capable multimodal AI models with text, image, audio, and video understanding, long context windows, and function calling support
End-to-end AI development platform with AutoML, data labelling, distributed training on Ascend and GPU clusters, and one-click deployment to cloud or edge
Conversational AI for building chatbots and virtual agents with visual dialogue builder, intent and entity detection, voice integration via phone, and multi-channel deployment
Enterprise AI studio for training, validating, tuning, and deploying foundation models and traditional ML models, with IBM's Granite model family, Hugging Face integration, prompt lab, synthetic data generation, and governance via watsonx.governance
Sovereign Swiss AI suite: managed inference endpoints for open-source LLMs (Llama, Mistral), AI Studio chat interface (Kchat), and document-analysis APIs — positioned as the Swiss-resident alternative to hosted OpenAI / Anthropic APIs
Managed hosting and inference endpoints for open-source large language models (Llama, Mistral, DeepSeek) running on EU-resident GPU infrastructure with OpenAI-compatible API, positioned as the GDPR-resident alternative to OpenAI / Anthropic hosted APIs
Kakao's Korean-first foundation-model family (Kanana Flash / Essence / Nano) for chat, code, and embeddings — multilingual but tuned for Korean conversational performance
Naver's HyperCLOVA X foundation-model platform for Korean-language LLM workloads — chat completion, embeddings, function calling, RAG over Korean text with strong native-language performance
Managed serverless inference endpoints for open-source large language models hosted on Nscale's GPU infrastructure, with OpenAI-compatible API and per-million-tokens pricing
Managed MLOps platform (formerly Open Data Hub) for training, serving, and monitoring ML models on OpenShift with JupyterHub, KServe, Kubeflow, and PyTorch operators
Managed inference service hosting Cohere Command and Embed plus Meta Llama large language models — pay-per-token chat / completion / embedding APIs, plus fine-tuning on customer datasets via dedicated AI clusters
End-to-end platform for building, deploying, and governing production AI workloads on OCI — unifies Gen AI models, agent orchestration, retrieval-augmented generation, and policy-based governance controls in a single managed service so enterprises don't have to assemble them from primitives.
Fully managed machine learning platform with JupyterLab notebooks, conda-environment library, job orchestration, model deployments as HTTPS endpoints, feature store, and model catalog — integrated with Autonomous Database and Object Storage for end-to-end ML workflows
Managed AI platform hosting Mistral AI models (including Le Chat Enterprise) on Outscale's SecNumCloud-eligible French infrastructure — positioned as the strictest-sovereignty alternative to hosted OpenAI / Anthropic / Bedrock APIs for European public-sector and regulated workloads
Managed serverless inference endpoints for open-source large language models (Llama, Mistral, DeepSeek), with OpenAI-compatible API and per-million-tokens pricing — positioned as the EU-resident alternative to hosted OpenAI / Anthropic APIs
Hosted Model Context Protocol server letting AI agents (Claude / GPT / custom) discover and interact with OVHcloud resources (PostgreSQL, Managed Kubernetes, Object Storage) via the MCP spec — positioned as the EU-resident bridge between AI agents and infrastructure operations
AI layer integrated across Salesforce products: predictive lead scoring, opportunity scoring, generative AI for emails and summaries
Salesforce platform for building, deploying, and governing autonomous AI agents grounded in CRM data, with Atlas Reasoning Engine, Agent Studio, and Data Cloud retrieval
Managed inference for open-source LLMs (Llama, Mistral, DeepSeek) hosted in EU datacentres
Managed serverless inference endpoints for hosted open-source LLMs (Llama, Mistral, DeepSeek) with OpenAI-compatible API and per-million-tokens pricing — distinct from Scaleway's full AI Platform (training + custom-model hosting)
Developer framework for building data applications in Python, Java, and Scala that run inside Snowflake
Fully managed AI and ML service offering hosted LLMs, vector search, and ML functions inside Snowflake SQL
Managed serverless inference endpoints for hosted open-source LLMs (Llama, Mistral) with OpenAI-compatible API and per-million-tokens pricing — positioned as the German-sovereign alternative to hosted OpenAI / Anthropic APIs
Tencent's in-house family of large language models (Hunyuan-Pro, Standard, Lite, plus multimodal Hunyuan-Vision) accessible via the Hunyuan API, with enterprise-grade context windows up to 256K tokens, function calling, embeddings, and tuning