Jetson NX camera development is not one SDK. It is a stack. The relevant layer depends on the camera type, whether the Jetson ISP is involved, whether the output is a video stream or an AI perception pipeline, and whether the application lives in robotics, industrial inspection, video analytics, or a custom embedded product.
This note is a companion to "Codex on NVIDIA Ubuntu: AI-Native Development Architecture." The larger thesis is that Codex can operate inside a real NVIDIA Ubuntu development environment. Jetson NX camera work is a good concrete target because it forces the agent to reason across hardware, drivers, camera APIs, GStreamer, AI inference, and application code.
Rule of thumb: start by identifying the camera path. USB/UVC, CSI Bayer through ISP, raw CSI driver bring-up, DeepStream analytics, and ROS perception all point to different NVIDIA APIs.
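The rule of thumb can be written down as a small lookup table. The sketch below is illustrative only: the path keys and the `starting_layers` helper are names invented for this note, not NVIDIA identifiers, and the layer lists simply restate the mapping described in this article.

```python
from typing import Dict, List

# Hypothetical lookup: camera path -> layers to examine first.
# Keys and values paraphrase this note; they are not an official taxonomy.
CAMERA_PATHS: Dict[str, List[str]] = {
    "usb_uvc": ["V4L2", "GStreamer (v4l2src)"],
    "csi_bayer_isp": ["Argus", "GStreamer (nvarguscamerasrc)"],
    "raw_csi_bringup": [
        "Jetson Linux camera driver",
        "device tree",
        "V4L2 media controller",
    ],
    "video_analytics": ["GStreamer", "DeepStream", "TensorRT"],
    "ros_perception": ["Isaac ROS", "ROS 2"],
}


def starting_layers(camera_path: str) -> List[str]:
    """Return the layers to start with for a given camera path."""
    try:
        return CAMERA_PATHS[camera_path]
    except KeyError:
        known = ", ".join(sorted(CAMERA_PATHS))
        raise ValueError(
            f"unknown camera path {camera_path!r}; expected one of: {known}"
        ) from None


print(starting_layers("csi_bayer_isp"))
```

Raising on an unknown key is deliberate: if the camera path cannot be named, the project is not ready to pick an SDK yet.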
The base layer: JetPack and Jetson Linux
JetPack SDK is the foundation. It packages Jetson Linux, CUDA, TensorRT, cuDNN, VPI, multimedia libraries, samples, and developer tooling for Jetson. For Jetson NX work, JetPack is not optional background material. It defines the Linux release, kernel, camera stack, CUDA generation, TensorRT generation, and supported higher-level SDKs.
Jetson Linux, still often called by its older name L4T (Linux for Tegra), is where the camera driver framework, device tree integration, V4L2 media-controller support, Argus camera path, and multimedia components live. If you are bringing up a CSI sensor, this is the layer where the hard work usually begins.
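Because everything above depends on the installed release, it helps to record it before debugging anything else. The following is a minimal sketch, assuming the board exposes the usual `/etc/nv_tegra_release` file and that its first line follows the `# R<major> (release), REVISION: <x.y>` shape. The sample line is illustrative, and the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Assumed location of the release file on a Jetson Linux install.
RELEASE_FILE = Path("/etc/nv_tegra_release")

# Assumed shape of the first line; adjust if your image differs.
_PATTERN = re.compile(
    r"#\s*R(?P<major>\d+)\s*\(release\),\s*REVISION:\s*(?P<revision>[\d.]+)"
)


def parse_l4t_release(text: str) -> Optional[Dict[str, str]]:
    """Extract the Jetson Linux (L4T) version from release-file text."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    major, revision = match["major"], match["revision"]
    return {"major": major, "revision": revision, "l4t": f"{major}.{revision}"}


def read_l4t_release(path: Path = RELEASE_FILE) -> Optional[Dict[str, str]]:
    """Read and parse the release file; None when not running on a Jetson."""
    if not path.exists():
        return None
    return parse_l4t_release(path.read_text())


# Illustrative sample line, not taken from a specific board.
sample = (
    "# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, "
    "EABI: aarch64, DATE: Tue Aug  1 19:57:35 UTC 2023"
)
print(parse_l4t_release(sample))
```

The Jetson Linux version still has to be mapped to a JetPack release using NVIDIA's release notes; that mapping is intentionally left out of the sketch.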
The camera capture layer
NVIDIA's Jetson camera documentation describes the practical camera architecture around three application paths. Which one applies depends on the camera and on whether the ISP is needed.
If the application needs ISP features for a CSI camera, Argus and nvarguscamerasrc are central. If the camera is a USB webcam, V4L2 and GStreamer may be enough. If the team is developing a custom MIPI CSI sensor driver, start with Jetson Linux camera driver bring-up and V4L2 validation before pretending the app layer is the problem.
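The difference between the ISP path and the USB path shows up directly in the capture pipeline. The sketch below builds two commonly used GStreamer pipeline descriptions as strings. The function names, default resolutions, and the `appsink` ending are assumptions chosen for illustration; the supported caps depend on the sensor and its driver.

```python
def csi_argus_pipeline(sensor_id: int = 0, width: int = 1920,
                       height: int = 1080, fps: int = 30) -> str:
    """CSI camera through the Jetson ISP via nvarguscamerasrc."""
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        # Frames stay in NVMM (hardware) memory until nvvidconv copies out.
        f"video/x-raw(memory:NVMM),width={width},height={height},"
        f"framerate={fps}/1 ! "
        "nvvidconv ! video/x-raw,format=BGRx ! "
        "videoconvert ! video/x-raw,format=BGR ! appsink"
    )


def usb_v4l2_pipeline(device: str = "/dev/video0", width: int = 640,
                      height: int = 480, fps: int = 30) -> str:
    """USB/UVC camera through plain V4L2, no Jetson ISP involved."""
    return (
        f"v4l2src device={device} ! "
        f"video/x-raw,width={width},height={height},framerate={fps}/1 ! "
        "videoconvert ! video/x-raw,format=BGR ! appsink"
    )


print(csi_argus_pipeline())
print(usb_v4l2_pipeline())
```

Strings like these can be handed to an OpenCV build with GStreamer support through `cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)`, or adapted for `gst-launch-1.0` by replacing `appsink` with a display or file sink and quoting the caps that contain parentheses.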
The pipeline layer: GStreamer and Multimedia API
GStreamer is the practical glue for camera capture, display, encode, stream, and handoff into AI pipelines. On Jetson, NVIDIA's accelerated GStreamer elements matter because they keep data on accelerated paths instead of bouncing every frame through slow CPU copies.
Jetson Multimedia API matters when the application needs lower-level access to hardware encode/decode, buffers, or video processing primitives. A lot of teams can stay at GStreamer level first, but the Multimedia API becomes relevant when the product needs tighter control over performance, latency, buffers, or custom integration.
The vision and inference layer
Once frames are captured reliably, the next question is what the application does with them.
| SDK | Use it when |
|---|---|
| VPI | You need accelerated computer vision or image processing on Jetson hardware accelerators. |
| CUDA | You need custom GPU kernels or GPU-side image/model pre/post-processing. |
| TensorRT | You need optimized neural network inference from camera frames. |
| cuDNN | You are using deep learning frameworks or lower-level neural network acceleration. |
| DeepStream | You are building video analytics: detection, tracking, multi-stream processing, metadata, and streaming output. |
| Isaac ROS | You are building robotics perception, ROS 2 graphs, visual SLAM, stereo, depth, or navigation-related camera workflows. |
Common project shapes
USB camera prototype
Use V4L2 and GStreamer first. Prove capture with v4l2src, then decide whether the app needs DeepStream, TensorRT, or a custom Python/OpenCV pipeline.
CSI camera with Jetson ISP
Use the Jetson camera stack, Argus, and nvarguscamerasrc. If the sensor is not already supported, expect sensor driver, device tree, and image quality tuning work.
Custom raw sensor bring-up
Start at Jetson Linux camera driver development. Validate the V4L2 media-controller path before building higher-level app code. This is not primarily a DeepStream problem yet.
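Validating the V4L2 path usually comes down to a few command-line checks with `media-ctl` and `v4l2-ctl`. The sketch below only assembles those commands; it does not run them. The device nodes, resolution, and the `RG10` pixel format are placeholder assumptions that must match what the sensor driver actually advertises, and `bringup_commands` is a hypothetical helper name.

```python
import shlex
from typing import List


def bringup_commands(device: str = "/dev/video0",
                     media_device: str = "/dev/media0",
                     width: int = 1920, height: int = 1080,
                     pixelformat: str = "RG10",
                     frames: int = 100) -> List[List[str]]:
    """Return V4L2 validation commands, in the order they are usually run."""
    return [
        # 1. Print the media-controller topology (sensor, CSI, VI links).
        ["media-ctl", "-d", media_device, "-p"],
        # 2. List the formats and frame sizes the driver exposes.
        ["v4l2-ctl", "-d", device, "--list-formats-ext"],
        # 3. Stream raw frames and discard them to prove capture works.
        [
            "v4l2-ctl", "-d", device,
            f"--set-fmt-video=width={width},height={height},"
            f"pixelformat={pixelformat}",
            "--stream-mmap",
            f"--stream-count={frames}",
            "--stream-to=/dev/null",
        ],
    ]


for command in bringup_commands():
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on the target board. Keeping the commands as data makes it easy to log exactly what was validated before any application code is written.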
Industrial video analytics
Use GStreamer and DeepStream, then TensorRT for optimized inference. DeepStream is the most relevant higher-level SDK when the camera feed becomes an AI analytics pipeline.
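To make the handoff from capture to analytics concrete, here is a rough sketch of a single-camera DeepStream-style pipeline description. It assumes the `nvstreammux`, `nvinfer`, `nvvideoconvert`, and `nvdsosd` elements from a DeepStream install and uses a placeholder config path; the helper name, resolution, and `fakesink` ending are illustrative choices, and property details can vary between DeepStream releases.

```python
def deepstream_pipeline(infer_config: str, sensor_id: int = 0,
                        width: int = 1280, height: int = 720,
                        fps: int = 30) -> str:
    """Build a one-camera capture -> batch -> infer -> overlay description."""
    return (
        # Capture stage: CSI camera through the ISP, frames kept in NVMM.
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM),width={width},height={height},"
        f"framerate={fps}/1 ! "
        # Link into the muxer's first request pad, then declare the muxer.
        "m.sink_0 "
        f"nvstreammux name=m batch-size=1 width={width} height={height} "
        "live-source=1 ! "
        # Inference stage: nvinfer reads model settings from its config file.
        f"nvinfer config-file-path={infer_config} ! "
        # Overlay stage: convert, draw metadata, then discard the output.
        "nvvideoconvert ! nvdsosd ! fakesink"
    )


# Placeholder path; point this at a real nvinfer configuration file.
print(deepstream_pipeline("/path/to/pgie_config.txt"))
```

The point of the sketch is the ordering: capture has to work on its own before the muxer, inference, and overlay stages are worth debugging.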
Robotics camera system
Use Isaac ROS where ROS 2, robotics perception, sensor graphs, stereo, visual SLAM, or navigation integration matter. The camera capture layer still has to be correct underneath.
Why this matters for Codex
This is exactly the kind of stack where a repo-aware agent can help. A human can say, "This is an Orin NX camera app using a CSI sensor and DeepStream," and Codex can turn that into a concrete checklist: host facts, JetPack version, camera path, V4L2 validation, GStreamer smoke test, inference path, sample app, and documentation.
The important thing is to give Codex the right architecture map. Without it, every camera problem looks like an application bug. With it, the agent can ask the better question: are we failing at the sensor, driver, capture, pipeline, inference, or product layer?
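That "better question" amounts to walking the layers in order and stopping at the first one that has not been proven to work. A minimal sketch follows, with hypothetical names and a hand-filled results dictionary standing in for real checks:

```python
from typing import Dict, Optional

# Layer order taken from the question above, lowest layer first.
LAYERS = ["sensor", "driver", "capture", "pipeline", "inference", "product"]


def first_failing_layer(results: Dict[str, bool]) -> Optional[str]:
    """Return the lowest layer that failed or was never verified.

    A layer missing from `results` counts as unverified, so it is
    reported before any higher layer is blamed.
    """
    for layer in LAYERS:
        if not results.get(layer, False):
            return layer
    return None


# Example: capture was never proven, so it is reported even though
# the GStreamer pipeline check passed.
checks = {"sensor": True, "driver": True, "pipeline": True}
print(first_failing_layer(checks))
```

Treating unverified layers as failures is the design choice that keeps an application-level symptom from hiding a capture-level cause.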