TorchServe

Peace. Jesus. I'd recommend The Bible Studio to you.

Today, I'd love to have some fun with TorchServe.

1. TorchServe Getting Started

Please just follow the official TorchServe Getting Started guide, with trivial modifications. In my demonstration, I stick to working under the directory /opt/servers/torchserve.

1.1 Clone TorchServe

➜  torchserve git clone https://github.com/pytorch/serve.git        
Cloning into 'serve'...
remote: Enumerating objects: 60508, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 60508 (delta 6), reused 3 (delta 0), pack-reused 60483 (from 1)
Receiving objects: 100% (60508/60508), 99.06 MiB | 23.01 MiB/s, done.
Resolving deltas: 100% (37656/37656), done.

1.2 Store a Model

➜  torchserve mkdir model_store
➜ torchserve wget https://download.pytorch.org/models/densenet161-8d451a50.pth
--2024-10-07 00:35:54-- https://download.pytorch.org/models/densenet161-8d451a50.pth
Resolving download.pytorch.org (download.pytorch.org)... 2600:9000:26ce:7a00:d:607e:4540:93a1, 2600:9000:26ce:6e00:d:607e:4540:93a1, 2600:9000:26ce:3400:d:607e:4540:93a1, ...
Connecting to download.pytorch.org (download.pytorch.org)|2600:9000:26ce:7a00:d:607e:4540:93a1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 115730790 (110M) [application/x-www-form-urlencoded]
Saving to: ‘densenet161-8d451a50.pth’

densenet161-8d451a50.pth 100%[==================================================================================================================================================>] 110.37M 109MB/s in 1.0s

2024-10-07 00:35:55 (109 MB/s) - ‘densenet161-8d451a50.pth’ saved [115730790/115730790]
➜ torchserve torch-model-archiver --model-name densenet161 --version 1.0 --model-file ./serve/examples/image_classifier/densenet_161/model.py --serialized-file densenet161-8d451a50.pth --export-path model_store --extra-files ./serve/examples/image_classifier/index_to_name.json --handler image_classifier
➜ torchserve ll model_store
total 106M
4.0K drwxrwxr-x 2 lvision lvision 4.0K Oct 7 00:36 ./
4.0K drwxrwxr-x 5 lvision lvision 4.0K Oct 7 00:35 ../
106M -rw-rw-r-- 1 lvision lvision 106M Oct 7 00:36 densenet161.mar
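
By the way (this is not in the official guide), a .mar file is essentially a zip archive, so a few lines of Python are enough to peek at what torch-model-archiver just packed. A minimal sketch of mine:

# inspect_mar.py - my own little sketch; a .mar model archive is zip-formatted,
# so the standard zipfile module can list its contents.
import zipfile

with zipfile.ZipFile("model_store/densenet161.mar") as mar:
    for name in mar.namelist():
        print(name)

You should see roughly the files handed to --model-file, --serialized-file and --extra-files, plus a MAR-INF/MANIFEST.json.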

1.3 Start TorchServe

1.3.1 config.properties

➜  torchserve cat config.properties 
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
metrics_address=http://127.0.0.1:8082
enable_token_auth=true
ts.key.file=/opt/servers/torchserve/key_file.json
log_location=/opt/servers/torchserve/logs
metrics_location=/opt/servers/torchserve/logs
access_log_location=/opt/servers/torchserve/logs

1.3.2 Start TorchServe to serve the model

➜  torchserve torchserve --start --ncs --model-store model_store --models densenet161.mar
➜ torchserve WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-10-07T00:38:23,323 [DEBUG] main org.pytorch.serve.util.ConfigManager - xpu-smi not available or failed: Cannot run program "xpu-smi": error=2, No such file or directory
2024-10-07T00:38:23,326 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-10-07T00:38:23,361 [INFO ] main org.pytorch.serve.util.TokenAuthorization -
######
TorchServe now enforces token authorization by default.
This requires the correct token to be provided when calling an API.
Key file located at /opt/servers/torchserve/key_file.json
Check token authorization documenation for information: https://github.com/pytorch/serve/blob/master/docs/token_authorization_api.md
######

2024-10-07T00:38:23,361 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-10-07T00:38:23,399 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/lvision/.local/lib/python3.12/site-packages/ts/configs/metrics.yaml
2024-10-07T00:38:23,484 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.12.0
TS Home: /home/lvision/.local/lib/python3.12/site-packages
Current directory: /opt/servers/torchserve
Temp directory: /tmp
Metrics config path: /home/lvision/.local/lib/python3.12/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 48
Max heap size: 30208 M
Python executable: /usr/bin/python3
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /opt/servers/torchserve/model_store
Initial Models: densenet161.mar
Log dir: /opt/servers/torchserve/logs
Metrics dir: /opt/servers/torchserve/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /opt/servers/torchserve/model_store
CPP log config: N/A
Model config: N/A
System metrics command: default
Model API enabled: false
2024-10-07T00:38:23,491 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2024-10-07T00:38:23,492 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: densenet161.mar
2024-10-07T00:38:24,521 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model densenet161
2024-10-07T00:38:24,521 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model densenet161
2024-10-07T00:38:24,521 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model densenet161 loaded.
2024-10-07T00:38:24,521 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 1
2024-10-07T00:38:24,526 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-10-07T00:38:24,527 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3, /home/lvision/.local/lib/python3.12/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000, --metrics-config, /home/lvision/.local/lib/python3.12/site-packages/ts/configs/metrics.yaml]
2024-10-07T00:38:24,568 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2024-10-07T00:38:24,568 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2024-10-07T00:38:24,569 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2024-10-07T00:38:24,570 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2024-10-07T00:38:24,570 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2024-10-07T00:38:24,720 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2024-10-07T00:38:25,152 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:20.0|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,153 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:1412.4598426818848|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,153 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:215.69704055786133|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,153 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:13.2|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,153 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:2.5472005208333335|#Level:Host,DeviceId:0|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,153 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:626.0|#Level:Host,DeviceId:0|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,154 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,154 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:248234.90625|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,154 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:6890.875|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,154 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:3.6|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286705
2024-10-07T00:38:25,632 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9000, pid=62344
2024-10-07T00:38:25,636 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2024-10-07T00:38:25,637 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - Successfully loaded /home/lvision/.local/lib/python3.12/site-packages/ts/configs/metrics.yaml.
2024-10-07T00:38:25,637 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - [PID]62344
2024-10-07T00:38:25,637 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - Torch worker started.
2024-10-07T00:38:25,637 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - Python runtime: 3.12.3
2024-10-07T00:38:25,638 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-densenet161_1.0 State change null -> WORKER_STARTED
2024-10-07T00:38:25,641 [INFO ] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2024-10-07T00:38:25,646 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2024-10-07T00:38:25,648 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1728286705648
2024-10-07T00:38:25,649 [INFO ] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728286705649
2024-10-07T00:38:25,674 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - model_name: densenet161, batchSize: 1
2024-10-07T00:38:26,881 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - Enabled tensor cores
2024-10-07T00:38:26,882 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - OpenVINO is not enabled
2024-10-07T00:38:26,882 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-10-07T00:38:26,882 [INFO ] W-9000-densenet161_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-10-07T00:38:27,263 [WARN ] W-9000-densenet161_1.0-stderr MODEL_LOG - /home/lvision/.local/lib/python3.12/site-packages/ts/torch_handler/base_handler.py:355: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2024-10-07T00:38:27,264 [WARN ] W-9000-densenet161_1.0-stderr MODEL_LOG - state_dict = torch.load(model_pt_path, map_location=map_location)
2024-10-07T00:38:27,564 [INFO ] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1915
2024-10-07T00:38:27,564 [DEBUG] W-9000-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-densenet161_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-10-07T00:38:27,565 [INFO ] W-9000-densenet161_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:3041.0|#WorkerName:W-9000-densenet161_1.0,Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286707
2024-10-07T00:38:27,565 [INFO ] W-9000-densenet161_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:2.0|#Level:Host|#hostname:lvision-MS-7C60,timestamp:1728286707


1.3.3 key_file.json

Running the above command generates a key_file.json file under the current working directory (please refer to the TorchServe token authorization API documentation):

➜  torchserve ll key_file.json
4.0K -rw------- 1 lvision lvision 243 Oct 7 00:38 key_file.json
➜ torchserve cat key_file.json
{
  "management": {
    "key": "c_7-MgUE",
    "expiration time": "2024-10-07T08:38:23.343462778Z"
  },
  "inference": {
    "key": "IMc5oeRf",
    "expiration time": "2024-10-07T08:38:23.343456097Z"
  },
  "API": {
    "key": "_tFv4L56"
  }
}%
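
Since key_file.json is regenerated with fresh tokens each time TorchServe starts, I like to pull the tokens out programmatically. A minimal sketch of mine, assuming the layout shown above:

# read_keys.py - my own sketch, assuming the key_file.json layout shown above.
import json

with open("/opt/servers/torchserve/key_file.json") as f:
    keys = json.load(f)

# The management key is for port 8081, the inference key for port 8080.
print("management:", keys["management"]["key"])
print("inference :", keys["inference"]["key"])
print("API       :", keys["API"]["key"])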

1.3.4 Is TorchServe Service Running?

➜  ~ curl -H "Authorization: Bearer c_7-MgUE" http://127.0.0.1:8081/models
{
  "models": [
    {
      "modelName": "densenet161",
      "modelUrl": "densenet161.mar"
    }
  ]
}
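
The same check can be done from Python instead of curl. A quick sketch of mine, assuming the requests package is installed and using the management key from key_file.json above:

# list_models.py - Python equivalent of the curl call above (requests assumed installed).
import requests

MANAGEMENT_KEY = "c_7-MgUE"  # from key_file.json; yours will differ

resp = requests.get(
    "http://127.0.0.1:8081/models",
    headers={"Authorization": f"Bearer {MANAGEMENT_KEY}"},
)
print(resp.status_code, resp.json())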

1.4 Get Predictions

1.4.1 Using REST APIs

➜  torchserve curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7341  100  7341    0     0  28980      0 --:--:-- --:--:-- --:--:-- 29015

Let's take a look:

[image: kitten_small.jpg]

➜  torchserve curl -H "Authorization: Bearer IMc5oeRf" http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg
{
  "tabby": 0.47793325781822205,
  "lynx": 0.20019005239009857,
  "tiger_cat": 0.16827784478664398,
  "tiger": 0.062009651213884354,
  "Egyptian_cat": 0.05115227773785591
}%
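
Note that curl -T issues an HTTP PUT with the raw image as the request body, so a Python equivalent (again just a sketch of mine, assuming requests is installed and using the inference key from key_file.json) looks like this:

# predict_rest.py - Python equivalent of the curl -T call above.
import requests

INFERENCE_KEY = "IMc5oeRf"  # from key_file.json; yours will differ

with open("kitten_small.jpg", "rb") as f:
    resp = requests.put(
        "http://127.0.0.1:8080/predictions/densenet161",
        headers={"Authorization": f"Bearer {INFERENCE_KEY}"},
        data=f,  # raw JPEG bytes, just like curl -T
    )
print(resp.json())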

1.4.2 Using gRPC APIs through a Python Client

1.4.2.1 Install gRPC Python dependencies

pip install -U grpcio protobuf grpcio-tools googleapis-common-protos

1.4.2.2 Generate inference client using proto files

This must be done under the serve folder:

➜  torchserve cd serve

Then,

➜  serve git:(master) python -m grpc_tools.protoc --proto_path=frontend/server/src/main/resources/proto/ --python_out=ts_scripts --grpc_python_out=ts_scripts frontend/server/src/main/resources/proto/inference.proto frontend/server/src/main/resources/proto/management.proto
google/rpc/status.proto: File not found.
inference.proto:6:1: Import "google/rpc/status.proto" was not found or had errors.
inference.proto:32:14: "google.rpc.Status" is not defined.
Why????
😞😢😭

Solution:

  • Step 1: Clone the googleapis repository, which provides the missing google/rpc/status.proto:
➜  serve git:(master) git clone https://github.com/googleapis/googleapis.git

Cloning into 'googleapis'...
remote: Enumerating objects: 233669, done.
remote: Counting objects: 100% (13457/13457), done.
remote: Compressing objects: 100% (410/410), done.
remote: Total 233669 (delta 13122), reused 13077 (delta 13042), pack-reused 220212 (from 1)
Receiving objects: 100% (233669/233669), 205.13 MiB | 21.65 MiB/s, done.
Resolving deltas: 100% (196982/196982), done.
  • Step 2: Generate the inference client again, this time also pointing protoc at the googleapis .proto files:
➜  serve git:(master) ✗ python -m grpc_tools.protoc \
--proto_path=frontend/server/src/main/resources/proto/ \
--proto_path=googleapis/ \
--python_out=ts_scripts \
--grpc_python_out=ts_scripts \
frontend/server/src/main/resources/proto/inference.proto \
frontend/server/src/main/resources/proto/management.proto
  • Step 3: Modify ts_scripts/torchserve_grpc_client.py as follows:
import argparse
import queue
import threading
from functools import partial

import grpc
import inference_pb2
import inference_pb2_grpc
import management_pb2
import management_pb2_grpc

# Function to get an inference stub for making gRPC calls to the inference service.
def get_inference_stub():
    channel = grpc.insecure_channel("localhost:7070")
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    return stub

# Function to get a management stub for making gRPC calls to the model management service.
def get_management_stub():
    channel = grpc.insecure_channel("localhost:7071")
    stub = management_pb2_grpc.ManagementAPIsServiceStub(channel)
    return stub

# Perform a single inference call.
def infer(stub, model_name, model_input, metadata):
    with open(model_input, "rb") as f:
        data = f.read()

    input_data = {"data": data}
    response = stub.Predictions(
        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
        metadata=metadata,
    )

    try:
        prediction = response.prediction.decode("utf-8")
        print(prediction)
    except grpc.RpcError as e:
        print(f"gRPC error: {e.details()}")
        exit(1)

# Perform streaming inference.
def infer_stream(stub, model_name, model_input, metadata):
    with open(model_input, "rb") as f:
        data = f.read()

    input_data = {"data": data}
    responses = stub.StreamPredictions(
        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
        metadata=metadata,
    )

    try:
        for resp in responses:
            prediction = resp.prediction.decode("utf-8")
            print(prediction)
    except grpc.RpcError as e:
        print(f"gRPC error: {e.details()}")
        exit(1)

# Perform an advanced streaming inference with multiple input files.
def infer_stream2(model_name, sequence_id, input_files, metadata):
    response_queue = queue.Queue()
    process_response_func = partial(
        InferStream2.default_process_response, response_queue
    )

    client = InferStream2SimpleClient()
    try:
        client.start_stream(
            model_name=model_name,
            sequence_id=sequence_id,
            process_response=process_response_func,
            metadata=metadata,
        )
        sequence = input_files.split(",")

        for input_file in sequence:
            client.async_send_infer(input_file.strip())

        for i in range(0, len(sequence)):
            response = response_queue.get()
            print(str(response))

        print("Sequence completed!")

    except grpc.RpcError as e:
        print("infer_stream2 received error", e)
        exit(1)
    finally:
        client.stop_stream()
        client.stop()

# Register a new model with TorchServe.
def register(stub, model_name, mar_set_str, metadata):
    mar_set = set()
    if mar_set_str:
        mar_set = set(mar_set_str.split(","))
    marfile = f"{model_name}.mar"
    print(f"## Check {marfile} in mar_set :", mar_set)
    if marfile not in mar_set:
        marfile = "https://torchserve.s3.amazonaws.com/mar_files/{}.mar".format(
            model_name
        )

    print(f"## Register marfile: {marfile}\n")
    params = {
        "url": marfile,
        "initial_workers": 1,
        "synchronous": True,
        "model_name": model_name,
    }
    try:
        response = stub.RegisterModel(
            management_pb2.RegisterModelRequest(**params), metadata=metadata
        )
        print(f"Model {model_name} registered successfully")
    except grpc.RpcError as e:
        print(f"Failed to register model {model_name}.")
        print(str(e.details()))
        exit(1)

# Unregister a model from TorchServe.
def unregister(stub, model_name, metadata):
    try:
        response = stub.UnregisterModel(
            management_pb2.UnregisterModelRequest(model_name=model_name),
            metadata=metadata,
        )
        print(f"Model {model_name} unregistered successfully")
    except grpc.RpcError as e:
        print(f"Failed to unregister model {model_name}.")
        print(str(e.details()))
        exit(1)

# The streaming classes (InferStream2, InferStream2SimpleClient) are omitted here
# for brevity; the command-line interface follows.

if __name__ == "__main__":
    # Argument parsing for the script
    parent_parser = argparse.ArgumentParser(add_help=False)
    parent_parser.add_argument(
        "model_name",
        type=str,
        default=None,
        help="Name of the model used.",
    )
    parent_parser.add_argument(
        "--auth-token",
        dest="auth_token",
        type=str,
        default=None,
        required=False,
        help="Authorization token",
    )

    parser = argparse.ArgumentParser(
        description="TorchServe gRPC client",
        formatter_class=argparse.RawTextHelpFormatter,
    )
    subparsers = parser.add_subparsers(help="Action", dest="action")

    infer_action_parser = subparsers.add_parser(
        "infer", parents=[parent_parser], add_help=False
    )
    infer_stream_action_parser = subparsers.add_parser(
        "infer_stream", parents=[parent_parser], add_help=False
    )
    infer_stream2_action_parser = subparsers.add_parser(
        "infer_stream2", parents=[parent_parser], add_help=False
    )
    register_action_parser = subparsers.add_parser(
        "register", parents=[parent_parser], add_help=False
    )
    unregister_action_parser = subparsers.add_parser(
        "unregister", parents=[parent_parser], add_help=False
    )

    # Arguments for different actions
    infer_action_parser.add_argument(
        "model_input", type=str, default=None, help="Input for model for inference."
    )
    infer_stream_action_parser.add_argument(
        "model_input",
        type=str,
        default=None,
        help="Input for model for stream inference.",
    )
    infer_stream2_action_parser.add_argument(
        "sequence_id",
        type=str,
        default=None,
        help="Input for sequence id for stream inference.",
    )
    infer_stream2_action_parser.add_argument(
        "input_files",
        type=str,
        default=None,
        help="Comma separated list of input files",
    )
    register_action_parser.add_argument(
        "mar_set",
        type=str,
        default=None,
        nargs="?",
        help="Comma separated list of mar models to be loaded using [model_name=]model_location format.",
    )

    # Parse command line arguments
    args = parser.parse_args()

    # Create metadata with or without the authorization token
    if args.auth_token:
        metadata = (
            ("protocol", "gRPC"),
            ("session_id", "12345"),
            ("authorization", f"Bearer {args.auth_token}"),
        )
    else:
        metadata = (("protocol", "gRPC"), ("session_id", "12345"))

    # Perform the selected action
    if args.action == "infer":
        infer(get_inference_stub(), args.model_name, args.model_input, metadata)
    elif args.action == "infer_stream":
        infer_stream(get_inference_stub(), args.model_name, args.model_input, metadata)
    elif args.action == "infer_stream2":
        infer_stream2(args.model_name, args.sequence_id, args.input_files, metadata)
    elif args.action == "register":
        register(get_management_stub(), args.model_name, args.mar_set, metadata)
    elif args.action == "unregister":
        unregister(get_management_stub(), args.model_name, metadata)
  • Step 4: Run inference using the sample gRPC Python client:
➜  serve git:(master) ✗ python ts_scripts/torchserve_grpc_client.py infer --auth-token IMc5oeRf densenet161 examples/image_classifier/kitten.jpg

/home/lvision/.local/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.2 at inference.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
/home/lvision/.local/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.2 at management.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
{
  "tabby": 0.46603792905807495,
  "tiger_cat": 0.4651001989841461,
  "Egyptian_cat": 0.06611046195030212,
  "lynx": 0.001293532201088965,
  "plastic_bag": 0.000228719727601856
}
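
To boil the client down to its essence, here is a distilled sketch of mine: a single Predictions call against the default gRPC inference port 7070, with the token passed via gRPC metadata. It assumes you save it as serve/ts_scripts/minimal_grpc_infer.py (next to the generated inference_pb2* modules) and run it from the serve directory, just like the full client above:

# minimal_grpc_infer.py - my own distilled sketch of the full client above.
import grpc
import inference_pb2
import inference_pb2_grpc

# Connect to TorchServe's gRPC inference endpoint (default port 7070).
channel = grpc.insecure_channel("localhost:7070")
stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)

with open("examples/image_classifier/kitten.jpg", "rb") as f:
    image = f.read()

# Token authorization travels in the call metadata, mirroring --auth-token above.
metadata = (("authorization", "Bearer IMc5oeRf"),)  # inference key; yours will differ

request = inference_pb2.PredictionsRequest(model_name="densenet161", input={"data": image})
response = stub.Predictions(request, metadata=metadata)
print(response.prediction.decode("utf-8"))

Run it with python ts_scripts/minimal_grpc_infer.py from the serve directory and you should get the same JSON of class probabilities as above.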