Getting Started
Installation
Using pip
pip install glassgen
Local Development Installation
- Clone the repository:
git clone https://github.com/glassflow/glassgen.git
cd glassgen
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install the package in development mode:
pip install -e .
- Install development dependencies:
pip install -r requirements-dev.txt
- Run tests to verify installation:
pytest
Basic Usage
Python SDK
Here’s a simple example of using GlassGen to generate user data and save it to a CSV file:
import glassgen
config = {
    "schema": {
        "name": "$name",
        "email": "$email",
        "country": "$country",
        "id": "$uuid",
        "address": "$address",
        "phone": "$phone_number",
        "job": "$job",
        "company": "$company"
    },
    "sink": {
        "type": "csv",
        "params": {
            "path": "output.csv"
        }
    },
    "generator": {
        "rps": 1500,
        "num_records": 5000
    }
}
result = glassgen.generate(config=config)
# result is a dict: {"time_taken_ms": ..., "num_records": ..., "sink": "csv"}
Configuration File
You can also load the configuration from a JSON file:
import glassgen
import json
with open("config.json") as f:
    config = json.load(f)
glassgen.generate(config=config)
CLI
glassgen generate-data --config config.json
generate() Reference
glassgen.generate(config, schema=None, sink=None)
- config (required): A configuration dict or GlassGenConfig object.
- schema (optional): A custom BaseSchema instance. When provided, the schema block in config is ignored.
- sink (optional): A BaseSink instance or a sink config dict. When provided, the sink block in config is ignored.
Return value:
- For all sinks except yield: returns a dict with time_taken_ms, num_records, and sink.
- For the yield sink: returns a generator that yields one record at a time. See the Yield Sink page for details.
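The result dict described above is enough to work out the throughput a run actually achieved. The following is a minimal sketch: records_per_second is a hypothetical helper, not part of GlassGen, and the sample values are illustrative rather than measured.

```python
def records_per_second(result: dict) -> float:
    """Compute achieved throughput from a generate() result dict."""
    elapsed_s = result["time_taken_ms"] / 1000.0
    if elapsed_s <= 0:
        # Guard against division by zero for extremely short runs.
        return float("inf")
    return result["num_records"] / elapsed_s


# Illustrative values only; a real run reports its own timings.
sample_result = {"time_taken_ms": 4000, "num_records": 5000, "sink": "csv"}
rate = records_per_second(sample_result)
print(f"{sample_result['num_records']} records to {sample_result['sink']} "
      f"at {rate:.0f} records/s")
```

Comparing this figure with the configured rps is a quick way to check whether the sink kept up with the target rate.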
generate_one() Reference
To generate a single record without any sink or generator config, use generate_one():
import glassgen
record = glassgen.generate_one({
    "id": "$uuid",
    "name": "$name",
    "email": "$email"
})
# {"id": "...", "name": "...", "email": "..."}
Generator Configuration
The generator block controls how records are produced:
{
  "generator": {
    "rps": 1000,
    "num_records": 5000,
    "bulk_size": 5000
  }
}
| Field | Default | Description |
|---|---|---|
| rps | 0 | Target records per second. 0 means generate as fast as possible with no rate limiting. Values above 2500 also skip rate limiting. |
| num_records | 100 | Total records to generate. Set to -1 for infinite generation. |
| bulk_size | 5000 | Internal batch size. Tune this to adjust memory usage vs. throughput. |
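To see how rps and num_records interact, it can help to estimate how long a rate-limited run should take. The sketch below applies only the rules stated in the table. estimate_duration_seconds is a hypothetical helper, not a GlassGen function, and it assumes the target rate is met exactly.

```python
from typing import Optional

# Per the table above, rps values above this threshold skip rate limiting.
RATE_LIMIT_CEILING = 2500


def estimate_duration_seconds(generator: dict) -> Optional[float]:
    """Estimate run time for a generator block, or None if it cannot be known."""
    rps = generator.get("rps", 0)                    # documented default: 0
    num_records = generator.get("num_records", 100)  # documented default: 100
    if num_records == -1:
        return None  # infinite generation never finishes
    if rps == 0 or rps > RATE_LIMIT_CEILING:
        return None  # unthrottled: duration depends on hardware and sink
    return num_records / rps


print(estimate_duration_seconds({"rps": 1000, "num_records": 5000}))
```

A None result means the run is either unbounded or unthrottled, so its duration can only be measured, not predicted from the configuration.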
Infinite generation
Set num_records to -1 to generate records indefinitely (useful with the yield sink or streaming sinks):
import glassgen

config = {
    "schema": {"id": "$uuid", "value": "$int"},
    "sink": {"type": "yield"},
    "generator": {"rps": 100, "num_records": -1}
}
# process() is a placeholder for your own record handler.
for record in glassgen.generate(config=config):
    process(record)
Event Duplication
GlassGen can simulate real-world data streams that contain duplicate events. Configure it under generator.event_options:
{
  "generator": {
    "rps": 1000,
    "num_records": 10000,
    "event_options": {
      "duplication": {
        "enabled": true,
        "ratio": 0.1,
        "key_field": "id",
        "time_window": "1h"
      }
    }
  }
}
| Field | Required | Description |
|---|---|---|
| enabled | yes | Turn duplication on or off. |
| ratio | yes | Fraction of records that will be duplicates (0–1). 0.1 means ~10% duplicates. |
| key_field | yes | The schema field used to identify a record for duplication. Must exist in the schema. |
| time_window | no (default 1h) | How far back to look when picking a record to duplicate. Supports s, m, h, d suffixes. |
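The constraints in the table can be checked before a config is passed to GlassGen. The following is a rough sketch of such a pre-flight check. Both helpers are hypothetical and are not GlassGen's own validation, and the parser assumes whole-number amounts such as 30m or 1h, which the table does not specify.

```python
import re
from typing import List

# Seconds per unit for the s, m, h, d suffixes listed in the table.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_time_window(value: str) -> int:
    """Convert a window such as '1h' into seconds (integer amounts assumed)."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unsupported time_window: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def check_duplication(schema: dict, duplication: dict) -> List[str]:
    """Return a list of problems found in a duplication block."""
    problems = []
    ratio = duplication.get("ratio")
    if not isinstance(ratio, (int, float)) or not 0 <= ratio <= 1:
        problems.append("ratio must be a number between 0 and 1")
    if duplication.get("key_field") not in schema:
        problems.append("key_field must name a field in the schema")
    return problems


schema = {"id": "$uuid", "value": "$int"}
duplication = {"enabled": True, "ratio": 0.1, "key_field": "id", "time_window": "1h"}
print(check_duplication(schema, duplication), parse_time_window("1h"))
```

With ratio set to 0.1 and num_records set to 10000, roughly 1000 of the generated records would be expected to be duplicates.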
Next Steps
- Learn about available generators
- Explore sink options