Skip to content

Advanced Features

Advanced Features

Unlock the full potential of the TSG Ecosystem

This guide explores advanced capabilities of the TSG Ecosystem for power users who need to perform complex operations with transcript data.

Advanced Graph Operations

Custom Connection Types

TSG allows you to define custom connection types between segments to model various relationships:

# Define semantic connections with custom properties
graph.add_connection(
    source="seg_012",
    target="seg_045",
    connection_type="semantic_similarity",
    weight=0.92,
    properties={
        "similarity_metric": "cosine",
        "embedding_model": "sentence-bert",
        "topics": ["research", "methodology", "results"]
    }
)

Batch Operations

Performance Optimization

For large datasets, use batch operations to efficiently process multiple segments or connections.

# Batch add segments
segments_data = [
    {
        "id": "seg_101",
        "start_time": 120.5,
        "end_time": 125.2,
        "text": "Let's discuss the experimental setup.",
        "speaker": "Dr. Smith"
    },
    {
        "id": "seg_102",
        "start_time": 125.5,
        "end_time": 134.8,
        "text": "We used a controlled environment with three variable conditions.",
        "speaker": "Dr. Smith"
    },
    # More segments...
]

graph.add_segments_batch(segments_data)

Graph Transformation

# Create a subgraph with only specific speakers
filtered_graph = graph.filter_by_property(
    property_name="speaker",
    property_values=["Dr. Smith", "Dr. Johnson"]
)
# Merge two different transcript graphs
combined_graph = TSGraph.merge(
    [interview_graph, presentation_graph],
    strategy="sequential"
)
# Split a graph by time segments
segments = graph.split_by_time(
    intervals=[300, 600, 900],  # Split at 5 min, 10 min, 15 min
    overlap=10  # 10 second overlap between segments
)

Visualization Techniques

Integration with Visualizer

The examples below require the tsgeco.visualizer module.

Interactive Network Visualization

from tsgeco.visualizer import NetworkVisualizer

# Create an interactive network visualization
viz = NetworkVisualizer(graph)
viz.set_node_coloring(attribute="speaker")
viz.set_edge_styling(attribute="connection_type")
viz.add_tooltips(["text", "start_time", "end_time"])

# Generate and save interactive HTML visualization
viz.render("transcript_network.html")

Timeline Visualization

Speaker Timeline

Visualize speaker turns over time

from tsgeco.visualizer import TimelineVisualizer

timeline = TimelineVisualizer(graph)
timeline.render_speaker_timeline("speaker_timeline.html")

Topic Heatmap

Visualize topic intensity over time

from tsgeco.visualizer import TopicHeatmapVisualizer

heatmap = TopicHeatmapVisualizer(graph)
heatmap.extract_topics(method="lda", num_topics=5)
heatmap.render("topic_heatmap.html")

Advanced Analytics

Topic Modeling

from tsgeco.analytics import TopicModeler

# Extract topics from transcript segments
modeler = TopicModeler(graph)
topics = modeler.extract_topics(
    method="lda",
    num_topics=8,
    remove_stopwords=True,
    min_segment_length=10
)

# Apply topics to segments
modeler.apply_topics_to_segments()

# Find segments discussing a specific topic
ai_segments = graph.get_segments_by_property(
    property_name="topics",
    property_value_contains="artificial intelligence"
)

Sentiment Analysis

Tracking Sentiment

from tsgeco.analytics import SentimentAnalyzer

analyzer = SentimentAnalyzer(graph)
analyzer.analyze()

# Get sentiment scores for each segment
scores = analyzer.get_sentiment_scores()

# Plot sentiment over time
analyzer.plot_sentiment_timeline("sentiment_timeline.png")

# Find segments with negative sentiment
negative_segments = analyzer.get_segments_by_sentiment(
    threshold=0.3,
    sentiment_type="negative"
)

Speaker Diarization Refinement

from tsgeco.analytics import SpeakerDiarization

# Refine speaker labels
diarizer = SpeakerDiarization(graph)
diarizer.load_voice_embeddings("speaker_embeddings.pkl")
diarizer.refine_speaker_labels(method="clustering")

# Get segments with corrected speaker labels
refined_graph = diarizer.get_refined_graph()

Advanced Integration Options

Real-time Processing

Streaming API

Process transcripts in real-time as they are generated

from tsgeco.streaming import TSGStreamProcessor

# Set up a real-time processor
processor = TSGStreamProcessor()

# Define callback function for new segments
def on_new_segment(segment):
    print(f"New segment: {segment['text']}")
    # Process segment in real-time...

# Start processing from streaming source
processor.connect_to_stream(
    source="websocket",
    url="wss://transcript-stream.example.com/ws",
    on_segment=on_new_segment
)

Custom Plugins

from tsgeco.plugins import TSGPlugin

class CustomAnalyzer(TSGPlugin):
    def __init__(self, name="custom_analyzer"):
        super().__init__(name)

    def process(self, graph):
        # Custom analysis logic here
        return modified_graph

    # Register custom commands
    def get_commands(self):
        return {
            "analyze_patterns": self.analyze_patterns,
            "extract_entities": self.extract_entities
        }

# Register and use the plugin
plugin = CustomAnalyzer()
graph.register_plugin(plugin)
results = graph.run_plugin_command("custom_analyzer", "analyze_patterns")

Performance Optimization

Large Dataset Considerations

These techniques are essential when working with very large transcript datasets.

Chunked Processing

Process large graphs in manageable chunks

from tsgeco import GraphChunker

chunker = GraphChunker(graph)
chunks = chunker.split(chunk_size=1000)  # segments per chunk

for chunk in chunks:
    process_chunk(chunk)  # Custom processing

Parallel Processing

Distribute graph operations across multiple cores

from tsgeco.parallel import ParallelProcessor

processor = ParallelProcessor(max_workers=8)
results = processor.process_graph(
    graph,
    operation=my_custom_operation,
    merge_strategy="concat"
)

Next Steps

Now that you're familiar with the advanced features of the TSG Ecosystem, you might want to:

  • Explore our API Reference for detailed documentation
  • Join our Community to share your use cases and contribute to the project