Advanced Features¶
Unlock the full potential of the TSG Ecosystem
This guide explores advanced capabilities of the TSG Ecosystem for power users who need to perform complex operations with transcript data.
Advanced Graph Operations¶
Custom Connection Types¶
TSG allows you to define custom connection types between segments to model various relationships:
# Define semantic connections with custom properties
graph.add_connection(
    source="seg_012",
    target="seg_045",
    connection_type="semantic_similarity",
    weight=0.92,
    properties={
        "similarity_metric": "cosine",
        "embedding_model": "sentence-bert",
        "topics": ["research", "methodology", "results"]
    }
)
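Because connection types and properties are free-form, the same call can model other relationships, for example a dialogue reply link. In this sketch, the segment IDs, type name, and properties are purely illustrative:

# Illustrative: a custom "reply_to" relationship between two segments
graph.add_connection(
    source="seg_046",
    target="seg_045",
    connection_type="reply_to",
    weight=1.0,
    properties={"latency_seconds": 1.2}
)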
Batch Operations¶
Performance Optimization
For large datasets, use batch operations to efficiently process multiple segments or connections.
# Batch add segments
segments_data = [
    {
        "id": "seg_101",
        "start_time": 120.5,
        "end_time": 125.2,
        "text": "Let's discuss the experimental setup.",
        "speaker": "Dr. Smith"
    },
    {
        "id": "seg_102",
        "start_time": 125.5,
        "end_time": 134.8,
        "text": "We used a controlled environment with three variable conditions.",
        "speaker": "Dr. Smith"
    },
    # More segments...
]

graph.add_segments_batch(segments_data)
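Connections can also be handled in bulk with an ordinary loop over add_connection. As a sketch, the segments added above could be chained with temporal links; the "follows" type and the gap property are illustrative, not part of the documented schema:

# Chain consecutive segments with illustrative temporal connections
for prev, curr in zip(segments_data, segments_data[1:]):
    graph.add_connection(
        source=prev["id"],
        target=curr["id"],
        connection_type="follows",
        weight=1.0,
        properties={"gap_seconds": round(curr["start_time"] - prev["start_time"], 2)}
    )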
Graph Transformation¶
Visualization Techniques¶
Integration with Visualizer
The examples below require the tsgeco.visualizer module.
Interactive Network Visualization¶
from tsgeco.visualizer import NetworkVisualizer
# Create an interactive network visualization
viz = NetworkVisualizer(graph)
viz.set_node_coloring(attribute="speaker")
viz.set_edge_styling(attribute="connection_type")
viz.add_tooltips(["text", "start_time", "end_time"])
# Generate and save interactive HTML visualization
viz.render("transcript_network.html")
Timeline Visualization¶
Speaker Timeline
Visualize speaker turns over time
from tsgeco.visualizer import TimelineVisualizer
timeline = TimelineVisualizer(graph)
timeline.render_speaker_timeline("speaker_timeline.html")
Topic Heatmap
Visualize topic intensity over time
from tsgeco.visualizer import TopicHeatmapVisualizer
heatmap = TopicHeatmapVisualizer(graph)
heatmap.extract_topics(method="lda", num_topics=5)
heatmap.render("topic_heatmap.html")
Advanced Analytics¶
Topic Modeling¶
from tsgeco.analytics import TopicModeler

# Extract topics from transcript segments
modeler = TopicModeler(graph)
topics = modeler.extract_topics(
    method="lda",
    num_topics=8,
    remove_stopwords=True,
    min_segment_length=10
)

# Apply topics to segments
modeler.apply_topics_to_segments()

# Find segments discussing a specific topic
ai_segments = graph.get_segments_by_property(
    property_name="topics",
    property_value_contains="artificial intelligence"
)
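To inspect what extract_topics returned, a quick print loop is often enough. This is a minimal sketch that assumes topics behaves like a mapping from topic label to a list of keywords; the real return type may differ:

# Illustrative: assumes `topics` maps a topic label to a list of keywords
for label, keywords in topics.items():
    print(f"{label}: {', '.join(keywords)}")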
Sentiment Analysis¶
Tracking Sentiment
from tsgeco.analytics import SentimentAnalyzer
analyzer = SentimentAnalyzer(graph)
analyzer.analyze()
# Get sentiment scores for each segment
scores = analyzer.get_sentiment_scores()
# Plot sentiment over time
analyzer.plot_sentiment_timeline("sentiment_timeline.png")
# Find segments with negative sentiment
negative_segments = analyzer.get_segments_by_sentiment(
    threshold=0.3,
    sentiment_type="negative"
)
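The scores object can feed ordinary Python aggregation. A minimal sketch, assuming get_sentiment_scores() returns a mapping from segment ID to a numeric score (not confirmed by this guide):

# Illustrative: assumes segment ID -> numeric sentiment score
mean_sentiment = sum(scores.values()) / len(scores)
print(f"Mean sentiment across {len(scores)} segments: {mean_sentiment:.2f}")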
Speaker Diarization Refinement¶
from tsgeco.analytics import SpeakerDiarization
# Refine speaker labels
diarizer = SpeakerDiarization(graph)
diarizer.load_voice_embeddings("speaker_embeddings.pkl")
diarizer.refine_speaker_labels(method="clustering")
# Get segments with corrected speaker labels
refined_graph = diarizer.get_refined_graph()
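One way to sanity-check the refinement is to re-render the speaker timeline from the visualization section against the refined graph; the output file name here is illustrative:

from tsgeco.visualizer import TimelineVisualizer

# Compare speaker turns before and after refinement
TimelineVisualizer(refined_graph).render_speaker_timeline("refined_speaker_timeline.html")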
Advanced Integration Options¶
Real-time Processing¶
Streaming API
Process transcripts in real-time as they are generated
from tsgeco.streaming import TSGStreamProcessor

# Set up a real-time processor
processor = TSGStreamProcessor()

# Define callback function for new segments
def on_new_segment(segment):
    print(f"New segment: {segment['text']}")
    # Process segment in real-time...

# Start processing from streaming source
processor.connect_to_stream(
    source="websocket",
    url="wss://transcript-stream.example.com/ws",
    on_segment=on_new_segment
)
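As a sketch of what the callback body might do, each streamed segment could be appended to an existing graph by reusing the batch API shown earlier; this assumes the incoming dict uses the same fields as segments_data above:

# Illustrative callback: reuse the documented batch API for a single segment
def accumulate_segment(segment):
    graph.add_segments_batch([segment])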
Custom Plugins¶
from tsgeco.plugins import TSGPlugin

class CustomAnalyzer(TSGPlugin):
    def __init__(self, name="custom_analyzer"):
        super().__init__(name)

    def process(self, graph):
        # Custom analysis logic here; return the (possibly modified) graph
        return graph

    # Register custom commands
    def get_commands(self):
        return {
            "analyze_patterns": self.analyze_patterns,
            "extract_entities": self.extract_entities
        }

    # Command implementations (signatures here are illustrative)
    def analyze_patterns(self, graph):
        # Custom pattern analysis goes here
        ...

    def extract_entities(self, graph):
        # Custom entity extraction goes here
        ...

# Register and use the plugin
plugin = CustomAnalyzer()
graph.register_plugin(plugin)
results = graph.run_plugin_command("custom_analyzer", "analyze_patterns")
Performance Optimization¶
Large Dataset Considerations
These techniques are essential when working with very large transcript datasets.
Chunked Processing
Process large graphs in manageable chunks
from tsgeco import GraphChunker

chunker = GraphChunker(graph)
chunks = chunker.split(chunk_size=1000)  # segments per chunk

for chunk in chunks:
    process_chunk(chunk)  # Custom processing
Parallel Processing
Distribute graph operations across multiple cores
from tsgeco.parallel import ParallelProcessor

processor = ParallelProcessor(max_workers=8)
results = processor.process_graph(
    graph,
    operation=my_custom_operation,
    merge_strategy="concat"
)
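Both snippets above reference user-supplied callables (process_chunk and my_custom_operation). As a sketch, either might simply count words per chunk; the assumption that a chunk yields segment dicts with a "text" field when iterated mirrors the batch example earlier and is not guaranteed by the API:

# Illustrative only: assumes a chunk yields segment dicts with a "text" field
def my_custom_operation(chunk):
    return sum(len(seg["text"].split()) for seg in chunk)

def process_chunk(chunk):
    print(f"Chunk word count: {my_custom_operation(chunk)}")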
Next Steps¶
Now that you're familiar with the advanced features of the TSG Ecosystem, you might want to:
- Explore our API Reference for detailed documentation
- Join our Community to share your use cases and contribute to the project