Turkish NLP with BERTürk: Fine-tuning for Sentiment Analysis

            HEZARTECH Project Overview: As team leader, I guided our high school team to become TEKNOFEST 2024 National Finalists by developing an advanced Turkish NLP model for target-based sentiment analysis. We fine-tuned BERTürk on a custom dataset of 37,000+ Turkish customer reviews, achieving ~93% accuracy and F1 score.
        

The Challenge: Turkish Language NLP

Natural Language Processing for Turkish presents unique challenges due to the language's agglutinative nature and rich morphology. Unlike English, a single Turkish stem can take long chains of suffixes, so the number of distinct word forms is very large. Word-level approaches struggle with this because many forms appear only rarely, or never, in the training data.

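"Bu ürünü kesinlikle tavsiye etmiyorum çünkü kalitesi çok düşük ve müşteri hizmetleri berbat!"
(I definitely do not recommend this product because its quality is very low and the customer service is terrible!)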

Analysis needed: Product → Negative, Quality → Negative, Customer Service → Negative

37K+ Turkish Reviews
93% F1 Score
5 Target Categories
1st HS Team in Finals

Why Target-Based Sentiment Analysis?

Traditional sentiment analysis assigns one overall sentiment to an entire text, but real-world applications need more granular insights. In e-commerce, a customer might love a product's design but hate its delivery speed. Target-based sentiment analysis identifies the specific aspects mentioned in a text and determines the sentiment toward each one.

The HEZARTECH Approach

Our team identified five key targets that matter most in Turkish e-commerce:

Product Quality (Ürün Kalitesi): Material, build quality, durability
Price (Fiyat): Value for money, pricing concerns
Shipping (Kargo): Delivery speed, packaging quality
Customer Service (Müşteri Hizmetleri): Support quality, responsiveness
Overall Experience (Genel Deneyim): General satisfaction
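
One way to represent these five targets in code is a fixed target order with one sentiment label per target. The sketch below is illustrative only: the English target keys, the class order, the `NOT_MENTIONED` marker, and the `encode_annotation` helper are assumptions, not the team's actual annotation schema.

```python
# Illustrative label schema; names and the -1 "not mentioned" marker are assumptions.
TARGETS = ["product_quality", "price", "shipping", "customer_service", "overall_experience"]
SENTIMENTS = {"negative": 0, "neutral": 1, "positive": 2}
NOT_MENTIONED = -1  # hypothetical marker for targets a review does not discuss


def encode_annotation(annotation):
    """Turn {target: sentiment} into a fixed-length list aligned with TARGETS."""
    unknown = set(annotation) - set(TARGETS)
    if unknown:
        raise ValueError(f"Unknown targets: {sorted(unknown)}")
    return [SENTIMENTS[annotation[t]] if t in annotation else NOT_MENTIONED for t in TARGETS]


# A review that praises the product but complains about delivery
labels = encode_annotation({"product_quality": "positive", "shipping": "negative"})
print(labels)  # [2, -1, 0, -1, -1]
```

A fixed-length encoding like this keeps every review the same shape, which makes it straightforward to compare annotations and model outputs target by target.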

Dataset Creation: The Foundation

Creating a high-quality Turkish sentiment analysis dataset was our first major challenge. We needed authentic, diverse Turkish text with accurate target-based annotations.

Data Collection Strategy

We collected reviews from multiple Turkish e-commerce platforms to ensure diversity:

            
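# Data collection pipeline
# NOTE: scrape_reviews and preprocess_turkish are project-specific helpers
# that are not shown in this post.
def collect_turkish_reviews(limit_per_source=10000):
    sources = [
        'hepsiburada.com',
        'trendyol.com',
        'amazon.com.tr',
        'gittigidiyor.com'
    ]

    all_reviews = []
    for source in sources:
        # Fetch up to limit_per_source raw reviews from this site
        reviews = scrape_reviews(source, limit=limit_per_source)
        # Clean the raw text and keep the result instead of only counting it
        all_reviews.extend(preprocess_turkish(reviews))

    return all_reviews  # 37,000+ reviews in total after cleaning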
            
        

Annotation Process

The most time-consuming part was manual annotation. Our team of 5 members labeled each review, identifying targets and their corresponding sentiments. We developed strict annotation guidelines to ensure consistency.
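
A common way to check whether annotation guidelines actually produce consistent labels is to have two annotators label the same sample and compute an agreement score. The sketch below computes Cohen's kappa in plain Python. It is only an illustration of the idea: the post does not say which agreement measure, if any, the team used, and the example labels are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for everything
    return (observed - expected) / (1 - expected)


# Made-up labels for six reviews
annotator_1 = ["neg", "neg", "pos", "pos", "neu", "pos"]
annotator_2 = ["neg", "pos", "pos", "pos", "neu", "neg"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Values near 1 indicate strong agreement, while values near 0 mean the annotators agree about as often as chance would predict, which usually signals that the guidelines need tightening.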

[Figure: Annotation Interface & Guidelines]

BERTürk: The Turkish BERT

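BERTürk is a BERT (Bidirectional Encoder Representations from Transformers) model pre-trained from scratch on Turkish text, including Turkish Wikipedia and other Turkish corpora. Because its vocabulary and weights are learned only from Turkish, it tends to segment and represent Turkish words better than multilingual models that share one vocabulary across many languages.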

Why BERTürk Over Other Models?

We evaluated several approaches before settling on BERTürk:

Multilingual BERT: 78%. Good general performance but lacks Turkish-specific understanding
Turkish FastText: 82%. Fast but struggles with context and complex sentences
BERTürk (Our Choice): 93%. Best contextual understanding and Turkish language support

Fine-tuning Process

Fine-tuning BERTürk for our specific task required careful consideration of hyperparameters, training strategy, and evaluation metrics.

Model Architecture

We added a classification head on top of BERTürk that outputs a separate three-way sentiment prediction for each of the five targets, because a single review can mention several targets with different sentiments.

            
import torch
from transformers import AutoModel, AutoTokenizer

class TurkishSentimentClassifier(torch.nn.Module):
    def __init__(self, n_targets=5, n_classes=3):
        super().__init__()
        self.n_targets = n_targets
        self.n_classes = n_classes
        self.berturk = AutoModel.from_pretrained('dbmdz/bert-base-turkish-cased')
        self.dropout = torch.nn.Dropout(0.3)
        # One group of n_classes logits per target (hidden size is 768 for BERT-base)
        hidden_size = self.berturk.config.hidden_size
        self.classifier = torch.nn.Linear(hidden_size, n_targets * n_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.berturk(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output  # pooled [CLS] representation
        output = self.dropout(pooled_output)
        # Reshape to (batch, n_targets, n_classes) instead of hard-coding 5 and 3
        logits = self.classifier(output)
        return logits.view(-1, self.n_targets, self.n_classes)
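
To make the output shape concrete, here is a minimal sketch of how one review's 5 x 3 grid of scores could be decoded into a sentiment per target. It uses plain Python lists instead of tensors so it runs without a GPU or model download. The target keys, the class order, and the `decode_predictions` helper are assumptions for illustration, and the scores are made up.

```python
TARGETS = ["product_quality", "price", "shipping", "customer_service", "overall_experience"]
SENTIMENTS = ["negative", "neutral", "positive"]  # assumed class order


def decode_predictions(logits_per_target):
    """Map a 5 x 3 grid of scores to one sentiment label per target."""
    decoded = {}
    for target, scores in zip(TARGETS, logits_per_target):
        # argmax over the 3 sentiment classes for this target
        best = max(range(len(scores)), key=scores.__getitem__)
        decoded[target] = SENTIMENTS[best]
    return decoded


# Made-up scores for a single review
example_logits = [
    [2.1, 0.3, -1.0],
    [0.1, 1.5, 0.2],
    [-0.5, 0.0, 3.2],
    [1.8, 0.4, 0.1],
    [0.9, 0.2, 0.5],
]
print(decode_predictions(example_logits))
```

With the model's tensor output, `logits.argmax(dim=-1)` performs the same per-target argmax for a whole batch at once.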
            
        

Training Configuration

After extensive hyperparameter tuning, this was the best-performing configuration we found:

Learning Rate: 2e-5 with linear decay
Batch Size: 16 (limited by GPU memory)
Epochs: 4 (prevented overfitting)
Optimizer: AdamW with weight decay
Loss Function: Cross-entropy with class weighting
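
The "linear decay" in the learning rate entry can be sketched as a small function of the training step. This is a minimal illustration with no warmup phase; the 30,000-example training set size is an assumption, since the train/validation split is not stated here.

```python
import math


def linear_decay_lr(step, total_steps, base_lr=2e-5):
    """Learning rate that falls linearly from base_lr at step 0 to 0 at total_steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Assumed for illustration: 30,000 training reviews
num_train_examples = 30_000
batch_size = 16
epochs = 4
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * epochs

# Learning rate at the start, midpoint, and end of training
for step in (0, total_steps // 2, total_steps):
    print(step, linear_decay_lr(step, total_steps))
```

In the Hugging Face `transformers` library, `get_linear_schedule_with_warmup` provides this kind of schedule, optionally preceded by a warmup phase.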

Handling Turkish Language Complexities

Turkish presented several unique challenges that required special handling:

1. Agglutination

Turkish words can take multiple suffixes, creating very long words:

"Mağazalarımızdakilerden" = "Mağaza-lar-ımız-da-ki-ler-den"
(From those that are in our stores)
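
BERT-style models cope with such long words by splitting them into subword pieces. The sketch below implements the greedy longest-match-first WordPiece procedure on a tiny hypothetical vocabulary that was hand-picked to mirror the suffixes above. BERTürk's real vocabulary is learned from data and will generally split this word differently.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece segmentation of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, NOT BERTürk's real vocabulary
toy_vocab = {"mağaza", "##lar", "##ımız", "##da", "##ki", "##ler", "##den", "##dan"}
print(wordpiece_tokenize("mağazalarımızdakilerden", toy_vocab))
# ['mağaza', '##lar', '##ımız', '##da', '##ki', '##ler', '##den']
```

Because the pieces are reused across many words, the model can handle word forms it never saw as whole words during pre-training.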

2. Informal Language & Slang

Online reviews often contain informal Turkish, abbreviations, and internet slang that required special preprocessing:

            
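import re
import unicodedata

# Handle common Turkish internet abbreviations
REPLACEMENTS = {
    'mq': 'mağaza',
    'ürn': 'ürün',
    'müq': 'mükemmel',
    'sb': 'süper',
    'krg': 'kargo'
}

# Match abbreviations only as whole words, so letters inside longer words are untouched
ABBREVIATION_PATTERN = re.compile(
    r'\b(' + '|'.join(re.escape(abbr) for abbr in REPLACEMENTS) + r')\b'
)

def normalize_turkish_chars(text):
    # Stand-in (assumption): the original helper is not shown in this post.
    # NFC gives characters such as 'ü' and 'ş' a single composed form.
    return unicodedata.normalize('NFC', text)

def preprocess_turkish_text(text):
    # Normalize Turkish characters first so abbreviations match reliably
    text = normalize_turkish_chars(text)
    # Expand each whole-word abbreviation to its full form
    text = ABBREVIATION_PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], text)
    return text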
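
One Turkish-specific detail worth handling whenever text is lowercased during preprocessing is the dotted and dotless "i". Python's default `str.lower()` maps "I" to "i", while Turkish expects "ı". The sketch below shows a small workaround. The `turkish_lower` helper is hypothetical, and whether lowercasing was part of the team's pipeline is not stated here.

```python
def turkish_lower(text):
    """Lowercase text using Turkish casing rules for dotted and dotless i."""
    # Handle the two Turkish-specific pairs before the generic lower():
    # 'İ' -> 'i' and 'I' -> 'ı'
    return text.replace("İ", "i").replace("I", "ı").lower()


print("ISPARTA".lower())          # isparta  (default rule, wrong for Turkish)
print(turkish_lower("ISPARTA"))   # ısparta
print(turkish_lower("İYİ ÜRÜN"))  # iyi ürün
```

This matters mostly for uncased models and for dictionary lookups. A cased model such as `dbmdz/bert-base-turkish-cased` receives the original casing.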
            
        

3. Negation Handling

Turkish negation can significantly change meaning, requiring careful attention during training:

"Kalitesi iyi değil" (Quality is not good) ≠ "Kalitesi iyi" (Quality is good)

Results and Evaluation

Our final model reached an overall F1 score of 93.2%, with the per-target scores below all above 91%:

Overall F1 Score: 93.2%
Product Quality: 94.1%
Customer Service: 91.8%
Shipping: 92.5%

Error Analysis

We conducted thorough error analysis to understand model limitations:

Sarcasm: Still challenging for the model to detect Turkish sarcasm
Mixed Sentiments: Complex sentences with contradictory sentiments
Implicit Targets: When targets are implied rather than explicitly mentioned
Regional Dialects: Different Turkish dialects occasionally confused the model

TEKNOFEST Competition Experience

Competing at TEKNOFEST as a high school team against university-level competitors was both challenging and rewarding. Our presentation focused on the practical applications and technical innovations of our approach.

            Judge Feedback: "Impressive work for a high school team. The focus on Turkish language specifics and practical e-commerce applications demonstrates mature understanding of both NLP challenges and business needs."
        

Key Success Factors

Team Collaboration: Clear role division and regular progress meetings
Iterative Development: Continuous model improvement and testing
Domain Knowledge: Understanding the Turkish e-commerce landscape
Technical Excellence: Rigorous evaluation and error analysis

Real-World Applications

Our Turkish sentiment analysis model has several practical applications:

E-commerce Platforms: Automated review analysis for sellers
Customer Service: Priority routing based on sentiment
Product Development: Identifying improvement areas from feedback
Marketing Analytics: Understanding customer perception
Competitive Analysis: Monitoring brand sentiment vs competitors

Technical Challenges & Solutions

1. Data Imbalance

Natural review distributions were heavily skewed toward positive sentiments. We addressed this through:

Weighted loss functions
Oversampling negative examples
Data augmentation for underrepresented classes
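
The first item, a weighted loss, can be sketched in plain Python. The example below derives inverse-frequency class weights and applies them to a single-example cross-entropy. The weighting formula is one common choice, not necessarily the one the team used, and the label distribution is made up.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, n_classes):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (n_classes * counts[c]) if counts[c] else 0.0 for c in range(n_classes)]


def weighted_cross_entropy(logits, target, weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return weights[target] * (log_sum_exp - logits[target])


# Toy label distribution skewed toward positive (class 2); the numbers are made up
labels = [2] * 70 + [1] * 20 + [0] * 10
weights = inverse_frequency_weights(labels, n_classes=3)
print(weights)

# The same mistake costs more on a rare negative example than on a common positive one
print(weighted_cross_entropy([0.0, 0.0, 0.0], target=0, weights=weights))
print(weighted_cross_entropy([0.0, 0.0, 0.0], target=2, weights=weights))
```

In PyTorch, the same effect is available by passing a weight tensor to `torch.nn.CrossEntropyLoss(weight=...)`.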

2. Multi-label Classification

A single review could discuss multiple targets with different sentiments, requiring specialized output handling and evaluation metrics.
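
For evaluation, one option is to score each target separately and then compare the targets side by side. The sketch below computes macro-averaged F1 per target column in plain Python. Whether the reported scores were macro- or micro-averaged is not stated here, so treat the averaging choice as an assumption. The labels are made up.

```python
def macro_f1(y_true, y_pred, n_classes=3):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support or predictions scores 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def per_target_f1(true_rows, pred_rows, targets):
    """Compute macro F1 separately for each target column."""
    results = {}
    for i, name in enumerate(targets):
        results[name] = macro_f1([row[i] for row in true_rows], [row[i] for row in pred_rows])
    return results


# Each row is one review; each column is one target (0=negative, 1=neutral, 2=positive)
targets = ["product_quality", "shipping"]
true_rows = [[2, 0], [2, 0], [0, 2], [1, 1]]
pred_rows = [[2, 0], [0, 0], [0, 2], [1, 2]]
print(per_target_f1(true_rows, pred_rows, targets))
```

Reporting per-target scores makes it visible when one category, such as a rarely mentioned target, lags behind the overall number.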

3. Computational Resources

Fitting transformer fine-tuning into limited GPU memory required several optimizations:

            
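import torch

# Memory optimization techniques
def optimize_training(model, accumulation_steps=4):
    # Gradient accumulation for larger effective batch size:
    # batch size 16 x 4 accumulation steps = 64 examples per optimizer step.
    # The training loop should call optimizer.step() once every accumulation_steps batches.

    # Mixed precision training: the scaler guards against fp16 gradient underflow
    scaler = torch.cuda.amp.GradScaler()

    # Gradient checkpointing: trades extra compute for lower activation memory.
    # The method lives on the Hugging Face encoder (model.berturk in the
    # TurkishSentimentClassifier wrapper), not on the wrapper module itself.
    model.berturk.gradient_checkpointing_enable()

    return model, scaler, accumulation_steps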
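
Why gradient accumulation gives a larger effective batch can be shown with a toy model. The sketch below uses a one-parameter linear model and plain Python, not the actual training loop, to check that averaging the gradients of four equal-sized micro-batches matches the gradient of the combined batch. All names and data are illustrative.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, data, micro_batch_size, accumulation_steps):
    """Average micro-batch gradients, as done before a single optimizer step."""
    total = 0.0
    for i in range(accumulation_steps):
        micro_batch = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        # Dividing by accumulation_steps keeps the scale equal to one large batch
        total += mse_gradient(w, micro_batch) / accumulation_steps
    return total


# Eight toy points on the line y = 3x + 1
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
full = mse_gradient(0.5, data)
accumulated = accumulated_gradient(0.5, data, micro_batch_size=2, accumulation_steps=4)
print(full, accumulated)
```

With a batch size of 16 and 4 accumulation steps, each optimizer update is based on 64 examples while only 16 need to fit in GPU memory at a time.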
            
        

Future Improvements

Based on our TEKNOFEST experience and continued research, we identified several enhancement opportunities:

Larger Dataset: Expanding to 100K+ reviews for better generalization
More Targets: Adding categories like design, usability, and brand perception
Cross-Domain Training: Extending beyond e-commerce to social media and news
Real-time Processing: Optimizing for production deployment
Multilingual Support: Adding Kurdish and Arabic support for the Turkish market

Lessons Learned

This project taught valuable lessons about NLP research, team leadership, and competition preparation:

Technical Insights

Data Quality > Quantity: A smaller, carefully annotated dataset can outperform a larger, noisy one
Language-Specific Models: In our comparison, the Turkish-specific BERTürk clearly outperformed multilingual BERT (93% vs. 78%)
Evaluation Metrics: Choose metrics that align with real-world usage

Team Leadership

Clear Communication: Regular team meetings and progress tracking
Skill Utilization: Leveraging each team member's strengths
Deadline Management: Balancing perfectionism with competition deadlines

Open Source Contribution

To give back to the Turkish NLP community, we've open-sourced several components:

Turkish text preprocessing utilities
Target extraction tools
Evaluation scripts and metrics
Annotation guidelines and examples

Conclusion

Leading the HEZARTECH team to TEKNOFEST finals was an incredible experience that combined technical innovation with practical problem-solving. Our Turkish sentiment analysis model not only achieved competitive performance but also addressed real challenges in the Turkish e-commerce market.

The project demonstrated that with proper focus on language-specific challenges, domain expertise, and rigorous methodology, even high school students can contribute meaningful research to the NLP field. This experience strengthened my passion for AI research and confirmed my commitment to advancing Turkish language technologies.

            Interested in Turkish NLP? I'm always excited to discuss Natural Language Processing, especially for Turkish and other low-resource languages. Feel free to reach out if you're working on similar problems or want to collaborate on Turkish language technologies.