
AI Evaluation Specialist (Polish) | $15/hr Remote
Crossing Hurdles, Poland, NY, United States
Position:
LLM – AI Quality Analyst (Personalization) – Polish
Type:
Short-Term Contract
Location:
Remote (Global)
Commitment:
20-40 hours/week, with 4 hours of overlap with PST
Engagement Length:
1 month
Start Date:
Immediate
Role Responsibilities
Design multi-turn conversational prompts based on personal context
Evaluate personalized AI responses for relevance, grounding, and helpfulness
Assess correct and incorrect use of personal data in model outputs
Perform side-by-side (SxS) evaluation and ranking of AI responses
Identify grounding errors, poor inferences, and forced personalization
Write clear, structured rationales referencing specific conversation turns
Extract and verify model debug information and data source usage
Maintain strict data hygiene by deleting evaluation conversations
Requirements
Polish fluency (reading and writing) is mandatory, as Polish is the focus language for this project
Experience in data annotation, AI quality evaluation, content moderation, or related roles is strongly preferred
Strong analytical thinking and attention to detail
Ability to evaluate nuanced and ambiguous AI responses
Comfortable using a primary personal Google account with data sources enabled
BS/BA degree or equivalent experience in a relevant analytical field
Strong written communication and structured feedback skills
Self-motivated and able to work independently in a remote setting
Reliable desktop/laptop with stable internet connection