ACM Platforms

Multi-modal AI tutoring • Text · Image · Draw · Voice · Video

Learn your way: text, talk, show, or stream

Get personalized help through chat, homework photos, sketches, natural voice, or video and screen-capture demonstrations: one intelligent tutor that keeps context across every modality.

Interactive preview

Hi! I'm your ACM tutor preview. Pick a mode below—I'll remember context as you switch.

The multi-modal learning revolution

Traditional tutoring rarely matches how students actually work: some need to talk it out, others need to show their scratch work, and many switch modes minute by minute. ACM unifies those paths with AI that keeps the full story straight.

Why now

Learners are 40% faster when they can choose input modalities that match the task (internal beta data, illustrative). Pair that with 24/7 availability and you remove the friction of “wrong format, wrong time.”

Input channels working together—not siloed chatbots.

How our interactive AI works

Deployed on the AWS AI technology stack: AWS-hosted models, agent orchestration, and memory services persist context across modalities. Below is the same architecture we walk through with IT and curriculum teams.

1. Student inputs

  • Text: rich chat, LaTeX, code
  • Image: camera / upload → Claude Vision (Bedrock)
  • Draw: sketch pad → PNG export → vision (whiteboard & scratch work)
  • Voice: mic → AWS speech engine pipeline
  • Video: stream → adaptive frame sampling
2. Processing layer

  • Multi-modal router (rules + LLM classification)
  • Context manager (cross-modal memory)
  • Multi-model routing by task complexity
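
To make the processing layer concrete, here is a minimal sketch of a router that applies a rule per modality and then picks a model tier by complexity. It is an illustration under stated assumptions, not the production design: the pipeline names, the `small`/`medium`/`large` tiers, and the `estimate_complexity` heuristic (a stand-in for the LLM classification step) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    DRAW = "draw"
    VOICE = "voice"
    VIDEO = "video"


@dataclass
class StudentInput:
    modality: Modality
    payload: str  # message text, transcript, or a short description of the media


# Hypothetical pipeline names, one per input channel.
PIPELINE_BY_MODALITY = {
    Modality.TEXT: "language",
    Modality.IMAGE: "vision",
    Modality.DRAW: "vision",  # sketches are exported to PNG and reuse the vision path
    Modality.VOICE: "speech",
    Modality.VIDEO: "frame-sampler",
}


def estimate_complexity(item: StudentInput) -> int:
    """Crude stand-in for the LLM classification step: returns 0, 1, or 2."""
    score = 0
    if item.modality in (Modality.IMAGE, Modality.DRAW, Modality.VIDEO):
        score += 1  # visual inputs are treated as harder than plain text
    if len(item.payload.split()) > 40 or "prove" in item.payload.lower():
        score += 1  # long or proof-style requests get a larger model
    return score


def route(item: StudentInput) -> tuple[str, str]:
    """Return (pipeline, model_tier) for one student input."""
    pipeline = PIPELINE_BY_MODALITY[item.modality]  # rule-based step
    tier = ("small", "medium", "large")[estimate_complexity(item)]
    return pipeline, tier
```

With these assumptions, a short typed question goes to the language pipeline on the smallest tier, while a sketched proof goes to the vision pipeline on the largest.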
3. AI processing

  • Vision for images & video frames
  • Language & reasoning
  • Speech recognition & synthesis
  • Tool use & explanations
4. Intelligent output

  • Formatted text & math
  • Annotations on visuals
  • Natural voice replies
  • Interactive practice

As we put it to educators: the tutor doesn’t just read your message; it recalls the diagram you showed two turns ago, even if you switch to voice.

Five ways to learn

Each mode is first-class—not an afterthought bolted onto chat.

Text-based tutoring

Type questions, paste problems, export threads. Great for quick checks, code, and step-by-step math.

LaTeX math, syntax highlighting, threaded follow-ups.

Image-based learning

Snap homework, diagrams, or whiteboards—the tutor reads handwriting and structure, not just pixels.

Powered by AWS-hosted models; Textract-ready for scans.

Drawing pad & sketches

Work out math, circuits, or concepts freehand—export strokes as an image for the same vision pipeline as uploads.

PNG export plus pressure-sensitive pointer input; ideal when a camera is awkward.

Voice conversations

Talk naturally for language practice, accessibility, or hands-free study sessions.

Runs on a robust AWS natural-speech engine; transcripts stay in your session history.

Video tutoring

Demonstrate labs, proofs, or presentations while the tutor samples frames intelligently.

Adaptive 1–2 fps sampling + key-moment detection for feedback.
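
One way adaptive 1–2 fps sampling with key-moment detection could work is sketched below. It assumes each source frame already carries a motion score between 0 and 1 (how that score is computed is out of scope here), and the `busy` and `threshold` cut-offs are hypothetical values chosen for illustration.

```python
def sample_frames(motion, source_fps=30, low_fps=1, high_fps=2, busy=0.5):
    """Pick frame indices at 1 fps while the scene is calm and 2 fps while it is busy.

    motion: one score in [0, 1] per source frame (assumed to be precomputed).
    """
    picked = []
    next_index = 0
    for i, score in enumerate(motion):
        if i < next_index:
            continue  # still inside the gap chosen after the last sampled frame
        picked.append(i)
        fps = high_fps if score >= busy else low_fps
        next_index = i + source_fps // fps
    return picked


def key_moments(motion, picked, threshold=0.8):
    """Flag sampled frames whose motion score reaches the (hypothetical) threshold."""
    return [i for i in picked if motion[i] >= threshold]


# Four seconds of 30 fps video: two calm seconds, then two busy seconds.
motion = [0.1] * 60 + [0.9] * 60
picked = sample_frames(motion)
moments = key_moments(motion, picked)
```

Here the calm half is sampled once per second and the busy half twice per second, so only the busy frames are flagged for feedback.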

AI that remembers across modalities

Example flow: sketch a figure on the pad or upload a photo → read a text explanation → ask a follow-up by voice. The tutor answers with awareness of what you drew or showed, not a blank slate.

  1. Image in
  2. Text explain
  3. Voice follow-up
  4. Unified context
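
The four steps above can be sketched as a single session thread that stores every turn, whatever its modality, in a form the model can read. This is a minimal illustration, not the platform's memory service: the `Turn` and `Session` names and the bracketed prompt format are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    modality: str  # "text", "image", "draw", "voice", or "video"
    role: str      # "student" or "tutor"
    summary: str   # model-readable text: message, transcript, or image description


@dataclass
class Session:
    turns: list[Turn] = field(default_factory=list)

    def add(self, modality: str, role: str, summary: str) -> None:
        self.turns.append(Turn(modality, role, summary))

    def context(self, last_n: int = 10) -> str:
        """Flatten recent turns from every modality into one prompt-ready string."""
        recent = self.turns[-last_n:]
        return "\n".join(f"[{t.role}/{t.modality}] {t.summary}" for t in recent)


session = Session()
session.add("image", "student", "Photo: right triangle with legs 3 and 4")
session.add("text", "tutor", "Use the Pythagorean theorem to find the hypotenuse.")
session.add("voice", "student", "Why do we square the sides?")
prompt_context = session.context()
```

Because the image turn is still in the context when the voice question arrives, the tutor can answer with the triangle in view rather than starting from a blank slate.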

Interactive learning scenarios

Math homework helper

  • Stuck on calculus? Photograph the page or sketch the problem on the pad
  • Tutor explains with steps
  • Ask “why this move?” by voice
  • Tutor references the same figure or sketch

Language learning

  • Practice Spanish aloud
  • Get pronunciation guidance
  • Type a grammar question
  • Upload written essay snapshot for feedback

Science lab

  • Record experiment on video
  • Receive safety + procedure nudges
  • Text a question about odd results
  • Vision checks frames for setup issues

Test preparation

  • Upload practice test photo
  • Work through prompts in chat
  • Discuss hard topics via voice
  • Generate parallel drills automatically

Platform capabilities

  • Adaptive learning paths with difficulty that responds to performance.
  • Real-time analytics: modality mix, engagement, concept mastery.
  • Recommendations for follow-up resources and practice sets.
  • Session memory: images, drawings, transcripts, and chat in one thread.
  • 100+ languages in text, 50+ in voice workflows (configurable).
  • Accessibility: screen-reader friendly chrome, captions, voice-first flows.
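
As a rough illustration of the first capability, difficulty that responds to performance, the sketch below adjusts a level from a learner's recent right/wrong results. The 1–5 scale and the 80% and 40% accuracy cut-offs are assumptions made for the example, not the platform's actual rules.

```python
def next_difficulty(level, recent_results, lo=1, hi=5):
    """Raise, lower, or hold the difficulty level based on recent answers.

    recent_results: list of booleans, True for a correct answer.
    """
    if not recent_results:
        return level  # no evidence yet, so keep the current level
    accuracy = sum(recent_results) / len(recent_results)
    if accuracy >= 0.8:
        level += 1  # learner is cruising: step up
    elif accuracy <= 0.4:
        level -= 1  # learner is struggling: step down
    return max(lo, min(hi, level))  # clamp to the allowed range
```

A learner at level 3 who gets five in a row moves to level 4, while a mixed run of three correct out of five stays at level 3.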

Trusted by curious, ambitious learners

I learn math best by showing my work—the AI sees my mistakes instantly.

Sarah, 16

Image + text

Voice tutoring lets me practice Spanish pronunciation whenever I have five minutes.

Marcus, 22

Voice

Between classes I can text a fast question and pick up the thread later with a photo.

Priya, 19

Text

85%

Students prefer multi-modal vs. text-only tutoring (survey, illustrative).

92%

Report better retention when visual explanations accompany text (pilot).

Engagement lift when voice is enabled alongside chat (beta cohort).

Plans preview

Every tier includes all five interaction channels (text, image, draw, voice, video)—limits scale with audience size and analytics depth.

Monthly

$19.99/mo

Individuals with generous multimodal quotas.

Compare features

Classroom

$299/mo

Up to 30 seats, teacher dashboard, exports.

Compare features

Enterprise

Custom

SSO, LMS hooks, APIs, SLAs, data residency options.

Compare features

Start your free interactive session

Experience every modality without a credit card. Want a white-glove walkthrough for your school or L&D org? Book time on the contact page.