Generative AI tools such as GPT-4o are now widely accessible, but there is still limited experimental evidence on how they affect real knowledge work across diverse organisational roles. While existing studies show that AI can improve performance in narrowly defined tasks – such as writing short memos or generating code – we know less about its impact on complex institutional work involving varied task types, levels of specialisation, and departmental contexts.
To address this, we partnered with the National Bank of Slovakia (NBS) to evaluate the effect of GPT-4o in a structured field experiment. NBS is the country’s central bank and a member of the Eurosystem, responsible for monetary policy implementation, financial supervision, and economic analysis. We recruited 101 staff members across departments ranging from research and monetary policy to IT, supervision, and operations. Each participant completed a two-hour battery of tasks that closely mirrored their day-to-day work, designed around the task-based frameworks of Autor et al. (2003) and Autor (2013).
The experiment consisted of two components:
- Fourteen general tasks covering a wide range of activity types (writing, editing, data classification, and analysis), each randomly assigned to be completed with or without GPT-4o.
- Two domain-specialised tasks, tailored to the participant’s department, again with randomised AI access.
This experimental design allowed us to cleanly estimate how access to generative AI affects task performance across a broad spectrum of task categories.
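In stylised terms, such a design supports a simple task-level estimating equation of the following form (a sketch for exposition only; the exact specification is in Maršál and Perkowski 2025):

$$ y_{ij} = \beta \,\mathrm{AI}_{ij} + \mu_i + \tau_j + \varepsilon_{ij} $$

where $y_{ij}$ is participant $i$’s performance (quality or completion time) on task $j$, $\mathrm{AI}_{ij}$ indicates randomised GPT-4o access, $\mu_i$ and $\tau_j$ absorb worker and task differences, and $\beta$ recovers the average treatment effect of AI access.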
AI improves both quality and speed
Access to GPT-4o led to large average productivity gains across the board. Task quality improved by 33% to 44%, while task completion time fell by 21%. Nearly all participants (94%) produced higher-quality outputs with AI, and a large majority (80%) completed tasks more quickly.
Yet the nature of these gains differed by worker skill level:
- Lower-skill participants benefited most in output quality. With AI, they were able to produce substantially stronger work, in some cases catching up to the baseline performance of higher-skill colleagues.
- Higher-skill participants experienced the largest time savings, using GPT-4o to complete already high-quality tasks more efficiently.
To see how these individual gains are distributed across our sample, Figure 1 plots kernel-density estimates of each participant’s AI treatment effect on quality (panel a), efficiency (panel b), and quality on domain-specific tasks (panel c).
Figure 1 Distribution of GPT-4o productivity gains
Panel (a) Quality effects on generalist tasks
Panel (b) Time-saving effects on generalist tasks
Panel (c) Quality effects on specialist tasks
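For readers who want to see how such a figure is built, the snippet below estimates each participant’s treatment effect as the mean quality difference between their AI and non-AI tasks, then plots its kernel density. The file and column names (task_level_results.csv, participant_id, ai_treatment, quality) are illustrative assumptions, not the study’s actual data layout.

```python
# Minimal sketch: per-participant AI treatment effects and their kernel
# density, in the spirit of Figure 1. File and column names are hypothetical.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

tasks = pd.read_csv("task_level_results.csv")  # hypothetical dataset

# Mean quality per participant, separately for AI (1) and no-AI (0) tasks.
by_arm = (
    tasks.groupby(["participant_id", "ai_treatment"])["quality"]
    .mean()
    .unstack("ai_treatment")
)
te = (by_arm[1] - by_arm[0]).dropna().to_numpy()  # individual effects

# Kernel-density estimate of the distribution of individual effects.
kde = gaussian_kde(te)
grid = np.linspace(te.min(), te.max(), 200)
plt.plot(grid, kde(grid))
plt.axvline(0, linestyle="--")  # no-effect benchmark
plt.xlabel("Individual AI treatment effect on quality")
plt.ylabel("Density")
plt.show()
```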
While recent research suggests that generative AI disproportionately benefits lower-skill workers (Brynjolfsson et al. 2023, Dell’Acqua et al. 2023, Noy and Zhang 2023), our findings paint a more nuanced picture. We find that lower-skill employees do experience large gains in quality – but higher-skill employees benefit more in terms of speed. Generative AI may thus act as both a quality equaliser and an efficiency multiplier, raising important questions about how organisations structure work, training, and performance evaluation in the AI era.
AI complements non-routine and specialist tasks
While panels (a) and (b) of Figure 1 show that nearly everyone benefits, panel (c) reveals a longer right tail on specialist tasks, highlighting that the biggest payoffs occur when AI meets deep domain expertise. To understand where AI helps most, we classified each task using a framework grounded in Autor’s (2013) taxonomy and subsequent work in task-based economics. Tasks varied along several dimensions (a stylised coding sketch follows the list):
- Routineness: Did the task follow a predictable, structured pattern (e.g. proofreading), or did it require creativity, synthesis, or judgement (e.g. drafting a stakeholder memo)?
- Specialisation: Did the task require domain-specific knowledge (e.g. monetary policy, banking supervision, IT systems), or was it more general in nature?
- Cognitive complexity: Did the task require higher-order reasoning, or was it more mechanical?
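To make the coding concrete, here is a minimal sketch of how tasks might be tagged along these three dimensions. The example tasks and boolean labels are illustrative, not the study’s actual codebook.

```python
# Illustrative task coding along the three dimensions; example tasks and
# labels are hypothetical, not taken from the study's codebook.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    routine: bool              # predictable, structured pattern?
    specialised: bool          # requires domain-specific knowledge?
    cognitively_complex: bool  # higher-order reasoning needed?

tasks = [
    Task("Proofread a circular", routine=True,
         specialised=False, cognitively_complex=False),
    Task("Draft a stakeholder memo", routine=False,
         specialised=False, cognitively_complex=True),
    Task("Assess a bank's capital adequacy", routine=False,
         specialised=True, cognitively_complex=True),
]
```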
Our results across these tasks were striking:
- On routine tasks, GPT-4o improved outcomes by 24%.
- On non-routine tasks, performance improved by 58%, a difference of more than one standard deviation.
- On specialised tasks tailored to the employee’s domain, AI lifted performance by 100–117%, more than double the 36–50% gains seen in generalist tasks.
These patterns are consistent with the idea that AI is most productive when paired with cognitive complexity and expert context – not when used to automate simple or repetitive tasks.
Table 1 Average performance by task type and treatment condition
Task-level returns and worker-task mismatches
While GPT-4o delivered the largest productivity gains on non-routine and specialist tasks, these tasks were not always assigned to the workers who benefited most from AI. For example, employees in more routine job roles experienced the biggest individual quality improvements from using AI, but often worked on tasks that generated relatively modest returns at the task level.
This mismatch between who gains most from AI and where AI is most productive created a matching inefficiency.
To explore this, we simulated a counterfactual reallocation. Keeping staffing and total workload fixed, we reassigned tasks based on each worker’s AI-enhanced comparative advantage – that is, assigning individuals to the task types where their performance improved most with GPT-4o.
The result: aggregate output increased by 7.3%, without changing headcount or effort.
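To convey the logic of this counterfactual, the sketch below solves an analogous one-worker-one-task assignment problem on simulated data using the Hungarian algorithm. The gain matrix is synthetic, so the printed number will not match the 7.3% above; the code only illustrates the mechanics, and the paper’s procedure, which holds staffing and total workload fixed, may differ in detail.

```python
# Sketch of the reallocation counterfactual on simulated data: reassign
# tasks to maximise total AI-assisted output, holding the sets of workers
# and tasks fixed. The gain matrix is synthetic, not the study's estimates.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 101  # workers = tasks; one task per worker for simplicity
gain = rng.lognormal(mean=0.0, sigma=0.5, size=(n, n))  # AI-assisted output

# Baseline: the status-quo assignment (worker i performs task i).
baseline = gain[np.arange(n), np.arange(n)].sum()

# Counterfactual: output-maximising one-to-one assignment.
rows, cols = linear_sum_assignment(gain, maximize=True)
optimal = gain[rows, cols].sum()

print(f"Output gain from reallocation: {100 * (optimal / baseline - 1):.1f}%")
```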
This highlights a key managerial insight: adopting AI is only the first step. To unlock its full value, organisations must also reconsider how tasks are assigned – matching tools, people, and work more intentionally.
Figure 2 Production possibility frontier, with and without generative AI
Five lessons for firms and institutions
Our findings point to several actionable lessons for organisations aiming to integrate generative AI into knowledge work:
- Aim AI at its sweet-spot tasks. Generative models deliver their biggest pay-offs on non-routine, cognitively demanding work (writing, synthesis and domain-specific analysis), often doubling performance relative to routine chores. Map where those tasks sit in your workflow and make them the first ports of call for AI deployment.
- Give everyone the tool, guide them differently. Access should be universal, yet support must vary. Lower-skill staff mostly need quality-boosting prompts, while higher-skill colleagues benefit from efficiency hacks. Tailored training keeps both groups on the productivity frontier.
- Rewire task allocation, not just software. Because workers in routine roles enjoy the largest personal gains but their tasks gain the least from AI, organisations might want to re-assign people toward the activities where the technology magnifies their comparative advantage.
- Treat AI as infrastructure and update incentives. Fold model use into performance metrics, career paths and team design – much as previous generations did with spreadsheets and email. Without these complementary adjustments, a quarter of the aggregate productivity dividend remains untapped.
- Balance speed with safeguards. Faster completion times shrink the natural buffer before deadlines and can tempt over-confidence in AI-generated text. Instituting rigorous validation protocols (peer review, red-team checks, automated fact-verification) protects quality as reliance on AI grows and delivery expectations tighten.
As generative AI tools become embedded in everyday work, the challenge shifts from adoption to integration. Realising the full value of generative AI will depend not just on the technology itself, but on how well organisations align it with human capital and task design.
References
Autor, D H (2013), “The ‘task approach’ to labor markets: An overview,” Journal for Labour Market Research 46(3): 185–199.
Autor, D H, F Levy, and R J Murnane (2003), “The Skill Content of Recent Technological Change: An Empirical Exploration,” The Quarterly Journal of Economics 118(4): 1279–1333.
Brynjolfsson, E, D Li, and L R Raymond (2023), “Generative AI at Work,” NBER Working Paper 31161.
Dell’Acqua, F, E McFowland III, E Mollick et al. (2023), “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality,” Harvard Business School Working Paper.
Maršál, A and P Perkowski (2025), “A task-based approach to generative AI: Evidence from a field experiment in central banking,” NBS Working Paper.
Noy, S and W Zhang (2023), “Experimental evidence on the productivity effects of generative artificial intelligence,” Science 381(6654): 187–192.