Approximate nearest neighbor search (ANNS) is now widely used in applications such as information retrieval, question answering, and recommendation. As the amount of vector data grows continuously, it becomes important to support updates to the vector index, the enabling technique for efficient and accurate ANNS over vectors. Because of the curse of high dimensionality, it is often costly to identify the right neighbors of a single new vector, a necessary step in an index update. To amortize update costs, existing systems maintain a secondary index that accumulates updates, which are periodically merged into the main index by rebuilding the entire index. However, this approach causes high fluctuations in search latency and accuracy, and the rebuilds themselves require substantial resources and are extremely time-consuming. We introduce SPFresh, a system that supports in-place vector updates. At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol that splits vector partitions and reassigns vectors among nearby partitions to adapt to shifts in data distribution. LIRE achieves low-overhead vector updates by reassigning only the vectors at the boundaries between partitions, of which a high-quality vector index is expected to contain few. With LIRE, SPFresh delivers better query latency and accuracy than solutions based on global rebuilds, while needing only 1% of the DRAM and less than 10% of the cores at peak compared with the state of the art, on a billion-scale disk-based vector index with a 1% daily vector update rate.

In this paper, we present BrainStorm, a deep learning framework for optimizing dynamic NNs, which bridges the gap by unifying how dynamism should be expressed. BrainStorm proposes (1) Cell, the key data abstraction that lets model developers express the data granularity at which dynamism exists, and (2) Router, a unified interface that lets model developers express how Cells should be dynamically dispatched.
BrainStorm handles the efficient execution of routing actions. This design allows BrainStorm to collect profiles of fine-grained dataflow at the correct granularity. The traceability further opens up a new space of dynamic optimizations that specialize the execution of dynamic NNs to the runtime distribution of dynamism. Extensive evaluations show that BrainStorm brings up to 11.7× speedup (3.29× on average) or reduces memory consumption by up to 42% for popular dynamic neural networks with the proposed dynamic optimizations.
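The Cell/Router idea above can be illustrated with a minimal sketch. All names here are hypothetical stand-ins, not BrainStorm's actual API: cells are small arrays, a router picks a branch for each cell at runtime, cells are grouped so each branch runs once on a dense batch, and results are merged back in the original order.

```python
import numpy as np

def route_and_dispatch(cells, router, branches):
    """Dispatch each cell to the branch chosen by the router at runtime,
    run each branch once on a dense batch, and merge results back
    into the original cell order."""
    choices = [router(c) for c in cells]          # one branch id per cell
    outputs = [None] * len(cells)
    for b, fn in enumerate(branches):
        idx = [i for i, ch in enumerate(choices) if ch == b]
        if not idx:
            continue
        batch = np.stack([cells[i] for i in idx])  # gather cells for this branch
        result = fn(batch)                         # branch runs on a dense batch
        for j, i in enumerate(idx):
            outputs[i] = result[j]                 # scatter back in order
    return outputs

# Toy router: send cells with a large mean to the "heavy" branch.
cells = [np.array([0.1, 0.2]), np.array([5.0, 6.0]), np.array([0.3, 0.1])]
router = lambda c: int(c.mean() > 1.0)
branches = [lambda x: x * 2, lambda x: x + 1]      # "light" and "heavy" branches
out = route_and_dispatch(cells, router, branches)
```

Because the routing decision is made per Cell, a profiler wrapped around `router` would see exactly which data took which path, which is the traceability the abstract refers to.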
Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant challenge to deep learning. State-of-the-art sparsity-aware deep learning solutions are restricted to pre-defined, static sparsity patterns due to the significant overhead of preprocessing. Efficient execution of dynamic sparse computation often faces a misalignment between the GPU-friendly tile configuration needed for efficient execution and the sparsity-aware tile shape that minimizes coverage waste (non-zero values in the tensor). In this paper, we propose PIT, a deep learning compiler for dynamic sparsity. PIT introduces a novel tiling mechanism that leverages Permutation Invariant Transformation (PIT), a mathematically proven property, to transform multiple sparsely located micro-tiles into a GPU-efficient dense tile without changing the computation results, thus achieving both high GPU utilization and low coverage waste. Given a model, PIT first finds feasible PIT rules for all its operators and generates efficient GPU kernels accordingly. At runtime, with the SRead and SWrite primitives, PIT rules can be executed extremely fast to support dynamic sparsity in an online manner. Extensive evaluation on diverse models shows that PIT can accelerate dynamic sparse computation by up to 5.9× over state-of-the-art compilers.
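A toy NumPy sketch of the permutation-invariance idea, under the assumption that the kernel operates row-wise (the function and variable names are illustrative, not PIT's actual primitives): gathering only the non-zero rows into a dense tile, running the dense kernel, and scattering the results back yields the same answer as computing over the full sparse tensor.

```python
import numpy as np

def sparse_rowwise(x, f):
    """Apply a row-wise dense kernel f only to the non-zero rows of x:
    gather them into a dense tile (the "sread" step), run the kernel,
    and scatter the results back (the "swrite" step)."""
    nz = np.where(np.any(x != 0, axis=1))[0]  # locate non-zero micro-tiles (rows)
    out = np.zeros_like(x)
    out[nz] = f(x[nz])                        # dense, GPU-friendly computation
    return out

f = lambda tile: tile * 3.0                   # stand-in for a dense GPU kernel
x = np.array([[0., 0.], [1., 2.], [0., 0.], [4., 0.]])
y = sparse_rowwise(x, f)
```

Because the row-wise kernel is invariant to the order in which rows are gathered, the dense tile can pack rows from arbitrary sparse locations without affecting the result; the real system does this at the level of GPU tiles rather than NumPy rows.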
In this paper, we introduce Positional Skip-wisE (PoSE) training for efficient adaptation of large language models (LLMs) to extremely long context windows. PoSE decouples training length from target context window size by simulating long inputs using a fixed context window with manipulated position indices during training. Concretely, we select several short chunks from a long input sequence, and introduce distinct skipping bias terms to modify the position indices of each chunk. These bias terms, along with the length of each chunk, are altered for each training example, allowing the model to adapt to all positions within the target context window without training on full-length inputs. Experiments show that, compared with fine-tuning on the full length, PoSE greatly reduces memory and time overhead with minimal impact on performance. Leveraging this advantage, we have successfully extended the LLaMA model to 128k tokens. Furthermore, we empirically confirm that PoSE is compatible with all RoPE-based LLMs and various position interpolation strategies. Notably, by decoupling fine-tuning length from the target context window, PoSE can theoretically extend the context window infinitely, constrained only by memory usage at inference time. With ongoing advancements in efficient inference, we believe PoSE holds great promise for scaling the context window even further.
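The position-index manipulation can be sketched in a few lines. This is a minimal two-chunk version with a fixed cut point and skip (the function name is hypothetical; the paper randomizes chunk lengths and biases per training example):

```python
def pose_position_ids(train_len, target_len, cut, skip):
    """Position indices for a two-chunk PoSE example: the first chunk
    keeps its original positions, the second is shifted forward by a
    skipping bias so that training touches positions deep inside the
    target context window, while only train_len tokens are processed."""
    assert 0 < cut < train_len and 0 <= skip <= target_len - train_len
    chunk0 = list(range(cut))                           # positions 0 .. cut-1
    chunk1 = list(range(cut + skip, train_len + skip))  # skipped-ahead positions
    return chunk0 + chunk1

# An 8-token training window simulating a 32-token target context window.
pos = pose_position_ids(train_len=8, target_len=32, cut=3, skip=20)
```

Feeding such indices to a RoPE-based model during fine-tuning exposes it to large relative distances (here up to 27) without ever materializing a 32-token attention window.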
In many real-world tasks, some state features, called contexts, are independent of the action signal, e.g., customer demand in inventory control or the speed of the lead car in autonomous driving. One challenge for reinforcement learning (RL) in these applications is that the true context transitions can easily be exposed to unknown sources of contamination, leading to a shift in context transitions between source and target domains that can degrade the performance of RL algorithms. Existing methods for robust RL, however, aim at learning policies that are robust to deviations of the entire system dynamics. To tackle this problem, this paper proposes the framework of the Robust Situational Markov Decision Process (RS-MDP), which explicitly captures possible deviations of context transitions. To scale to large context spaces, we introduce a softmin-smoothed robust Bellman operator to learn the robust Q-value approximately, and apply our RS-MDP framework to the existing RL algorithm SAC to learn the desired robust policies. We conduct experiments on several robot control tasks with dynamic contexts and on inventory control tasks, demonstrating that our algorithm generalizes better, is more robust to deviations of context transitions, and outperforms existing robust RL algorithms.
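A toy sketch of what a softmin-smoothed robust backup might look like, under the assumption that softmin means reweighting the nominal next-context distribution toward low-value contexts (the function name, temperature, and exact form are my illustration, not necessarily the paper's operator):

```python
import numpy as np

def softmin_robust_backup(q_next, probs, tau=0.1):
    """Softmin-smoothed robust expectation over next contexts: shift
    probability mass toward low-value contexts with weight exp(-q/tau),
    so the backup interpolates between the nominal expectation
    (tau -> inf) and the worst case min (tau -> 0)."""
    w = probs * np.exp(-q_next / tau)
    w /= w.sum()                         # renormalize the tilted distribution
    return float(np.dot(w, q_next))

q_next = np.array([1.0, 0.2, 0.8])       # value of each candidate next context
probs = np.array([0.5, 0.3, 0.2])        # nominal context-transition probabilities
robust_v = softmin_robust_backup(q_next, probs, tau=0.05)
nominal_v = float(np.dot(probs, q_next))
```

The smoothed operator stays differentiable in the Q-values, which is what makes it practical to plug into an actor-critic method such as SAC.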
We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures in Markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-Markdown text generation. Furthermore, the model can readily be adapted to any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for future scaling of multimodal large language models.
Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs are becoming increasingly long, even exceeding tens of thousands of tokens. To accelerate model inference and reduce cost, this paper presents LLMLingua, a coarse-to-fine prompt compression method that comprises a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction-tuning-based method for distribution alignment between language models. We conduct experiments and analysis on four datasets from different scenarios, i.e., GSM8K, BBH, ShareGPT, and Arxiv-March23, showing that the proposed approach yields state-of-the-art performance and allows for up to 20x compression with little performance loss.
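The coarse-to-fine structure can be sketched with a toy example. Everything here is illustrative: real LLMLingua scores tokens with a small language model and compresses iteratively, whereas this sketch uses given scores, keeps the instruction segment verbatim (a stand-in for the budget controller), and prunes low-score tokens from demonstrations.

```python
import math

def compress_tokens(tokens, scores, keep_ratio):
    """Fine-grained step: keep the highest-scoring tokens, preserving
    their original order (scores stand in for a small LM's per-token
    informativeness, e.g. derived from perplexity)."""
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:n_keep])
    return [tokens[i] for i in keep]

def compress_prompt(segments, target_ratio):
    """Coarse step (a toy budget controller): spend the whole budget
    of the instruction segment (keep it verbatim) and compress each
    demonstration segment token-wise to the target ratio."""
    instruction, *demos = segments
    out = [tok for tok, _ in instruction]
    for seg in demos:
        toks = [t for t, _ in seg]
        scs = [s for _, s in seg]
        out += compress_tokens(toks, scs, target_ratio)
    return out

prompt = [
    [("Solve", 1.0), ("the", 0.1), ("task:", 0.9)],             # instruction
    [("Q:", 0.2), ("rare", 0.9), ("clue", 0.8), ("the", 0.1)],  # demonstration
]
compressed = compress_prompt(prompt, target_ratio=0.5)
```

The design point this illustrates is the asymmetric budget: compression pressure falls on redundant demonstration tokens while the instruction, which controls task semantics, is protected.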
With the strong support and assistance of Professor Luo Fang of the Faculty of Psychology at Beijing Normal University, Microsoft Research Asia held the psychology and education session of its "Societal AI" seminar series. At the seminar, leading experts from psychometrics, education, and computer science discussed the application of psychometric techniques and the feasibility of AI-based assessment, explored how large models can empower psychological measurement, and looked ahead to the future of AI-assisted education.
AI-driven drug discovery is one of the important future applications of artificial intelligence. Since the first outbreak of the novel coronavirus (SARS-CoV-2), small-molecule drug discovery for the virus has attracted wide attention, and the recently held first AI Drug Discovery Algorithm Competition focused on exactly this. In the competition, a team from the Microsoft Research AI for Science center achieved outstanding results and took first place with its innovative AI model systems AI2BMD and ViSNet.
Distributional Graphormer: from molecular structure prediction to equilibrium distribution prediction
Microsoft Research has released Distributional Graphormer (DiG), a deep learning framework for predicting the equilibrium distribution of molecular structures. DiG can rapidly generate realistic and diverse conformations, laying the foundation for a breakthrough from single-structure prediction to equilibrium distribution prediction.
Climate change, pandemics, the development gap... what more must we do to meet these challenges?
To better analyze battery performance and predict battery lifetime, Microsoft Research Asia has developed and open-sourced BatteryML, a one-stop machine learning tool, in the hope of bringing together more professional expertise to jointly advance research in the battery field.
Microsoft Research Asia has open-sourced MABIM on GitHub, a learning and testing platform that flexibly adapts to the various challenges of multi-agent reinforcement learning, so that MARL algorithms can be tested more thoroughly and transferred more easily to real-world application scenarios.
After more than two years of in-depth exploration, Qlib has received a major update: on top of the original AI quantitative finance framework, it introduces new paradigms based on reinforcement learning and meta-learning, as well as new scenarios for order execution optimization and market dynamics modeling, helping practitioners apply more advanced and diverse AI techniques to more complex financial challenges.