So we have some really inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. I.e. if the maximum size is 10 megs, an 8 meg file gets skipped, while a 16 meg file gets split based on the number of pages.
This is code I inherited, and it feels like there has to be a more efficient way to do this, ideally a single method and fewer object instantiations (a rough sketch of what I mean follows the current code below).
List<int> splitPoints = null;
List<byte[]> documents = null;

splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
documents = this.SplitPDF(currentDocument, maxSize, splitPoints);
The methods:
private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize)
{
    List<int> splitPoints = new List<int>();
    PdfReader reader = null;
    Document document = null;
    int pagesRemaining = currentDocument.Pages;

    while (pagesRemaining > 0)
    {
        reader = new PdfReader(currentDocument.Data);
        document = new Document(reader.GetPageSizeWithRotation(1));

        using (MemoryStream ms = new MemoryStream())
        {
            PdfCopy copy = new PdfCopy(document, ms);
            PdfImportedPage page = null;

            document.Open();

            //Add pages until we run out from the original
            for (int i = 0; i < currentDocument.Pages; i++)
            {
                int currentPage = currentDocument.Pages - (pagesRemaining - 1);

                if (pagesRemaining == 0)
                {
                    //The whole document has been traversed
                    break;
                }

                page = copy.GetImportedPage(reader, currentPage);
                copy.AddPage(page);

                //If the current collection of pages exceeds the maximum size, we save off the index and start again
                if (copy.CurrentDocumentSize > maxSize)
                {
                    if (i == 0)
                    {
                        //One page is greater than the maximum size
                        throw new Exception("one page is greater than the maximum size and cannot be processed");
                    }

                    //We have gone one page too far, save this split index
                    splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1));
                    break;
                }
                else
                {
                    pagesRemaining--;
                }
            }

            page = null;

            document.Close();
            document.Dispose();
            copy.Close();
            copy.Dispose();
            copy = null;
        }
    }

    if (reader != null)
    {
        reader.Close();
        reader = null;
    }

    document = null;

    return splitPoints;
}

private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints)
{
    var documents = new List<byte[]>();
    PdfReader reader = null;
    Document document = null;
    MemoryStream fs = null;
    int pagesRemaining = currentDocument.Pages;

    while (pagesRemaining > 0)
    {
        reader = new PdfReader(currentDocument.Data);
        document = new Document(reader.GetPageSizeWithRotation(1));
        fs = new MemoryStream();

        PdfCopy copy = new PdfCopy(document, fs);
        PdfImportedPage page = null;

        document.Open();

        //Add pages until we run out from the original
        for (int i = 0; i <= currentDocument.Pages; i++)
        {
            int currentPage = currentDocument.Pages - (pagesRemaining - 1);

            if (pagesRemaining == 0)
            {
                //We have traversed all pages
                //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                fs.Flush();
                copy.Close();
                documents.Add(fs.ToArray());
                document.Close();
                fs.Dispose();
                break;
            }

            page = copy.GetImportedPage(reader, currentPage);
            copy.AddPage(page);
            pagesRemaining--;

            if (splitPoints.Contains(currentPage + 1) == true)
            {
                //Need to start a new document
                //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                fs.Flush();
                copy.Close();
                documents.Add(fs.ToArray());
                document.Close();
                fs.Dispose();
                break;
            }
        }

        copy = null;
        page = null;

        fs.Dispose();
    }

    if (reader != null)
    {
        reader.Close();
        reader = null;
    }

    if (document != null)
    {
        document.Close();
        document.Dispose();
        document = null;
    }

    if (fs != null)
    {
        fs.Close();
        fs.Dispose();
        fs = null;
    }

    return documents;
}
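For reference, this is roughly the shape of what I mean by one method: a single method that reuses one PdfReader for the whole document and produces the chunks as it goes. It is only an untested sketch, using the same iTextSharp types as above (PdfReader, Document, PdfCopy) and the same IClaimDocument interface; the method name and structure are mine.

// Untested sketch: one method, one PdfReader for the whole document, chunks produced as we go.
private List<byte[]> SplitBySize(IClaimDocument currentDocument, int maxSize)
{
    var chunks = new List<byte[]>();
    PdfReader reader = new PdfReader(currentDocument.Data);
    try
    {
        int firstPage = 1;
        while (firstPage <= reader.NumberOfPages)
        {
            // Pass 1 for this chunk: find the last page that still fits under maxSize,
            // using the same CurrentDocumentSize heuristic as the original code.
            int lastPage = firstPage;
            using (var probe = new MemoryStream())
            {
                Document probeDoc = new Document(reader.GetPageSizeWithRotation(firstPage));
                PdfCopy probeCopy = new PdfCopy(probeDoc, probe);
                probeDoc.Open();
                for (int p = firstPage; p <= reader.NumberOfPages; p++)
                {
                    probeCopy.AddPage(probeCopy.GetImportedPage(reader, p));
                    if (probeCopy.CurrentDocumentSize > maxSize)
                    {
                        if (p == firstPage)
                            throw new Exception("one page is greater than the maximum size and cannot be processed");
                        break; // page p pushed us over, so this chunk ends at p - 1
                    }
                    lastPage = p;
                }
                probeDoc.Close();
            }

            // Pass 2 for this chunk: copy pages firstPage..lastPage for real.
            using (var ms = new MemoryStream())
            {
                Document doc = new Document(reader.GetPageSizeWithRotation(firstPage));
                PdfCopy copy = new PdfCopy(doc, ms);
                doc.Open();
                for (int p = firstPage; p <= lastPage; p++)
                    copy.AddPage(copy.GetImportedPage(reader, p));
                doc.Close(); // must happen before ToArray(), Close() finalizes the PDF
                chunks.Add(ms.ToArray());
            }

            firstPage = lastPage + 1;
        }
    }
    finally
    {
        reader.Close();
    }
    return chunks;
}

It still copies the pages of each chunk twice (once to find where CurrentDocumentSize overflows, once to write the chunk for real), but it only ever creates a single PdfReader instead of one per chunk per method.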
As far as I can tell, the only code I can find online is in VB, and it doesn't necessarily address the size problem.
Update:
We are running into OutOfMemory exceptions, which I believe is a large object heap issue. So one idea was to reduce the footprint of the code, which might reduce the number of large objects on the heap.
Basically, this is part of a loop that goes through any number of PDFs, splits them, and stores them in a database. Right now we have had to change the approach from doing them all at once (the last run was 97 PDFs of various sizes) to pushing 5 PDFs through the system every 5 minutes. That isn't ideal and won't scale well as we roll the tool out to more clients.
(We're dealing with 50-100 MB PDFs, but they could be larger.)
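One untested idea for the memory side: let PdfCopy write each chunk straight to a temp file instead of a MemoryStream, so the finished chunk never has to exist as a large byte[] on the managed heap; the file can then be streamed into the database and deleted. The helper below is only a sketch (the name and parameters are mine, and it needs System.IO in addition to the iTextSharp namespaces used above).

// Untested sketch: write one chunk of pages straight to a temp file so the finished chunk
// never lives as a big byte[] on the large object heap.
private string WriteChunkToTempFile(PdfReader reader, int firstPage, int lastPage)
{
    string tempPath = Path.GetTempFileName();
    using (var fs = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
    {
        Document doc = new Document(reader.GetPageSizeWithRotation(firstPage));
        PdfCopy copy = new PdfCopy(doc, fs);
        doc.Open();
        for (int p = firstPage; p <= lastPage; p++)
            copy.AddPage(copy.GetImportedPage(reader, p));
        doc.Close(); // finalizes the PDF into the FileStream
    }
    return tempPath;
}

The caller would then open a FileStream over the returned path, stream it into the database, and delete the file, instead of collecting everything in a List<byte[]>.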
Workaround
I also inherited this exact code, and there appears to be a major flaw in it. In the GetPDFSplitPoints method, it checks the total size of the copied pages against maxSize to determine at which page to split the file.

In the SplitPDF method, when it reaches the page where the split occurs, sure enough the MemoryStream at that point is below the maximum size allowed, and one more page would put it over the limit. But after document.Close() is executed, much more is added to the MemoryStream (in one example PDF I worked with, the Length of the MemoryStream went from 9 MB to 19 MB before and after the Close). My understanding is that all of the necessary resources for the copied pages are added on Close.

I'm guessing I'll have to rewrite this code completely to make sure I don't exceed the maximum size while keeping the original pages intact.
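In other words, any rewrite would have to validate each chunk only after Close(), along these lines (untested sketch; the helper name and parameters are mine, the iTextSharp calls are the same ones used in the question):

// Untested illustration of the point above: the size check has to happen after Close(),
// because that is when the shared resources get written into the stream.
private static bool TryBuildChunk(PdfReader reader, int firstPage, int lastPage, int maxSize,
                                  out byte[] chunk)
{
    using (var ms = new MemoryStream())
    {
        Document doc = new Document(reader.GetPageSizeWithRotation(firstPage));
        PdfCopy copy = new PdfCopy(doc, ms);
        doc.Open();
        for (int p = firstPage; p <= lastPage; p++)
            copy.AddPage(copy.GetImportedPage(reader, p));
        doc.Close();                      // only now does the stream contain the finished PDF
        chunk = ms.ToArray();
        return chunk.Length <= maxSize;   // compare the real size, not CurrentDocumentSize
    }
}

The splitting loop would call this with a candidate page range and, whenever it returns false, drop the last page (or binary-search downward) and try again. That means over-sized candidates get copied more than once, but the limit is checked against the actual finished file rather than CurrentDocumentSize.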