Thursday, 27 October 2011

Read Word Document using Interop.Word

You have a Microsoft Word document (.doc) and want to read it in your C# program. With the Microsoft.Office.Interop.Word assembly, you can get the contents and formatting from the document, as we demonstrate in this tutorial.


First, here is the file we will read: it contains three paragraphs of one word each. The program creates an Application instance and calls Documents.Open on it to load the file. It then loops through the Words collection, which is indexed from 1, reads the Text property of each element, and writes it to the console. Finally, it calls Quit on the Application instance.

Word document [word.doc]

One

Two

three

Program that uses Microsoft Word interop [C#]

using System;
using Microsoft.Office.Interop.Word;

class Program
{
    static void Main()
    {
        // Start Word and open the .doc file.
        Application application = new Application();
        Document document = application.Documents.Open("C:\\word.doc");

        // Loop through all words in the document (the collection is 1-based).
        int count = document.Words.Count;
        for (int i = 1; i <= count; i++)
        {
            // Write the word.
            string text = document.Words[i].Text;
            Console.WriteLine("Word {0} = {1}", i, text);
        }

        // Close Word so that WINWORD.EXE does not keep running.
        application.Quit();
    }
}

Output

Word 1 = One
Word 2 =
Word 3 = Two
Word 4 =
Word 5 = three
Word 6 =

Empty paragraphs. As you can see, Word 2, Word 4, and Word 6 appear empty. The Words collection counts paragraph marks as words, so the mark that ends each paragraph shows up as an element with no visible text. If a paragraph contains several words, each one is a separate element in the Words collection, so a paragraph corresponds to one or more words followed by its paragraph mark.
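
If the entries with no visible text are not wanted, one option is to trim each Text value and skip whatever ends up empty. The following is a minimal sketch, not part of the original program: the WordFilter class and VisibleWords method are hypothetical names, and the sample array stands in for the values read from document.Words, assuming the blank entries are carriage-return paragraph marks.

```csharp
using System;
using System.Collections.Generic;

static class WordFilter
{
    // Hypothetical helper: keeps only entries that contain visible characters.
    public static List<string> VisibleWords(IEnumerable<string> rawWords)
    {
        var result = new List<string>();
        foreach (string raw in rawWords)
        {
            // Trim removes surrounding whitespace, including "\r".
            string trimmed = raw.Trim();
            if (trimmed.Length > 0)
            {
                result.Add(trimmed);
            }
        }
        return result;
    }
}

class FilterDemo
{
    static void Main()
    {
        // Assumed stand-in for the Text values of document.Words.
        string[] raw = { "One", "\r", "Two", "\r", "three", "\r" };

        foreach (string word in WordFilter.VisibleWords(raw))
        {
            Console.WriteLine(word);
        }
    }
}
```

In the real program, the same check could be applied to document.Words[i].Text inside the loop before calling Console.WriteLine.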

Quit. Why is the application.Quit statement important? Without it, the WINWORD.EXE process that the program started stays in the computer's process list after the program exits. Each later run then starts another WINWORD.EXE alongside the ones left behind. These leftover processes waste memory and can eventually cause resource problems.
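
The program above reaches Quit only if nothing fails before it. A possible variation, sketched here as one option and not as the required pattern, wraps the work in try/catch/finally so that Quit is still called when opening or reading the document throws. It assumes the same file path as the original program and a machine with Word installed. The catch block simply reports the error.

```csharp
using System;
using Microsoft.Office.Interop.Word;

class Program
{
    static void Main()
    {
        Application application = new Application();
        try
        {
            // Same file path as in the original program.
            Document document = application.Documents.Open("C:\\word.doc");

            int count = document.Words.Count;
            for (int i = 1; i <= count; i++)
            {
                Console.WriteLine("Word {0} = {1}", i, document.Words[i].Text);
            }
        }
        catch (Exception ex)
        {
            // Report the failure instead of letting the program crash.
            Console.WriteLine("Could not read the document: " + ex.Message);
        }
        finally
        {
            // Runs after the try or catch block, so WINWORD.EXE is asked to exit either way.
            application.Quit();
        }
    }
}
```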

Summary

In this tutorial, we took a look at the Microsoft.Office.Interop.Word assembly and learned how to read in data from a Word document. This can be very useful when you have DOC or DOCX files and want to programmatically read in data from your C# program.
