WebApp: Types of HTML Parsers

Posted by 황제낙엽, 2008.06.09 17:33

1. HTMLParser (http://htmlparser.sourceforge.net)

HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.
Its homepage describes it as a super-fast real-time parser for real-world HTML, and notes that what has attracted most developers is its simple design, its speed, and its ability to handle streaming real-world HTML.
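
To make "parsing HTML in a linear fashion" concrete, here is a minimal sketch that walks the input once and emits a flat stream of tag and text tokens without building a tree. It uses only the JDK and is not HTMLParser's API; the class `LinearTokenizer`, its `Token` type, and the method `tokenize` are hypothetical names chosen for illustration. It deliberately ignores comments, `<script>` bodies, and `>` characters inside quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of linear (flat, single-pass) HTML tokenizing;
// NOT the org.htmlparser API. Comments, scripts and '>' inside quotes are not handled.
class LinearTokenizer {

    // One flat token: either a tag (with its angle brackets) or a run of text.
    static final class Token {
        final boolean isTag;
        final String text;
        Token(boolean isTag, String text) { this.isTag = isTag; this.text = text; }
    }

    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            int lt = html.indexOf('<', i);
            if (lt < 0) {                       // no more tags: the rest is text
                tokens.add(new Token(false, html.substring(i)));
                break;
            }
            if (lt > i) {                       // text between the previous tag and this one
                tokens.add(new Token(false, html.substring(i, lt)));
            }
            int gt = html.indexOf('>', lt);
            if (gt < 0) {                       // unterminated '<': keep the remainder as text
                tokens.add(new Token(false, html.substring(lt)));
                break;
            }
            tokens.add(new Token(true, html.substring(lt, gt + 1)));
            i = gt + 1;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<p>Hello <b>world</b></p>")) {
            System.out.println((t.isTag ? "TAG  " : "TEXT ") + t.text);
        }
    }
}
```

Running `main` prints one line per token in document order. A nested parser would go one step further and pair each start tag with its end tag to build a tree.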

2. Jericho HTML Parser (http://jerichohtml.sourceforge.net/doc/index.html)

Jericho HTML Parser is a simple but powerful java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.
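
The important property in that description is that an edit touches only the targeted segment, while every other character, including markup the parser does not recognise, is reproduced verbatim. The JDK-only sketch below illustrates the idea with plain string positions. It does not use Jericho's API; the class `VerbatimEditor` and the method `replaceElementText` are hypothetical, and the sketch only handles the first occurrence of a start tag written exactly as `<name>` (no attributes, no nesting).

```java
// Hypothetical sketch of a position-based, verbatim-preserving edit; NOT Jericho's API.
// Only the text between the first <name> and the following </name> changes;
// everything else (server-side tags, broken markup, odd whitespace) is copied as-is.
class VerbatimEditor {

    static String replaceElementText(String html, String name, String newText) {
        String startTag = "<" + name + ">";
        String endTag = "</" + name + ">";
        int open = html.indexOf(startTag);
        if (open < 0) return html;                       // element not found: unchanged
        int contentStart = open + startTag.length();
        int contentEnd = html.indexOf(endTag, contentStart);
        if (contentEnd < 0) return html;                 // no end tag: leave the document alone
        // Splice: prefix + replacement + suffix, with prefix and suffix copied verbatim.
        return html.substring(0, contentStart) + newText + html.substring(contentEnd);
    }

    public static void main(String[] args) {
        String page = "<html><title>Old</title><%= user %><p align=center>unclosed";
        System.out.println(replaceElementText(page, "title", "New"));
    }
}
```

In the example, the JSP-style `<%= user %>` tag and the unclosed `<p>` come through untouched; only the title text changes.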

3. NekoHTML (http://people.apache.org/~andyc/neko/doc/index.html)

NekoXNI is a collection of small, useful XML tools written for the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. The NekoXNI tools are written to illustrate the power and flexibility of the XNI framework as well as provide useful tools for XML application developers.

4. JTidy (http://jtidy.sourceforge.net)

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document that is being processed, which effectively makes you able to use JTidy as a DOM parser for real-world HTML.
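
One part of "cleaning up malformed HTML" is tag balancing: closing elements that were left open and discarding end tags that match nothing. The sketch below shows a simplified, stack-based version of that single step using only the JDK. It is not JTidy's algorithm or API (Tidy applies many more HTML-specific rules); the class `TagBalancer`, the method `balance`, and the short void-element list are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified tag balancer; NOT JTidy's algorithm or API.
class TagBalancer {
    // Start or end tag: group(1) is "/" for end tags, group(2) is the element name.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");
    // Elements that never have an end tag (partial list, an assumption for this sketch).
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();   // currently open elements, innermost on top
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());     // copy text between tags unchanged
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {            // start tag
                out.append(m.group());
                if (!VOID.contains(name) && !m.group().endsWith("/>")) {
                    open.push(name);
                }
            } else if (open.contains(name)) {      // end tag with a matching open element
                String top;
                do {                               // also close anything still open inside it
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }                                      // else: stray end tag, dropped
            last = m.end();
        }
        out.append(html, last, html.length());
        while (!open.isEmpty()) {                  // close whatever is still open at the end
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p><b>bold</p><div>text</span>"));
    }
}
```

For the sample input, the unclosed `<b>` is closed before `</p>`, the stray `</span>` is dropped, and `<div>` is closed at the end, giving `<p><b>bold</b></p><div>text</div>`. Matched end tags are re-emitted in lowercase.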

5. YGHTML Parser (http://jakarta.tistory.com/25)

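Giving up two whole weekend days, I built an HTML parser that can also parse SCRIPT sections.
Consider it roughly version 0.1. It correctly parses even Naver, whose markup is the most bizarre, but it still needs a lot of testing, fixes, and additions. It should be usable for research purposes without much trouble, but building the DOM tree has not been implemented yet (not enough time).
Since the lexer is now fairly stable, a DOM tree should be easy to implement with appropriate use of a stack.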
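
The idea that a stable lexer plus a stack is enough for a DOM tree can be sketched as follows: each start tag creates a node attached to the element on top of the stack and is then pushed, each matching end tag pops back to its element, and text becomes a leaf of the current top. This is not YGHTML's code; a simple regex stands in for its lexer, and the names `StackTreeBuilder`, `Node`, `build`, and `dump` are hypothetical. Implied end tags (for example an unclosed `<li>`) are not handled, so such elements simply nest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of stack-based tree building; NOT YGHTML's implementation.
class StackTreeBuilder {

    static final class Node {
        final String name;                       // element name, or "#text"
        final String text;                       // content of a text node, null for elements
        final List<Node> children = new ArrayList<>();
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    // Stand-in lexer: group(1) is "/" for end tags, group(2) is the element name.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");
    // Elements that never have children (partial list, an assumption for this sketch).
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    static Node build(String html) {
        Node root = new Node("#document", null);
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);                        // the document node always stays at the bottom
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            addText(stack.peek(), html.substring(last, m.start()));
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {          // start tag: attach to the current parent
                Node element = new Node(name, null);
                stack.peek().children.add(element);
                if (!VOID.contains(name) && !m.group().endsWith("/>")) {
                    stack.push(element);         // ...and make it the new current parent
                }
            } else if (isOpen(stack, name)) {    // end tag: pop up to and including its element
                while (!stack.pop().name.equals(name)) {
                    // elements left open inside it are closed implicitly
                }
            }                                    // unmatched end tags are ignored
            last = m.end();
        }
        addText(stack.peek(), html.substring(last));
        return root;
    }

    private static boolean isOpen(Deque<Node> stack, String name) {
        for (Node n : stack) {
            if (n.name.equals(name)) return true;
        }
        return false;
    }

    private static void addText(Node parent, String s) {
        if (!s.isEmpty()) parent.children.add(new Node("#text", s));
    }

    // Compact text form of the tree, e.g. p['Hi ',b['there']]
    static String dump(Node n) {
        if (n.text != null) return "'" + n.text + "'";
        StringBuilder sb = new StringBuilder(n.name).append('[');
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(dump(n.children.get(i)));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(dump(build("<p>Hi <b>there</b>!</p>")));
    }
}
```

For `<p>Hi <b>there</b>!</p>` the builder produces a `p` element whose children are the text `Hi `, a `b` element containing `there`, and the text `!`. Checking `isOpen` before popping is what keeps a stray end tag from emptying the stack.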
